Senior Site Reliability Engineer
US States: Arizona, California, Colorado, Connecticut, District of Columbia*, Florida, Georgia, Idaho, Illinois, Indiana, Iowa, Maryland, Massachusetts, Michigan, Minnesota, Missouri, New Jersey, New Mexico, New York, North Carolina, Ohio, Oklahoma, Oregon, Pennsylvania, Puerto Rico*, Rhode Island, Tennessee, Texas, Utah, Vermont, Virginia, Washington, West Virginia, Wisconsin and Wyoming.
Countries: Brazil, Canada, Colombia, Germany, Ghana, India, Indonesia, Italy, Kenya*, Mexico, Morocco, Netherlands, Poland, Singapore*, South Africa, Spain, Switzerland and the United Kingdom.
Ability to work across multiple time zones.
Full-Time, Senior
Salary: 113,082 - 175,725 USD per year
Job Details
- Languages: Strong English language skills (verbal and written).
- Experience: 6+ years of experience in an SRE/Operations/DevOps role as part of a team.
- Required Skills: Python, Kubernetes, Linux, DevOps, Distributed Systems
Requirements
- 6+ years of experience in an SRE, Operations, or DevOps role.
- Proficiency with shell and a scripting language (Python, Go, Bash, or Ruby).
- Experience with configuration management tools such as Puppet or Ansible.
- Experience with distributed caching systems and performance optimization.
- Experience with package management on Linux systems (specifically Debian).
- Strong Linux system-level troubleshooting skills.
- Proven track record of automating tasks, identifying process gaps, and implementing improvements.
- Strong English language skills (verbal and written).
- Ability to work independently in a globally distributed team across multiple time zones.
- Experience leading incident response and post-incident review rituals for root cause analysis.
- Willingness to travel 1-2 times per year for in-person events and team meetings.
Responsibilities
- Perform day-to-day operational/DevOps tasks on public-facing infrastructure, including deployment, maintenance, configuration, and troubleshooting.
- Implement and use configuration management and deployment tools such as Puppet and Kubernetes.
- Lead continuous improvement initiatives by automating the installation, configuration, and maintenance of services.
- Assist product teams with architectural design to ensure new services operate at scale.
- Participate in a 24/7 on-call rotation for incident response, diagnosis, and follow-up on system outages.
- Collaborate with a global, cross-functional team in an asynchronous environment.
- Mentor peers in technical and operational areas.