Senior Site Reliability Engineer

United States | Full-Time | Senior

Salary not disclosed


Job Details

Required Skills: AWS, Docker, Python, Kubernetes, Grafana, Prometheus, CI/CD, Terraform, Datadog

Requirements

Monitoring and observability best practices, including tools like Datadog, Prometheus, and Grafana
Expertise in setting up and managing alerts, dashboards, and logging
Understanding of networking concepts, security best practices, and performance optimization in AWS
Proficiency in AWS services: EKS, EC2, ECS, S3, RDS, VPC, IAM, Route 53, etc.
Experience with containerization and orchestration tools like Docker and Kubernetes
Strong knowledge of Infrastructure as Code (IaC) tools such as Terraform, AWS CDK, or CloudFormation
Knowledge of scripting and automation using languages like Python, Bash, or PowerShell
Experience with CI/CD pipelines for deploying and testing applications in AWS

Responsibilities

Implementing best practices for monitoring, alerting, and incident response using Datadog and other tools.
Designing, building, and maintaining cost-effective, reliable, and scalable AWS infrastructure.
Collaborating with cross-functional teams to identify and address performance bottlenecks and reliability issues.
Conducting post-incident reviews to analyse root causes and implement preventive measures.
Automating routine tasks and processes to improve efficiency and reduce manual intervention.
Participating in an on-call rotation to respond to system outages and emergencies.

