Senior Site Reliability Engineer - AWS

US, Full-Time, Senior
Salary: 175,000 - 190,000 USD per year

Job Details

Experience
8+ years of experience in software engineering, infrastructure, or operations, including at least 4 years in Site Reliability Engineering roles.
Required Skills
AWS, Python, Bash, CI/CD

Requirements

  • 8+ years of experience in software engineering, infrastructure, or operations, including at least 4 years in Site Reliability Engineering roles.
  • Strong hands-on expertise with AWS services such as EC2, EKS, Lambda, S3, IAM, and CloudWatch.
  • Proficiency in scripting and programming languages such as Python, Bash, or PowerShell.
  • Proven experience building and maintaining highly automated, large-scale production systems.
  • Strong knowledge of CI/CD pipelines, monitoring/alerting systems, incident response, and capacity planning.
  • Experience improving system reliability through automation and reducing operational toil in production environments.
  • Strong understanding of security best practices in cloud infrastructure.
  • Ability to work independently in fast-paced environments while driving continuous improvement initiatives.
  • Strong communication skills with the ability to collaborate across technical and non-technical stakeholders.
  • Bachelor’s degree in Computer Science or related field, or equivalent hands-on experience and certifications.

Responsibilities

  • Design, build, and maintain highly automated and autonomous systems for deployment, testing, monitoring, and operation of production environments.
  • Lead reliability engineering efforts across the SDLC, ensuring system stability, performance, and scalability standards are consistently met.
  • Develop and enhance CI/CD pipelines, automation scripts, and operational tooling to reduce manual effort and improve delivery speed.
  • Implement robust monitoring, alerting, and observability systems to ensure real-time visibility into infrastructure and application health.
  • Identify and resolve issues related to system availability, performance bottlenecks, and security vulnerabilities.
  • Collaborate with engineering teams to improve architecture, reliability practices, and incident response processes.
  • Participate in on-call rotations and provide rapid response support for production incidents.
  • Document system architecture, operational procedures, and best practices while mentoring junior engineers.