Senior Site Reliability Engineer - AWS


US, Full-Time, Senior

Salary: 175,000 - 190,000 USD per year

Job Details

Experience: 8+ years of experience in software engineering, infrastructure, or operations, including at least 4 years in Site Reliability Engineering roles.
Required Skills: AWS, Python, Bash, CI/CD

8+ years of experience in software engineering, infrastructure, or operations, including at least 4 years in Site Reliability Engineering roles.
Strong hands-on expertise with AWS services such as EC2, EKS, Lambda, S3, IAM, and CloudWatch.
Proficiency in scripting and programming languages such as Python, Bash, or PowerShell.
Proven experience building and maintaining highly automated, large-scale production systems.
Strong knowledge of CI/CD pipelines, monitoring/alerting systems, incident response, and capacity planning.
Experience improving system reliability through automation and reducing operational toil in production environments.
Strong understanding of security best practices in cloud infrastructure.
Ability to work independently in fast-paced environments while driving continuous improvement initiatives.
Strong communication skills with the ability to collaborate across technical and non-technical stakeholders.
Bachelor’s degree in Computer Science or related field, or equivalent hands-on experience and certifications.

Design, build, and maintain highly automated and autonomous systems for deployment, testing, monitoring, and operation of production environments.
Lead reliability engineering efforts across the SDLC, ensuring system stability, performance, and scalability standards are consistently met.
Develop and enhance CI/CD pipelines, automation scripts, and operational tooling to reduce manual effort and improve delivery speed.
Implement robust monitoring, alerting, and observability systems to ensure real-time visibility into infrastructure and application health.
Identify and resolve issues related to system availability, performance bottlenecks, and security vulnerabilities.
Collaborate with engineering teams to improve architecture, reliability practices, and incident response processes.
Participate in on-call rotations and provide rapid response support for production incidents.
Document system architecture, operational procedures, and best practices while mentoring junior engineers.
