Senior System Reliability Engineer

Lirio
Technology/Software, Healthcare

US, Full-Time, Senior

Salary: 130,000 - 150,000 USD per year

Job Details

Experience: 5-7 years related experience
Required Skills: AWS, Python, SQL, Git, Java, Kafka, Kubernetes, TypeScript, Azure, Groovy, Linux, Terraform, Datadog, CloudFormation

Architect, implement, and maintain automated solutions for deployment, monitoring, alerting and incident response using Lirio’s technology stack (AWS, Azure, Kubernetes, Kafka, Java, TypeScript, Groovy, Databases/SQL).
Develop and manage infrastructure as code (e.g., Terraform, AWS CloudFormation).
Build and optimize CI/CD pipelines for seamless, reliable delivery.
Define, implement, and continuously refine service-level indicators (SLIs), service-level objectives (SLOs), and error budgets for critical services.
Identify and reduce operational toil through automation, platform improvements, and architectural changes.
Ensure high availability and scalability of services through proactive engineering, load testing, and capacity planning across multi-tenant and client-specific environments.
Review infrastructure changes, automation scripts, and reliability-impacting code changes to ensure production readiness.
Collaborate with software engineers to embed reliability, security, and operational best practices into development workflows.
Participate in a defined on-call rotation supporting production systems, with clear escalation paths and expectations.
Lead incident response, root cause analysis, and postmortems for production issues.
Mentor and coach engineers on reliability engineering principles, operational ownership, and incident response best practices.
Stay current with industry trends in reliability engineering, cloud operations, and automation.
