Senior System Reliability Engineer

US, Full-Time, Senior
Salary: 130,000 - 150,000 USD per year

Job Details

Experience
5-7 years of related experience
Required Skills
AWS, Python, SQL, Git, Java, Kafka, Kubernetes, TypeScript, Azure, Grafana, Groovy, Prometheus, Linux, Terraform, Datadog, CloudFormation

Requirements

  • 5-7 years of related experience
  • Bachelor's degree in a related field
  • Linux systems and networking fundamentals (DNS, TCP/IP, TLS)
  • Distributed systems debugging and failure analysis
  • Load, stress, and fault-injection testing
  • CI/CD tools and processes
  • Version control (e.g., Git)
  • Cloud platforms (e.g., AWS, Azure)
  • Containers and orchestration (Kubernetes)
  • Kafka (messaging/streaming)
  • Scripting and programming languages (e.g., Java, TypeScript, Groovy, Python)
  • Agile methodologies (e.g., Scrum, XP, SAFe)
  • Databases/SQL
  • Observability/monitoring tools (Datadog)

Responsibilities

  • Architect, implement, and maintain automated solutions for deployment, monitoring, alerting, and incident response using Lirio’s technology stack (AWS, Azure, Kubernetes, Kafka, Java, TypeScript, Groovy, Databases/SQL).
  • Develop and manage infrastructure as code (e.g., Terraform, AWS CloudFormation).
  • Build and optimize CI/CD pipelines for seamless, reliable delivery.
  • Define, implement, and continuously refine service-level indicators (SLIs), service-level objectives (SLOs), and error budgets for critical services.
  • Monitor system health using modern observability tools (e.g., Prometheus, Grafana, Datadog).
  • Participate in a defined on-call rotation supporting production systems, with clear escalation paths and expectations.
  • Lead incident response, root cause analysis, and postmortems for production issues.
  • Mentor and coach engineers on reliability engineering principles, operational ownership, and incident response best practices.
  • Review infrastructure changes, automation scripts, and reliability-impacting code changes to ensure production readiness.
  • Collaborate with software engineers to embed reliability, security, and operational best practices into development workflows.