Senior Site Reliability Engineer

Posted about 4 hours ago

150,000 - 200,000 USD per year

United States, Full-Time, Software Development

Company: Barti

Location: United States

Languages: English

Seniority level: Senior, 5+ years

Experience: 5+ years

Skills:

AWS, Docker, Leadership, PostgreSQL, Python, SQL, Bash, Cloud Computing, GCP, Kubernetes, Grafana, Prometheus, CI/CD, Problem Solving, Linux, DevOps, Terraform, Microservices, Networking, Software Engineering

Requirements:

- 5+ years of relevant work experience in Site Reliability Engineering, DevOps, or Infrastructure roles
- 1+ years of hands-on experience with Python, Go, or Bash scripting
- Experience with cloud platforms (ideally GCP) and container orchestration (Kubernetes, Docker)
- Proficiency with Infrastructure as Code tools (Terraform, CloudFormation, or similar)
- Strong understanding of Linux systems, networking, and distributed systems
- Experience with monitoring and observability tools (Prometheus, Grafana, Datadog, or similar)
- Excellent problem-solving and communication skills
- Able to work independently and as part of a team
- Background in healthcare technology or regulated industries (preferred)
- Experience with GCP, Cloud SQL, and Google Kubernetes Engine (GKE) (preferred)
- Experience with HIPAA compliance and security best practices (preferred)
- Experience with performance tuning and high availability for relational databases (Postgres, MySQL) (preferred)
- Proficiency with CI/CD tools (GitHub Actions, CircleCI, GitLab CI) (preferred)
- Familiarity with APM tools and distributed tracing (preferred)

Responsibilities:

- Lead and participate in the design, implementation, and maintenance of highly available and scalable infrastructure.
- Monitor system health and performance metrics, and carry out capacity planning.
- Establish and track SLIs, SLOs, and error budgets.
- Design and implement Infrastructure as Code (IaC) solutions.
- Build and maintain CI/CD pipelines.
- Automate operational tasks.
- Lead incident response efforts.
- Debug and resolve complex production issues.
- Implement monitoring, alerting, and observability solutions.
- Provide technical leadership and mentorship.
- Collaborate with cross-functional teams.
- Lead the technical design of infrastructure solutions.
- Stay up to date with emerging technologies.
- Propose and drive adoption of best practices.
- Conduct chaos engineering experiments and disaster recovery drills.
- Implement and maintain security best practices.
- Manage secrets, access controls, and security monitoring systems.
- Foster a collaborative environment.
- Clearly communicate technical concepts.
- Work closely with engineering teams to define reliability requirements.