Site Reliability Engineer

Posted 24 days ago

💎 Seniority level: Senior, 5+ years

📍 Location: United States, Canada

💸 Salary: 21,000 - 26,000 BRL per month

🔍 Industry: Healthcare Technology

🏢 Company: Trusted Health

🗣️ Languages: English

⏳ Experience: 5+ years

🪄 Skills: Kubernetes, Ruby, Grafana, Prometheus, CI/CD, Terraform

Requirements:

5+ years experience in Site Reliability Engineering or related field
Extensive experience with Kubernetes (EKS)
Strong proficiency in Terraform
Hands-on experience with observability tools (Prometheus/Grafana)
Familiarity with Ruby scripting
Experience building CI/CD pipelines using GitOps
Understanding of infrastructure security best practices
Strong debugging skills in production environments

Responsibilities:

Maintain and enhance the hosting platform
Optimize CI/CD pipelines
Ensure observability across critical infrastructure
Collaborate with engineers to improve system reliability

Related Jobs

🔥 Site Reliability Engineer - Assistant Vice President

Posted about 13 hours ago

📍 United States, Canada

🧭 Full-Time

💸 100,000 - 120,000 USD per year

🔍 Financial Technology

🏢 Company: iCapital 👥 51-100 · Business Intelligence

🔧 Requirements

5+ years of SRE or related experience with 3+ years in AWS
Strong experience with Kubernetes
Working knowledge of MongoDB, Postgres, DynamoDB
Experience defining and implementing SLOs/SLIs
Skills in IaC (Terraform preferred) and programming languages (Python, Ruby, Java)
Experience with modern observability practices (Prometheus, Grafana, etc.)
Strong incident response skills
Excellent problem-solving abilities

💡 Responsibilities

Design, implement, and maintain service level objectives (SLOs)
Develop observability strategies
Architect scalable infrastructure solutions
Drive automation initiatives
Champion reliability best practices
Design and operate Kubernetes environment
Lead incident response and postmortems
Participate in on-call rotations

AWS, PostgreSQL, Python, DynamoDB, Kubernetes, MongoDB, Grafana, Prometheus, Terraform

🔥 Site Reliability Engineer (SRE)

Posted about 17 hours ago

📍 California, Florida, Georgia, Idaho, Illinois, Massachusetts, Colorado, New Jersey, New York, Oregon, Pennsylvania, Texas, Vermont, Virginia, Washington

🧭 Full-Time

💸 142,500 - 180,000 USD per year

🔍 Software Development

🔧 Requirements

5+ years of experience as a DevOps or SRE.
Proficiency with AWS, Docker, and Kubernetes.
Experience with infrastructure as code (Terraform).

💡 Responsibilities

Architect and maintain services for continuous operation and compliance with SLAs.
Establish and manage a platform for product teams to deploy and monitor services.
Lead initiatives to improve operational processes and systems.

AWS, Docker, PostgreSQL, Python, SQL, Kubernetes, RabbitMQ, Grafana, Redis, CI/CD, Linux, Terraform, Microservices

🔥 Site Reliability Engineer (SRE)

Posted about 20 hours ago

📍 California, Florida, Georgia, Idaho, Illinois, Massachusetts, Colorado, New Jersey, New York, Oregon, Pennsylvania, Texas, Vermont, Virginia, Washington

🧭 Full-Time

💸 142,500 - 180,000 USD per year

🔍 Software Development

🏢 Company: Veriff 👥 501-1000 💰 $100,000,000 Series C about 3 years ago 🫂 Last layoff over 1 year ago · Artificial Intelligence (AI) Fraud Detection Information Technology Cyber Security Identity Management

🔧 Requirements

5+ years of experience as a DevOps or SRE
Strong knowledge of AWS, Docker, and Kubernetes
Proficient in infrastructure as code (Terraform)
Understanding of SRE principles for reliability and scalability
Experience with Linux, SQL/NoSQL databases, and microservices

💡 Responsibilities

Architect and maintain services for continuous operation
Establish and manage a platform for service deployment
Lead initiatives for operational excellence and process improvement
Ensure transparent communication and conduct postmortems
Develop and enhance CI/CD pipelines
Implement SRE best practices for monitoring and security

AWS, Docker, PostgreSQL, Python, RabbitMQ, Grafana, Redis, CI/CD, Linux, Terraform, Microservices

🔥 Senior Site Reliability Engineer

Posted 4 days ago

📍 United States

🧭 Full-Time

🔍 Software Development

🏢 Company: Fetch

🔧 Requirements

1+ year(s) of experience in a software development-oriented role (e.g. Software Engineer, DevOps Engineer, Site Reliability Engineer)
Experience with one or more high-level programming languages (e.g. Java, Python, Go, C/C++)
Experience with cloud infrastructure (AWS strongly preferred)
Experience with containerization technologies (Docker, Kubernetes preferred)
Experience building CI/CD pipelines
Experience with Unix/Linux operating system internals and networking
Experience with analyzing and troubleshooting systems
Experience monitoring and supporting microservice architectures
Bachelor's or higher degree in Computer Science, related technical field, or equivalent practical experience

💡 Responsibilities

Engage in and improve the whole lifecycle of services - from inception and design, through deployment, operation, and refinement
Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning, and readiness reviews
Maintain services once they are live by measuring and monitoring availability, latency, and overall system health
Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity
Practice sustainable incident response and blameless postmortems by participating in the on-call rotation
Build and support AWS multi-account and multi-region infrastructure using a mix of managed services (e.g. S3, Lambda, RDS, etc.) and containerized infrastructure (e.g. EKS, ECS)
Grow the SRE team by mentoring engineers and participating in the hiring process

AWS, Docker, Python, Software Development, SQL, Amazon RDS, AWS EKS, Bash, Cloud Computing, ElasticSearch, Git, Java, Kubernetes, API testing, Go, Java Spring, CI/CD, RESTful APIs, Linux, Terraform, Microservices, Troubleshooting, Ansible, Scripting, Debugging

🔥 Site Reliability Engineer - (Remote - Canada)

Posted 4 days ago

📍 Canada

🧭 Full-Time

🔍 Site Reliability Engineering

🏢 Company: Jobgether 👥 11-50 💰 $1,493,585 Seed almost 2 years ago · Internet

🔧 Requirements

4+ years of experience in Site Reliability Engineering or similar role
Expertise in Infrastructure as Code with Terraform and Terragrunt
Deep knowledge of AWS cloud services
Experience with Confluent Cloud and Kafka for data streaming
Strong experience with Redis and RDS

💡 Responsibilities

Design, build, and maintain scalable cloud infrastructure using Terraform and Terragrunt
Manage AWS cloud environments for security and high availability
Oversee data streaming platforms with Confluent Cloud and Kafka
Maintain monitoring and alerting solutions using Prometheus and Grafana
Manage Kubernetes clusters with Helm, ArgoCD, and Istio

AWS, ElasticSearch, Kafka, Kubernetes, Grafana, Prometheus, Redis, CI/CD, Terraform

🔥 Senior Site Reliability Engineer

Posted 5 days ago

📍 United States, Europe

🧭 Full-Time

🔍 Biotechnology

🏢 Company: Invert 👥 11-50 💰 $20,149,993 Seed 8 months ago · Data Management SaaS Application Performance Management

🔧 Requirements

Experience in cloud infrastructure
Strong incident management skills
Technical skills in software reliability

💡 Responsibilities

Design, build, and maintain scalable cloud infrastructure
Develop and enforce SLIs and SLOs
Create CI/CD pipelines
Lead the incident management process

AWS, Docker, CI/CD, Linux, Terraform

🔥 Lead Site Reliability Engineer

Posted 6 days ago

📍 United States, Canada

🧭 Full-Time

🔍 Software Development

🏢 Company: Neon Inc.

🔧 Requirements

2+ years in an Engineering Management role
5+ years of hands-on coding experience
Cloud experience with Azure and/or AWS
Strong knowledge of Kubernetes
Monitoring experience with Prometheus ecosystem
Excellent English communication skills

💡 Responsibilities

Manage a high-performing distributed team
Identify and eliminate obstacles
Coach and mentor engineers
Optimize processes and tech debt
Foster collaboration
Align projects with business goals
Maintain a scalable on-call process
Recruit and hire Software Engineers

AWS, Kubernetes, Azure, Go, Grafana, Postgres, Prometheus, Linux

🔥 Senior II Site Reliability Engineer, Infrastructure

Posted 6 days ago

📍 Canada

💸 147,500 - 173,500 CAD per year

🔍 Software Development

🏢 Company: Life360 👥 251-500 💰 $33,038,258 Post-IPO Equity over 2 years ago 🫂 Last layoff about 2 years ago · Android Family Apps Mobile Apps Mobile

🔧 Requirements

3+ years of experience programming in Java, Python, or another formal programming language
Expert level experience (3+ years) managing medium to large-scale deployments on AWS (~5000 instances, 50+ accounts)
Strong Kubernetes experience (2+ years) deploying and managing at scale (100s of Deployments, 10k+ containers, 20k+ cores)
Strong Linux administration and shell/Bash scripting experience
Expert level experience with Infrastructure as code tools: Terraform, CloudFormation; config management/provisioning tools: Ansible, Chef, etc.

💡 Responsibilities

Be opinionated on technical direction and strategy, and document those opinions so others can follow them
Lead and mentor other engineers on the team
Help implement solutions to, or diagnose, the thorniest problems the team encounters
Participate in a shared on-call rotation (roughly one week on call every six weeks)
Estimate schedules, breaking work down into reasonable 1-3 day tasks
Optimize for cost efficiency

AWS, Docker, Python, SQL, Bash, Cloud Computing, Java, Jenkins, Kafka, Kubernetes, REST API, CI/CD, Linux, Terraform, Microservices, Networking, Troubleshooting, Ansible, Scripting

🔥 Site Reliability Engineer

Posted 7 days ago

📍 United States

🧭 Full-Time

🔍 AI Infrastructure

🏢 Company: Voltage Park 👥 1-10 💰 $500,000,000 over 1 year ago · Cloud Computing Machine Learning

🔧 Requirements

8+ years working with Linux
5+ years experience with AWS
2+ years experience with Kubernetes
Experience with Terraform and Ansible
Experience with network attached storage management

💡 Responsibilities

Design and build new platforms
Deploy updates to support internal and customer use cases
Collaborate with network engineering, software development, and customer support
Participate in the SRE on-call rotation

AWS, Python, Bash, Kubernetes, Go, Prometheus, Linux, Terraform, Networking, Ansible

🔥 Senior Site Reliability Engineer

Posted 9 days ago

📍 United States, Canada

🧭 Full-Time

💸 100,000 - 120,000 USD per year

🔍 Software Development

🏢 Company: AssuredCloud Data Services B2B Cloud Security Cyber Security

🔧 Requirements