Site Reliability Engineer

Posted 24 days ago

💎 Seniority level: Senior, 5+ years

📍 Location: United States, Canada

💸 Salary: 21,000 - 26,000 BRL per month

🔍 Industry: Healthcare Technology

🏢 Company: Trusted Health

🗣️ Languages: English

⏳ Experience: 5+ years

🪄 Skills: Kubernetes, Ruby, Grafana, Prometheus, CI/CD, Terraform

Requirements:
  • 5+ years experience in Site Reliability Engineering or related field
  • Extensive experience with Kubernetes (EKS)
  • Strong proficiency in Terraform
  • Hands-on experience with observability tools (Prometheus/Grafana)
  • Familiarity with Ruby scripting
  • Experience building CI/CD pipelines using GitOps
  • Understanding of infrastructure security best practices
  • Strong debugging skills in production environments
Responsibilities:
  • Maintain and enhance the hosting platform
  • Optimize CI/CD pipelines
  • Ensure observability across critical infrastructure
  • Collaborate with engineers to improve system reliability

Related Jobs

πŸ“ United States, Canada

🧭 Full-Time

πŸ’Έ 100000.0 - 120000.0 USD per year

πŸ” Financial Technology

🏒 Company: iCapitalπŸ‘₯ 51-100Business Intelligence

  • 5+ years of SRE or related experience with 3+ years in AWS
  • Strong experience with Kubernetes
  • Working knowledge of MongoDB, Postgres, DynamoDB
  • Experience defining and implementing SLOs/SLIs
  • Skills in IaC (Terraform preferred) and programming languages (Python, Ruby, Java)
  • Experience with modern observability practices (Prometheus, Grafana, etc.)
  • Strong incident response skills
  • Excellent problem-solving abilities
  • Design, implement, and maintain service level objectives (SLOs)
  • Develop observability strategies
  • Architect scalable infrastructure solutions
  • Drive automation initiatives
  • Champion reliability best practices
  • Design and operate Kubernetes environment
  • Lead incident response and postmortems
  • Participate in on-call rotations

AWS, PostgreSQL, Python, DynamoDB, Kubernetes, MongoDB, Grafana, Prometheus, Terraform

Posted about 13 hours ago

πŸ“ California, Florida, Georgia, Idaho, Illinois, Massachusetts, Colorado, New Jersey, New York, Oregon, Pennsylvania, Texas, Vermont, Virginia, Washington

🧭 Full-Time

πŸ’Έ 142500.0 - 180000.0 USD per year

πŸ” Software Development

  • 5+ years of experience as a DevOps or SRE.
  • Proficiency with AWS, Docker, and Kubernetes.
  • Experience with infrastructure as code (Terraform).
  • Architect and maintain services for continuous operation and compliance with SLAs.
  • Establish and manage a platform for product teams to deploy and monitor services.
  • Lead initiatives to improve operational processes and systems.

AWS, Docker, PostgreSQL, Python, SQL, Kubernetes, RabbitMQ, Grafana, Redis, CI/CD, Linux, Terraform, Microservices

Posted about 17 hours ago

πŸ“ California, Florida, Georgia, Idaho, Illinois, Massachusetts, Colorado, New Jersey, New York, Oregon, Pennsylvania, Texas, Vermont, Virginia, Washington

🧭 Full-Time

πŸ’Έ 142500.0 - 180000.0 USD per year

πŸ” Software Development

🏒 Company: VeriffπŸ‘₯ 501-1000πŸ’° $100,000,000 Series C about 3 years agoπŸ«‚ Last layoff over 1 year agoArtificial Intelligence (AI)Fraud DetectionInformation TechnologyCyber SecurityIdentity Management

  • 5+ years of experience as a DevOps or SRE
  • Strong knowledge of AWS, Docker, and Kubernetes
  • Proficient in infrastructure as code (Terraform)
  • Understanding of SRE principles for reliability and scalability
  • Experience with Linux, SQL/NoSQL databases, and microservices
  • Architect and maintain services for continuous operation
  • Establish and manage a platform for service deployment
  • Lead initiatives for operational excellence and process improvement
  • Ensure transparent communication and conduct postmortems
  • Develop and enhance CI/CD pipelines
  • Implement SRE best practices for monitoring and security

AWS, Docker, PostgreSQL, Python, RabbitMQ, Grafana, Redis, CI/CD, Linux, Terraform, Microservices

Posted about 20 hours ago

πŸ“ United States

🧭 Full-Time

πŸ” Software Development

🏒 Company: Fetch

  • 1+ year(s) of experience in a software development-oriented role (e.g. Software Engineer, DevOps Engineer, Site Reliability Engineer)
  • Experience with one or more high-level programming languages (e.g. Java, Python, Go, C/C++)
  • Experience with cloud infrastructure (AWS strongly preferred)
  • Experience with containerization technologies (Docker, Kubernetes preferred)
  • Experience building CI/CD pipelines
  • Experience with Unix/Linux operating system internals and networking
  • Experience with analyzing and troubleshooting systems
  • Experience monitoring and supporting microservice architectures
  • Bachelor's or higher degree in Computer Science, related technical field, or equivalent practical experience
  • Engage in and improve the whole lifecycle of services - from inception and design, through deployment, operation, and refinement
  • Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning, and readiness reviews
  • Maintain services once they are live by measuring and monitoring availability, latency, and overall system health
  • Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity
  • Practice sustainable incident response and blameless postmortems by participating in the on-call rotation
  • Build and support AWS multi-account and multi-region infrastructure using a mix of managed services (e.g. S3, Lambda, RDS, etc.) and containerized infrastructure (e.g. EKS, ECS)
  • Grow the SRE team by mentoring engineers and participating in the hiring process

AWS, Docker, Python, Software Development, SQL, Amazon RDS, AWS EKS, Bash, Cloud Computing, Elasticsearch, Git, Java, Kubernetes, API testing, Go, Java Spring, CI/CD, RESTful APIs, Linux, Terraform, Microservices, Troubleshooting, Ansible, Scripting, Debugging

Posted 4 days ago

πŸ“ Canada

🧭 Full-Time

πŸ” Site Reliability Engineering

🏒 Company: JobgetherπŸ‘₯ 11-50πŸ’° $1,493,585 Seed almost 2 years agoInternet

  • 4+ years of experience in Site Reliability Engineering or similar role
  • Expertise in Infrastructure as Code with Terraform and Terragrunt
  • Deep knowledge of AWS cloud services
  • Experience with Confluent Cloud and Kafka for data streaming
  • Strong experience with Redis and RDS
  • Design, build, and maintain scalable cloud infrastructure using Terraform and Terragrunt
  • Manage AWS cloud environments for security and high availability
  • Oversee data streaming platforms with Confluent Cloud and Kafka
  • Maintain monitoring and alerting solutions using Prometheus and Grafana
  • Manage Kubernetes clusters with Helm, ArgoCD, and Istio

AWS, Elasticsearch, Kafka, Kubernetes, Grafana, Prometheus, Redis, CI/CD, Terraform

Posted 4 days ago

πŸ“ United States, Europe

🧭 Full-Time

πŸ” Biotechnology

🏒 Company: InvertπŸ‘₯ 11-50πŸ’° $20,149,993 Seed 8 months agoData ManagementSaaSApplication Performance Management

  • Experience in cloud infrastructure
  • Strong incident management skills
  • Technical skills in software reliability
  • Design, build, and maintain scalable cloud infrastructure
  • Develop and enforce SLIs and SLOs
  • Create CI/CD pipelines
  • Lead Incident Management process

AWS, Docker, CI/CD, Linux, Terraform

Posted 5 days ago

πŸ“ United States, Canada

🧭 Full-Time

πŸ” Software Development

🏒 Company: Neon Inc.

  • 2+ years in an Engineering Management role
  • 5+ years of hands-on coding experience
  • Cloud experience with Azure and/or AWS
  • Strong knowledge of Kubernetes
  • Monitoring experience with Prometheus ecosystem
  • Excellent English communication skills
  • Manage a high-performing distributed team
  • Identify and eliminate obstacles
  • Coach and mentor engineers
  • Optimize processes and tech debt
  • Foster collaboration
  • Align projects with business goals
  • Maintain a scalable on-call process
  • Recruit and hire Software Engineers

AWS, Kubernetes, Azure, Go, Grafana, Postgres, Prometheus, Linux

Posted 6 days ago

πŸ“ Canada

πŸ’Έ 147500.0 - 173500.0 CAD per year

πŸ” Software Development

🏒 Company: Life360πŸ‘₯ 251-500πŸ’° $33,038,258 Post-IPO Equity over 2 years agoπŸ«‚ Last layoff about 2 years agoAndroidFamilyAppsMobile AppsMobile

  • 3+ years of experience programming in Java, Python, or other formal programming language
  • Expert level experience (3+ years) managing medium to large-scale deployments on AWS (~5000 instances, 50+ accounts)
  • Strong Kubernetes experience (2+ years) deploying and managing at scale (100s of Deployments, 10k+ containers, 20k+ cores).
  • Strong Linux administration experience, shell/bash scripting.
  • Expert level experience with Infrastructure as code tools: Terraform, CloudFormation; config management/provisioning tools: Ansible, Chef, etc.
  • Being opinionated on technical direction and strategy (and documenting those opinions for others to be able to follow)
  • Leading and mentoring other engineers on the team
  • Helping implement or diagnose the thorniest of the problems seen
  • Participate in shared on-call rotation (roughly one week every six weeks on call)
  • Estimate schedules, breaking tasks down to reasonable 1-3 day tasks.
  • Optimize for Cost Efficiency

AWS, Docker, Python, SQL, Bash, Cloud Computing, Java, Jenkins, Kafka, Kubernetes, REST API, CI/CD, Linux, Terraform, Microservices, Networking, Troubleshooting, Ansible, Scripting

Posted 6 days ago

πŸ“ United States

🧭 Full-Time

πŸ” AI Infrastructure

🏒 Company: Voltage ParkπŸ‘₯ 1-10πŸ’° $500,000,000 over 1 year agoCloud ComputingMachine Learning

  • 8+ years working with Linux
  • 5+ years experience with AWS
  • 2+ years experience with Kubernetes
  • Experience with Terraform and Ansible
  • Experience with network attached storage management
  • Design and build new platforms
  • Deploy updates to support internal and customer use cases
  • Collaborate with network engineering, software development, and customer support
  • Participate in the SRE on-call rotation

AWS, Python, Bash, Kubernetes, Go, Prometheus, Linux, Terraform, Networking, Ansible

Posted 7 days ago

πŸ“ United States, Canada

🧭 Full-Time

πŸ’Έ 100000.0 - 120000.0 USD per year

πŸ” Software Development

🏒 Company: AssuredCloud Data ServicesB2BCloud SecurityCyber Security

  • Experience in a start-up environment
  • Design and maintain highly available database solutions, ideally PostgreSQL
  • Experience with compliance and security regulations (SOC 2, HIPAA, ISO 27001)
  • Strong engineering background
  • Knowledge of Node.js, Python, Docker, PostgreSQL, GraphQL (not required)
  • Provision infrastructure and tooling
  • Create automated tooling to maintain the platform
  • Build methods for monitoring and scaling services
  • Implement security compliance strategies
  • Lead and mentor engineering team

AWS, Docker, Node.js, PostgreSQL, Python, Terraform, Compliance

Posted 9 days ago