Site Reliability Engineer

Posted 14 days ago

💎 Seniority level: Middle, 4+ years

📍 Location: United States, Canada

🔍 Industry: Blockchain, Crypto Infrastructure

🏢 Company: Anagram 👥 11-50 💰 $9,100,000 Series A almost 5 years ago · Internet, Eyewear, B2B, InsurTech, Enterprise Software, Insurance, Health Care, Software

🗣️ Languages: English

⏳ Experience: 4+ years

🪄 Skills: AWS, Python, Bash, Blockchain, GCP, Kubernetes, Azure, Go, Rust, Terraform

Requirements:
  • 4+ years experience in DevOps, SRE, or Backend Engineering
  • Strong coding skills in Python, Go, Rust, or Bash
  • Experience with AWS, GCP, or Azure
  • Expertise in blockchain node infrastructure is a plus
  • Familiarity with Web3 security considerations
Responsibilities:
  • Design and implement scalable infrastructure using Terraform and Kubernetes
  • Develop monitoring, logging, and alerting solutions
  • Build and optimize CI/CD pipelines
  • Identify and resolve performance bottlenecks
  • Implement and enforce security policies

Related Jobs

📍 United States

🧭 Full-Time

🔍 Software Development

🏢 Company: Fetch

  • 1+ year(s) of experience in a software development-oriented role (e.g. Software Engineer, DevOps Engineer, Site Reliability Engineer)
  • Experience with one or more high-level programming languages (e.g. Java, Python, Go, C/C++)
  • Experience with cloud infrastructure (AWS strongly preferred)
  • Experience with containerization technologies (Docker, Kubernetes preferred)
  • Experience building CI/CD pipelines
  • Experience with Unix/Linux operating system internals and networking
  • Experience with analyzing and troubleshooting systems
  • Experience monitoring and supporting microservice architectures
  • Bachelor's or higher degree in Computer Science, related technical field, or equivalent practical experience
  • Engage in and improve the whole lifecycle of services - from inception and design, through deployment, operation, and refinement
  • Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning, and readiness reviews
  • Maintain services once they are live by measuring and monitoring availability, latency, and overall system health
  • Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity
  • Practice sustainable incident response and blameless postmortems by participating in the on-call rotation
  • Build and support AWS multi-account and multi-region infrastructure using a mix of managed services (e.g. S3, Lambda, RDS, etc.) and containerized infrastructure (e.g. EKS, ECS)
  • Grow the SRE team by mentoring engineers and participating in the hiring process

AWS, Docker, Python, Software Development, SQL, Amazon RDS, AWS EKS, Bash, Cloud Computing, ElasticSearch, Git, Java, Kubernetes, API testing, Go, Java Spring, CI/CD, RESTful APIs, Linux, Terraform, Microservices, Troubleshooting, Ansible, Scripting, Debugging

Posted 1 day ago

📍 Canada

🧭 Full-Time

🔍 Site Reliability Engineering

🏢 Company: Jobgether 👥 11-50 💰 $1,493,585 Seed almost 2 years ago · Internet

  • 4+ years of experience in Site Reliability Engineering or similar role
  • Expertise in Infrastructure as Code with Terraform and Terragrunt
  • Deep knowledge of AWS cloud services
  • Experience with Confluent Cloud and Kafka for data streaming
  • Strong experience with Redis and RDS
  • Design, build, and maintain scalable cloud infrastructure using Terraform and Terragrunt
  • Manage AWS cloud environments for security and high availability
  • Oversee data streaming platforms with Confluent Cloud and Kafka
  • Maintain monitoring and alerting solutions using Prometheus and Grafana
  • Manage Kubernetes clusters with Helm, ArgoCD, and Istio

AWS, ElasticSearch, Kafka, Kubernetes, Grafana, Prometheus, Redis, CI/CD, Terraform

Posted 2 days ago

📍 United States, Europe

🧭 Full-Time

🔍 Biotechnology

🏢 Company: Invert 👥 11-50 💰 $20,149,993 Seed 8 months ago · Data Management, SaaS, Application Performance Management

  • Experience in cloud infrastructure
  • Strong incident management skills
  • Technical skills in software reliability
  • Design, build, and maintain scalable cloud infrastructure
  • Develop and enforce SLIs and SLOs
  • Create CI/CD pipelines
  • Lead Incident Management process

AWS, Docker, CI/CD, Linux, Terraform

Posted 3 days ago

📍 United States, Canada

🧭 Full-Time

🔍 Software Development

🏢 Company: Neon Inc.

  • 2+ years in an Engineering Management role
  • 5+ years of hands-on coding experience
  • Cloud experience with Azure and/or AWS
  • Strong knowledge of Kubernetes
  • Monitoring experience with Prometheus ecosystem
  • Excellent English communication skills
  • Manage a high-performing distributed team
  • Identify and eliminate obstacles
  • Coach and mentor engineers
  • Optimize processes and tech debt
  • Foster collaboration
  • Align projects with business goals
  • Maintain a scalable on-call process
  • Recruit and hire Software Engineers

AWS, Kubernetes, Azure, Go, Grafana, Postgres, Prometheus, Linux

Posted 4 days ago

📍 Canada

💸 147,500 - 173,500 CAD per year

🔍 Software Development

🏢 Company: Life360 👥 251-500 💰 $33,038,258 Post-IPO Equity over 2 years ago 🫂 Last layoff about 2 years ago · Android, Family, Apps, Mobile Apps, Mobile

  • 3+ years of experience programming in Java, Python, or other formal programming language
  • Expert level experience (3+ years) managing medium to large-scale deployments on AWS (~5000 instances, 50+ accounts)
  • Strong Kubernetes experience (2+ years) deploying and managing at scale (100s of Deployments, 10k+ containers, 20k+ cores).
  • Strong Linux administration experience, shell/bash scripting.
  • Expert level experience with Infrastructure as code tools: Terraform, CloudFormation; config management/provisioning tools: Ansible, Chef, etc.
  • Be opinionated on technical direction and strategy, and document those opinions so others can follow them
  • Lead and mentor other engineers on the team
  • Help implement fixes for, or diagnose, the thorniest problems seen
  • Participate in shared on-call rotation (roughly one week every six weeks on call)
  • Estimate schedules, breaking tasks down to reasonable 1-3 day tasks.
  • Optimize for Cost Efficiency

AWS, Docker, Python, SQL, Bash, Cloud Computing, Java, Jenkins, Kafka, Kubernetes, REST API, CI/CD, Linux, Terraform, Microservices, Networking, Troubleshooting, Ansible, Scripting

Posted 4 days ago

📍 United States, Canada

🧭 Full-Time

🔍 Software Development

🏢 Company: trivago 👥 1001-5000 💰 $52,541,981 Private about 14 years ago 🫂 Last layoff almost 5 years ago · Internet, Hospitality, Marketing, Information Technology, Hotel, Travel

  • Over 5 years of expertise in SecDevOps or Cyber Security
  • Bachelor's degree in Computer Science or related field
  • Strong understanding of security frameworks and regulations
  • Proficiency in programming languages like Java, Kotlin, or Python
  • Good understanding of web application security principles
  • Develop and deploy hybrid cloud and on-premises solutions
  • Collaborate across security domains
  • Inspire engineers in secure design and operation
  • Raise security awareness company-wide
  • Identify cloud security needs and shape strategy

Docker, PostgreSQL, Python, Cybersecurity, GCP, Java, Kafka, Kotlin, Kubernetes, MySQL, Terraform, Ansible

Posted 5 days ago

📍 United States

🧭 Full-Time

🔍 AI Infrastructure

🏢 Company: Voltage Park

  • 8+ years working with Linux
  • 5+ years experience with AWS
  • 2+ years experience with Kubernetes
  • Experience with Terraform and Ansible
  • Experience with network attached storage management
  • Design and build new platforms
  • Deploy updates to support internal and customer use cases
  • Collaborate with network engineering, software development, and customer support
  • Participate in the SRE on-call rotation

AWS, Python, Bash, Kubernetes, Go, Prometheus, Linux, Terraform, Networking, Ansible

Posted 5 days ago

📍 United States, Canada

🧭 Full-Time

💸 100,000 - 120,000 USD per year

🔍 Software Development

🏢 Company: Assured · Cloud Data Services, B2B, Cloud Security, Cyber Security

  • Experience in a start-up environment
  • Design and maintain highly available database solutions, ideally PostgreSQL
  • Experience with compliance and security regulations (SOC 2, HIPAA, ISO 27001)
  • Strong engineering background
  • Knowledge of Node.js, Python, Docker, PostgreSQL, GraphQL (not required)
  • Provision infrastructure and tooling
  • Create automated tooling to maintain the platform
  • Build methods for monitoring and scaling services
  • Implement security compliance strategies
  • Lead and mentor engineering team

AWS, Docker, Node.js, PostgreSQL, Python, Terraform, Compliance

Posted 7 days ago

📍 USA

🧭 Full-Time

🔍 Software Development

🏢 Company: Dandy 👥 501-1000 · Food and Beverage, Food Processing

  • 5+ years of software engineering experience, preferably in a high growth startup environment
  • Expertise in Google Cloud Platform and Google Kubernetes Engine
  • Experience with infrastructure as code platforms (Terraform, Pulumi)
  • Experience creating and maintaining fully automated CI/CD build processes for multiple environments
  • Experience designing the architecture and automation of infrastructure within a cloud environment
  • Develop and maintain infrastructure, systems, and tooling to support Dandy's products in a secure, well-tested, and performant way.
  • Reinvent an analog experience and disrupt a legacy industry through novel and scalable system design.
  • Collaborate with Product Engineers and other stakeholders within Engineering, Product and Data to maintain a high bar for quality in a fast-paced, iterative environment.
  • Advocate for improvements to infrastructure quality, security, and performance.
  • Craft code that meets our internal standards for style, maintainability, and best practices.
  • Recognize impediments to our efficiency as a team ("technical debt"), then propose and implement solutions.

GraphQL, Node.js, PostgreSQL, Cloud Computing, GCP, Kubernetes, TypeScript, Nest.js, CI/CD, DevOps, Terraform, Software Engineering

Posted 8 days ago

📍 Canada

🔍 Software Development

🏢 Company: Jobgether 👥 11-50 💰 $1,493,585 Seed almost 2 years ago · Internet

  • 4+ years of experience in Site Reliability Engineering or a similar role with a strong focus on cloud infrastructure.
  • Expertise in Infrastructure as Code (IaC) using Terraform and Terragrunt.
  • Deep knowledge of AWS cloud services and best practices for scalable and secure architectures.
  • Hands-on experience with Confluent Cloud and Kafka for distributed data streaming.
  • Strong experience with Redis for caching and RDS for data storage.
  • Proficiency with OpenSearch/ElasticSearch/ChaosSearch for search and analytics.
  • Advanced knowledge of monitoring tools like Prometheus, Grafana, Alert Manager, and OpsGenie.
  • Experience with LaunchDarkly for feature flag management.
  • Extensive experience managing Kubernetes clusters, including Helm for package management, ArgoCD for deployments, and Istio for service mesh configurations.
  • Familiarity with Kustomize for Kubernetes resource configuration.
  • Strong problem-solving skills and ability to troubleshoot complex systems in production environments.
  • Excellent communication and collaboration skills within agile teams.
  • Design, build, and maintain highly scalable cloud infrastructure using Terraform and Terragrunt for automated resource provisioning.
  • Manage and optimize AWS cloud environments, ensuring security, cost efficiency, and high availability.
  • Oversee data streaming platforms using Confluent Cloud and Kafka, ensuring reliable data pipelines.
  • Deploy and manage Redis instances for caching and real-time data processing.
  • Implement and maintain monitoring and alerting solutions using Prometheus, Grafana, Alert Manager, and OpsGenie.
  • Enable feature flag management and controlled rollouts with LaunchDarkly.
  • Manage Kubernetes clusters, utilizing Helm, ArgoCD, Istio, and Kustomize for continuous deployment and infrastructure-as-code practices.
  • Collaborate with development teams to integrate new services into the infrastructure seamlessly.
  • Troubleshoot complex system issues to maintain high availability and performance.
  • Continuously improve automation tools, processes, and methodologies to enhance system scalability.

AWS, Amazon RDS, Kafka, Kubernetes, Grafana, Prometheus, Redis, CI/CD, Problem Solving, Terraform

Posted 9 days ago