Site Reliability Engineer

Posted 19 days ago

💎 Seniority level: Senior, 5+ years

📍 Location: AMER, EMEA, APAC

🔍 Industry: Blockchain

🏒 Company: asymmetric.re

⏳ Experience: 5+ years

🪄 Skills: AWS, Docker, Python, Blockchain, Cloud Computing, Kubernetes, Rust, CI/CD, RESTful APIs, Linux, DevOps, Terraform, Microservices, Networking, Troubleshooting, Ansible, Scripting

Requirements:
  • Extensive experience managing Linux and network infrastructure.
  • Experience with load balancers and other high-availability technologies (e.g., HAProxy, ALB/ELB).
  • Prior experience with configuration management tooling (e.g., Ansible, Chef, Puppet, SaltStack).
  • Excellent troubleshooting fundamentals on both hardware and software.
  • Development experience in Golang, Python, or Rust.
  • Experience with continuous integration pipelines and automated deployments.
  • Experience with OSS monitoring tools (e.g., Grafana, Loki, Prometheus, Alertmanager).
Responsibilities:
  • Manage a globally distributed fleet of blockchain infrastructure services.
  • Deploy infrastructure as code to dev, staging, and production environments.
  • Work in a globally distributed, high-performing team to deliver mission-critical services to the financial sector.
  • Design, architect, deploy, and manage blockchain infrastructure services.
  • Adhere to the highest standards of integrity, trust, and professionalism.

Related Jobs

📍 United States, Canada, Mexico

🧭 Full-Time

🔍 Software Development

🏒 Company: Fleetio

  • 5+ years of Ruby/Rails experience
  • 3+ years of AWS experience
  • Kubernetes experience
  • Experience with profiling and benchmarking source code
  • Effective at code review and at identifying potential performance problems before they reach production
  • Experience with Datadog or other APM tools
  • Excellent written and verbal communication skills
  • Proactively identify, triage, and resolve performance issues
  • Enhance system observability by monitoring performance metrics across Ruby, Rails, and database systems, including SLOs and SLIs
  • Guide product engineers on Ruby/Rails performance and database best practices through code reviews and pair programming
  • Optimize performance through instance configuration and monitoring
  • Collaborate with other SREs to proactively identify and address performance bottlenecks
  • Lead database capacity planning and upgrade initiatives
  • Manage the database-specific components of disaster recovery planning and execution
  • Oversee backup systems and pre-production databases
  • Create and maintain infrastructure and operations documentation
  • Participate in the on-call rotation

AWS, PostgreSQL, SQL, Cloud Computing, Kubernetes, Ruby, Ruby on Rails, CI/CD, Terraform

Posted 3 days ago

📍 United States

🧭 Full-Time

💸 95,000 - 160,000 USD per year

🔍 Cybersecurity

🏒 Company: crowdstrikecareers

  • 5-7+ years of experience in Site Reliability Engineering (SRE), DevOps, or Cloud Infrastructure roles.
  • Experience managing Virtual Desktop Infrastructure (VDI) solutions such as Citrix, VMware Horizon, or AWS WorkSpaces.
  • Hands-on experience with AWS GovCloud (Azure/GCP is a plus).
  • Strong expertise in Infrastructure as Code (Terraform, CloudFormation).
  • Experience with monitoring, logging, and alerting tools (e.g., Prometheus, Grafana, ELK, Datadog, Splunk).
  • Expertise in IAM and PAM solutions such as Okta, CyberArk, or AWS IAM.
  • Strong scripting and automation skills (Python, Bash, PowerShell).
  • Experience with CI/CD pipelines and DevOps workflows.
  • Familiarity with FedRAMP, NIST 800-53, DoD IL 4/5 compliance standards.
  • Hands-on experience with VDI management, performance tuning, and security hardening.
  • Architect, deploy, and maintain highly available, scalable, and secure systems in AWS GovCloud (Azure and GCP experience is a plus).
  • Automate infrastructure provisioning, scaling, and failover using Infrastructure as Code (IaC) tools like Terraform or CloudFormation.
  • Implement SLOs, SLIs, and error budgets to drive reliability improvements.
  • Optimize cloud infrastructure for performance, cost-efficiency, and resilience while adhering to compliance requirements.
  • Manage and optimize Virtual Desktop Infrastructure (VDI) solutions to ensure seamless user experience, performance, and security.
  • Deploy and manage monitoring, logging, and alerting tools (e.g., Prometheus, Grafana, Datadog, Splunk, ELK).
  • Implement automated self-healing mechanisms and proactive monitoring solutions.
  • Lead incident response, postmortems, and root cause analysis (RCA) to prevent future system disruptions.
  • Ensure 24/7 system uptime through on-call rotation and escalation handling.
  • Implement Identity and Access Management (IAM) best practices, including SSO, MFA, and RBAC across cloud environments.
  • Automate IAM governance and Privileged Access Management (PAM) to enforce the principle of least privilege.
  • Ensure audit readiness by maintaining accurate security configurations, logs, and compliance reports.
  • Work with security teams to align IAM and Zero Trust Architecture (ZTA) strategies with organizational policies.
  • Develop and maintain CI/CD pipelines for automated deployments and configuration management.
  • Use Python, Bash, or PowerShell to automate routine SRE workflows and security compliance checks.
  • Implement immutable infrastructure and support DevSecOps best practices.
  • Manage and optimize VDI environments, ensuring seamless DevOps integration for development and operational teams.
  • Contribute to chaos engineering and failure injection testing to enhance system resiliency.
  • Work closely with DevOps, IT Security, and Compliance teams to ensure system integrity and uptime.
  • Provide mentorship to junior engineers and contribute to knowledge-sharing initiatives.
  • Participate in architectural discussions and help drive improvements in cloud reliability and security posture.

AWS, Docker, Python, Bash, Cloud Computing, Cybersecurity, GCP, Kubernetes, Azure, Grafana, Prometheus, CI/CD, Linux, DevOps, Terraform, Compliance, Ansible, Scripting

Posted 4 days ago

📍 United Kingdom

🔍 Blockchain

🏒 Company: IO Global

  • Proficiency in Python, Bash, Terraform, and Nix for DevOps services.
  • Extensive experience with AWS, specifically with services like EKS and RDS.
  • Familiarity with container orchestration (e.g., Kubernetes) is essential.
  • Hands-on experience with PostgreSQL and its deployment on RDS.
  • Knowledge of monitoring tools (e.g., Prometheus, Grafana, Loki).
  • Solid troubleshooting and performance tuning capabilities.
  • Exceptional communication skills and team collaboration ethic.
  • Experience with CI/CD (e.g., GitHub Actions, Hydra, Earthly).
  • Design, write, and deliver tools and software primarily using Python, Bash, Terraform or Nix to improve the availability, scalability, and efficiency of our services.
  • Engage in and refine the whole lifecycle of services, from inception and design, through deployment, operation, and continuous improvement.
  • Practice sustainable incident response and promote blameless postmortems.
  • Collaborate with the development teams to ensure that solutions are designed with customer experience, scalability, and performance in mind.
  • Analyze system performance and reliability, offering recommendations for enhancement.
  • Develop and uphold service-level objectives (SLOs), service-level indicators (SLIs), and error budgets for our services.
  • Participate in on-call rotations, responding to and mitigating service interruptions and technical challenges.

AWS, PostgreSQL, Python, Amazon RDS, AWS EKS, Bash, Kubernetes, Grafana, Prometheus, CI/CD, DevOps, Terraform

Posted 5 days ago

📍 France, Germany, Spain, United Kingdom, United States, Canada

🧭 Full-Time

🔍 Software Development

🏒 Company: Platform.sh

👥 251-500

💰 $140,000,000 Series D almost 3 years ago

Internet, Open Source, PaaS, Cloud Management, Software

  • DevOps, Cloud Operations, or SRE Expertise: A solid understanding of DevOps, Cloud Operations, or SRE principles, with a focus on reliability and scalability.
  • Advanced Linux Internals Expertise: Hands-on experience with Linux systems, including performance tuning, kernel configurations, and troubleshooting.
  • Programming Languages: Proficiency in programming languages such as Go (preferred) or Python, with a focus on building tools and automating processes.
  • Scripting Skills: Strong skills in scripting languages like Python, Bash, or Go to automate workflows, streamline tasks, and manage infrastructure.
  • Cloud Infrastructure Knowledge: Extensive experience with cloud platforms like AWS, GCP, and Azure, along with expertise in monitoring/logging frameworks and CI/CD pipelines.
  • Containerization and Orchestration: Hands-on experience with Docker, Kubernetes, and other containerization technologies for building and deploying scalable applications is a nice-to-have.
  • Problem-Solving and Collaboration: Strong problem-solving skills, system design experience, and the ability to collaborate effectively across teams.
  • Refine Monitoring and Observability: Enhance system monitoring with tools like Prometheus, Grafana, and ELK Stack, ensuring visibility and alignment with business objectives.
  • Automate Deployments and Workflows: Transition manual processes to automated solutions using IaC tools (e.g., Terraform, Ansible) to streamline deployments and improve operational efficiency.
  • Optimize CI/CD Pipelines: Improve pipeline architecture for fast, reliable releases, ensuring scalability and resilience to handle high volumes of changes.
  • Cloud Infrastructure Management: Help scale cloud-based systems on platforms like AWS, GCP, and Azure while minimizing technical debt and operational complexity.
  • Incident Response and Post-Mortem: Support incident management and lead post-mortem analysis, ensuring continuous improvement and knowledge sharing.
  • Collaborate with Cross-Functional Teams: Work closely with engineering and product teams to integrate reliability practices into the development lifecycle and prioritize reliability efforts.
  • Drive Technical Innovation: Introduce and champion new tools, technologies, and practices that improve system reliability, performance, and scalability.

AWS, Docker, Python, Bash, Cloud Computing, GCP, Kubernetes, Azure, Go, Grafana, Prometheus, CI/CD, Problem Solving, RESTful APIs, Linux, DevOps, Terraform, Ansible, Scripting

Posted 5 days ago

📍 United States

💸 140,000 - 160,000 USD per year

  • Expertise with multi-region deployments in public cloud environments
  • Demonstrable production Kubernetes experience (Managed Kubernetes, Helm, kubectl, kOps, etc.)
  • Strong background in Reliability Engineering, DevOps, Software Engineering
  • Fluency with at least one programming language, such as C#, Python, or Go
  • Experience with cloud deployment and automation tools/methodologies (e.g., GitOps, Terraform, Pulumi)
  • Proficiency using source control such as Git
  • Ability to maintain discretion, handle sensitive information, and improve security best practices
  • Take ownership of the Bitwarden cloud infrastructure, with an emphasis on quality that translates directly to user delight
  • Evaluate current infrastructure and, on a regular basis, make recommendations for reliability, security, availability, scalability and cost management
  • Implement site reliability tools, monitoring, early warning and alert systems, and observability across Bitwarden cloud environments
  • Respond to infrastructure-based outages; participate in and contribute to the ongoing strategy for 24x7 support (there is an on-call rotation with a weekend shift every 5-6 weeks)
  • Architectural designs and engineering operations at scale
  • Active participation in code reviews, learning and spreading technical knowledge
  • Contribute to and mature incident management/escalation processes
  • Collaborate with cross-functional teams to refine priorities and deliverables
  • Ongoing engagement with product owners to align SLIs/SLOs/SLAs
  • Evaluate and identify opportunities for new initiatives to support organizational needs
  • Evolve and influence Bitwarden's SDLC as we scale
  • Provide mentorship to teammates

AWS, Docker, Python, Cloud Computing, Git, Kubernetes, CI/CD, RESTful APIs, Linux, DevOps, Terraform, Software Engineering, SaaS

Posted 5 days ago

📍 France, Germany, Spain, the United Kingdom, West Coast in the United States, Canada

🧭 Full-Time

🔍 Software Development

🏒 Company: Remote Woman

  • A solid understanding of DevOps, Cloud Operations, or SRE principles, with a focus on reliability and scalability.
  • Hands-on experience with Linux systems, including performance tuning, kernel configurations, and troubleshooting.
  • Proficiency in programming languages such as Go (preferred) or Python, with a focus on building tools and automating processes.
  • Strong skills in scripting languages like Python, Bash, or Go to automate workflows, streamline tasks, and manage infrastructure.
  • Extensive experience with cloud platforms like AWS, GCP, and Azure, along with expertise in monitoring/logging frameworks and CI/CD pipelines.
  • Hands-on experience with Docker, Kubernetes, and other containerization technologies for building and deploying scalable applications is a nice-to-have.
  • Strong problem-solving skills, system design experience, and the ability to collaborate effectively across teams.
  • Refine Monitoring and Observability
  • Automate Deployments and Workflows
  • Optimize CI/CD Pipelines
  • Cloud Infrastructure Management
  • Incident Response and Post-Mortem
  • Collaborate with Cross-Functional Teams
  • Drive Technical Innovation

AWS, Docker, Python, Bash, Cloud Computing, GCP, Kubernetes, Azure, Go, Grafana, Prometheus, Collaboration, CI/CD, Problem Solving, Linux, DevOps, Terraform, Ansible, Scripting

Posted 5 days ago

📍 USA

🧭 Full-Time

💸 186,065 - 218,900 USD per year

🔍 Software Development

🏒 Company: Coinbase Careers Page

👥 1000-5000

  • 5+ years of experience building, iterating upon, and maintaining corporate IAM systems
  • 5+ years of experience with operational procedures and application development
  • Deep domain knowledge of prominent cloud identity providers: Okta, Duo, Google Workspace, Azure AD, Ping, etc.
  • Demonstrated success developing and implementing tooling that solves problems related to identity lifecycle and provisioning, SSO, MFA, ABAC, RBAC, directory services, zero trust networking, PAM, PIM, and secrets management
  • Experience configuring and implementing modern open source tooling such as Terraform, Ansible, Kubernetes, and Docker
  • Fluency in a modern programming language (Golang, Python, Ruby, Java, C#, etc.)
  • Strong experience using and managing AWS, GCP, Azure, or another cloud environment with IaC
  • Strong understanding of CI/CD workflows, automation frameworks, and best practices
  • Clear communication: demonstrated ability to explain technical concepts simply
  • Self-starter with a continuous learning mindset
  • Demonstrate critical thinking under pressure
  • Engage in a dynamic role that combines traditional operations responsibilities and active contributions to the development and deployment of cloud-native applications, fostering a DevOps culture that emphasizes collaboration and automation
  • Partner across Coinbase to design, implement, and maintain performant, reliable, and secure system architectures
  • Provide corporate IAM and DevOps tooling subject matter expertise to adjacent IT, Security, and Engineering teams
  • Implement automation tooling and scripts to eliminate manual, repetitive tasks and reduce inefficiencies in system operations
  • Create comprehensive documentation and runbooks that detail system configurations, operational procedures, and troubleshooting steps across system lifecycle
  • Build and maintain CI/CD pipelines for integrating changes and deploying to production in progressively tested environments
  • Deliver configurations and maintain state using configuration management tools
  • Facilitate incident response, conduct root cause analysis, and run blameless retrospectives
  • Define metrics and bolster monitoring/observability across corporate IAM systems
  • Participate in regular on-call rotation to ensure 24x7 uptime for critical systems

AWS, Docker, Python, Cloud Computing, Kubernetes, LDAP, CI/CD, RESTful APIs, Linux, DevOps, Terraform, Ansible, Scripting

Posted 6 days ago

📍 France

🏒 Company: Sinch

👥 1001-5000

💰 $48,845,918 Post-IPO Debt 7 months ago

Messaging, SaaS, Telecommunications, Mobile, Software

  • Background in infrastructure, operations, or software engineering.
  • Experience with cloud providers such as GCP.
  • Proficiency in configuration management tools such as Terraform and Ansible.
  • Hands-on proficiency with modern monitoring tools like Prometheus and Grafana.
  • Experience with distributed data stores such as Cassandra, PostgreSQL, and ElasticSearch.
  • Experience with Python and Bash is beneficial.
  • Strong technical skills across various infrastructure technologies.
  • Strong communication skills.
  • Experience operating and maintaining production systems in a Linux and public cloud environment.
  • Partner with product engineering teams to identify system requirements.
  • Build and support our cloud-based infrastructure.
  • Automate routine processes and remediation tasks.
  • Develop, monitor and track Service Level Objectives (SLOs) for the systems under management.
  • Proactively troubleshoot, resolve, and plan for issues that typically come from support staff, other engineering teams, and our automated monitoring system.
  • Ensure our datastores are healthy and operate at optimal performance levels.
  • Contribute to the growth and culture of our engineering team.

Docker, PostgreSQL, Python, Bash, ElasticSearch, GCP, Kubernetes, Cassandra, Grafana, Prometheus, Linux, Terraform, Ansible

Posted 6 days ago

📍 Americas, EU, UK

🔍 Cryptocurrency

🏒 Company: Auros

👥 11-50

💰 $17,000,000 about 2 years ago

Cryptocurrency

  • An SRE/DevOps professional with experience managing and optimising Linux systems in a high-performance 24 x 7 environment.
  • Cloud management using IaC, with experience in AWS, Azure or Google Cloud.
  • A background in container management, deployment, and orchestration. Kubernetes experience is good to have; strong Docker skills are required.
  • Knowledge and experience in managing configuration at scale.
  • Experience with CI/CD pipelines and version control best practices.
  • Experience with application and infrastructure instrumentation using tools like Prometheus, OpenTelemetry and eBPF.
  • Strong knowledge of cloud security and IAM policies is required.
  • SIEM and threat management experience.
  • Must know how to secure Mac and Linux endpoints.
  • Python and Bash experience is a must.
  • Participate in the on-call roster to support our trading operations.
  • Maintain and improve our global infrastructure with high performance and reliability requirements.
  • Improve and update the security infrastructure of a widely distributed company that operates in a high-risk environment.
  • Engage and collaborate with other teams around system layout, rollout procedures and improving DevOps processes.
  • Development of internal tools and automation to accomplish the team's goals.
  • Application tuning and troubleshooting; you will keep abreast of changes to trading system features and deployment, providing guidance to developers looking to improve their application performance or reliability.
  • Active participation in various trading and infrastructure projects.
  • Work closely with developers, traders and other staff to accomplish our firm's goals.

AWS, Docker, Python, Bash, Cloud Computing, Cybersecurity, GCP, Kubernetes, Azure, Prometheus, CI/CD, Linux, DevOps, Terraform, Ansible

Posted 6 days ago

📍 United States

🧭 Full-Time

💸 175,000 - 220,000 USDC per year

🔍 Software Development

🏒 Company: Orca

👥 11-50

💰 $18,000,000 Series A over 3 years ago

Cryptocurrency, Blockchain, Online Portals, Information Technology

  • A strong track record of working on high-performance, scalable systems with expertise in release engineering, infrastructure, and operations.
  • Extensive experience with AWS services (e.g., ECS, Copilot, CloudWatch) and the ability to troubleshoot and optimize cloud-based systems.
  • Hands-on experience with tools like GitHub Actions for reliable and efficient deployment workflows.
  • Familiarity with tools like Datadog to build actionable monitoring and alerting systems.
  • Proficiency in infrastructure-as-code tools like Terraform, and containerization tools like Docker. Experience with orchestrators like Kubernetes or Airflow is a plus.
  • Comfortable working independently in an async environment while collaborating effectively with a team. You understand trade-offs and advocate for pragmatic solutions.
  • Familiarity with Decentralized Finance (DeFi) concepts, AMMs, and the Solana ecosystem is a plus but not required.
  • Design, manage, and optimize AWS infrastructure with a focus on scalability, reliability, and cost efficiency.
  • Triage and resolve critical infrastructure issues proactively.
  • Build and refine CI/CD processes using modern tools, ensuring seamless, secure, and efficient deployments.
  • Develop robust monitoring, logging, and alerting systems using tools like Datadog or Grafana to improve visibility and system performance.
  • Architect systems that handle growth effortlessly, minimize downtime, and maintain high performance.
  • Implement effective alerting mechanisms to prioritize and address critical issues proactively.
  • Optimize and document infrastructure processes, leveraging tools like Terraform, Docker, and Airflow to create scalable and maintainable systems.
  • Partner with engineering teams to design and refine infrastructure that powers features like real-time monitoring, automated transaction execution, and analytics.

AWS, Docker, PostgreSQL, Kubernetes, Airflow, Grafana, Rust, CI/CD, Linux, DevOps, Terraform

Posted 10 days ago