Site Reliability Engineer

Posted 2 days ago

💎 Seniority level: Senior, 5+ years

📍 Location: France

💸 Salary: 50,000 - 75,000 EUR per year

🔍 Industry: Speech AI

🏢 Company: Gladia · 👥 11-50 · Digital Marketing, SEO, E-Commerce, Brand Marketing, Apps, Information Technology, Web Design

🗣️ Languages: French, English

⏳ Experience: 5+ years

🪄 Skills: Docker, PostgreSQL, Python, Git, Kubernetes, Grafana, Prometheus, CI/CD, Linux, Networking, Ansible

Requirements:
  • At least 5 years of experience working on a rapidly growing product, with a strong focus on scalability and well-tested solutions
  • Strong experience with PromQL, OpenTelemetry, and self-hosted stacks
  • Proficiency with Kubernetes and containerization
  • Experience with CI/CD processes (GitHub, test-driven development, etc.)
  • Knowledge of at least one programming language (Python, Go, etc.)
  • Knowledge of databases (PostgreSQL, Patroni)
  • Experience with UNIX/Linux operating systems
  • Networking knowledge (DNS, OSI model, HTTP/HTTPS, SSL/TLS)
Responsibilities:
  • Create and maintain hybrid Kubernetes clusters
  • Implement and manage the observability stack (CNCF landscape)
  • Prepare deployments for production
  • Optimize infrastructure and tool scaling to keep costs low
  • Support developers in implementing observability
  • Document technical procedures and policies
Related Jobs

📍 United States, European timezones

🧭 Full-Time

🔍 Software Development

🏢 Company: Invert · 👥 11-50 · 💰 $20,149,993 Seed 8 months ago · Data Management, SaaS, Application Performance Management

Requirements:
  • Experience in cloud infrastructure management
  • Knowledge of CI/CD processes
  • Experience with incident management
Responsibilities:
  • Design, build, and maintain scalable and secure cloud infrastructure as code
  • Develop and enforce Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to ensure software reliability
  • Enable cost transparency and optimize infrastructure spending
  • Reduce cognitive load for product engineers by creating streamlined, efficient development workflows
  • Build and maintain robust CI/CD pipelines that accelerate time from code to customer
  • Create and maintain intuitive, comprehensive observability solutions for end-to-end system monitoring
  • Lead and continuously improve our Incident Management process
  • Participate in the on-call rotation, serving as a First Responder to quickly address and resolve system issues
  • Develop and maintain incident response playbooks and post-mortem practices

🪄 Skills: AWS, Docker, CI/CD, Linux, Terraform

Posted 17 days ago

📍 Europe

🧭 Full-Time

🔍 Software Development

🏢 Company: Sanity · 👥 51-200 · 💰 Corporate over 2 years ago · Software Development

Requirements:
  • Proven experience with SRE/DevOps tools, processes, and culture.
  • Proficient in programming languages like Python, Go, and TypeScript.
  • 5+ years of experience participating in an SRE on-call rotation.
  • Analytical mindset for designing, diagnosing, and optimizing infrastructure.
  • Skilled in managing scalable, highly available, cloud-based applications.
  • Hands-on experience with Kubernetes for orchestrating, scaling, and managing containerized applications in the cloud.
  • Strong database management skills, particularly with PostgreSQL.
  • Experience with infrastructure as code, using tools like Terraform.
  • Proficient in building and maintaining CI/CD pipelines.
  • Familiarity with observability tools like Prometheus and similar stacks.
  • Calm and clear-headed in incident and outage situations, with a thoughtful communication style for high-pressure environments.
  • Open-minded yet discerning when it comes to exploring new technologies.
Responsibilities:
  • Plan and implement a global platform for delivering our software as a service.
  • Diagnose and troubleshoot complex distributed systems.
  • Ensure observability and analyze the behavior of our stack.
  • Orchestration, deployment, monitoring, automation.
  • Participate in our on-call rotation.

🪄 Skills: PostgreSQL, Python, Cloud Computing, ElasticSearch, Kubernetes, TypeScript, Go, Prometheus, CI/CD, Linux, DevOps, Terraform, Microservices

Posted 18 days ago

📍 Worldwide

🧭 Contract

🔍 Software Development

🏢 Company: Teravision Technologies · 👥 251-500 · 💰 over 13 years ago · Android, iOS, Mobile Apps, Information Technology, Software

Requirements:
  • Experience managing and maintaining Kubernetes (K8s) infrastructure, including updates, patching, and software configuration management.
  • Familiarity with CI/CD pipelines, particularly TeamCity, and integrating tools like SonarQube.
  • Hands-on experience with AWS services such as S3, Route 53, and others.
  • Strong understanding of backend systems and infrastructure management.
  • Proficiency in troubleshooting, debugging, and ensuring system reliability in production environments.
  • Prior experience in an on-call role.
  • Knowledge of monitoring and alerting tools to support on-call responsibilities.
Responsibilities: not stated

🪄 Skills: AWS, Kubernetes, CI/CD, Troubleshooting, Debugging

Posted about 2 months ago

📍 Europe, South Africa, Egypt, Latin America

🧭 Full-Time

🔍 Online Gaming

Requirements:
  • 4+ years of experience in SRE or DevOps
  • Veteran in AWS technologies
  • Experience deploying into new regions
  • Managed multiple Kubernetes clusters
Responsibilities:
  • Plan and securely deploy into new regions
  • Improve all aspects of AWS infrastructure
  • Monitor all releases for smooth operations
  • Manage multiple K8s clusters
  • Research and implement new technology

🪄 Skills: AWS, Docker, Python, Kubernetes, Grafana, Prometheus

Posted 4 months ago

📍 France, EU/EEA

🏢 Company: Sinch · 👥 1001-5000 · 💰 $48,845,918 Post-IPO Debt 6 months ago · Messaging, SaaS, Telecommunications, Mobile, Software

Requirements:
  • Background in infrastructure, operations, or software engineering.
  • Experience with cloud providers such as GCP.
  • Proficiency in configuration management tools such as Terraform and Ansible.
  • Hands-on proficiency with modern monitoring tools like Prometheus and Grafana.
  • Experience with distributed data stores such as Cassandra, PostgreSQL, and ElasticSearch.
  • Experience with Python and Bash is beneficial.
  • Strong technical skills across various infrastructure technologies.
  • Proven ability to break down complex tasks into manageable ones.
  • Strong communication skills and a history of building solid relationships with peers and leadership.
  • Experience operating and maintaining production systems in a Linux and public cloud environment.
  • Demonstrated ability to mentor and guide team members.
Responsibilities:
  • Be a part of the team that builds and operates the infrastructure at the heart of every Sinch Mailjet service.
  • You'll be instrumental in the day-to-day management of our global infrastructure.
  • This includes monitoring and tracking key performance indicators (KPIs), collaborating with engineers to ensure our products and services are appropriately resourced, automating processes, and planning for future growth and scalability.
  • Partner with product engineering teams to identify systems requirements.
  • Build and support our cloud-based microservices infrastructure.
  • Automate routine processes and remediation tasks.
  • Develop, monitor and track Service Level Objectives (SLOs) for the systems under management.
  • Proactively troubleshoot, resolve, and plan for issues that typically come from support staff, other engineering teams, and our automated monitoring system.
  • Ensure our datastores are healthy and operate at optimal performance levels.
  • Contribute to the growth and culture of our engineering team.

🪄 Skills: Leadership, PostgreSQL, Python, Bash, ElasticSearch, GCP, Cassandra, Grafana, Prometheus, Communication Skills

Posted 6 months ago