Site Reliability Engineer

Posted 2 days ago

💎 Seniority level: Senior, 5+ years

📍 Location: France

💸 Salary: 50,000 - 75,000 EUR per year

🔍 Industry: Speech AI

🏢 Company: Gladia 👥 11-50 · Digital Marketing, SEO, E-Commerce, Brand Marketing, Apps, Information Technology, Web Design

🗣️ Languages: French, English

⏳ Experience: 5+ years

🪄 Skills: Docker, PostgreSQL, Python, Git, Kubernetes, Grafana, Prometheus, CI/CD, Linux, Networking, Ansible

Requirements:

5+ years of experience working on a rapidly growing product, with a strong focus on scalability and well-tested solutions
Strong experience with PromQL, OpenTelemetry, and self-hosted observability stacks
Proficiency with Kubernetes and containerization
Experience with CI/CD processes (GitHub, test-driven development, etc.)
Knowledge of at least one programming language (Python, Go, etc.)
Knowledge of databases (PostgreSQL, Patroni)
Experience with UNIX/Linux operating systems
Networking knowledge (DNS, OSI model, HTTP/HTTPS, SSL/TLS)

Responsibilities:

Create and maintain hybrid Kubernetes clusters
Implement and manage the observability stack (CNCF landscape)
Prepare deployments for production
Optimize infrastructure and tool scaling to keep costs low
Support developers in implementing observability
Document technical procedures and policies

Related Jobs

🔥 Senior Site Reliability Engineer

Posted 17 days ago

📍 United States, European timezones

🧭 Full-Time

🔍 Software Development

🏢 Company: Invert 👥 11-50 💰 $20,149,993 Seed, 8 months ago · Data Management, SaaS, Application Performance Management

🔧 Requirements

Experience in cloud infrastructure management
Knowledge of CI/CD processes
Experience with incident management

💡 Responsibilities

Design, build, and maintain scalable and secure cloud infrastructure as code
Develop and enforce Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to ensure software reliability
Enable cost transparency and optimize infrastructure spending
Reduce cognitive load for product engineers by creating streamlined, efficient development workflows
Build and maintain robust CI/CD pipelines that accelerate time from code to customer
Create and maintain intuitive, comprehensive observability solutions for end-to-end system monitoring
Lead and continuously improve our Incident Management process
Participate in the on-call rotation, serving as a First Responder to quickly address and resolve system issues
Develop and maintain incident response playbooks and post-mortem practices

AWS, Docker, CI/CD, Linux, Terraform

🔥 Senior Site Reliability Engineer

Posted 18 days ago

📍 Europe

🧭 Full-Time

🔍 Software Development

🏢 Company: Sanity 👥 51-200 💰 Corporate, over 2 years ago · Software Development

🔧 Requirements

Proven experience with SRE/DevOps tools, processes, and culture.
Proficient in programming languages like Python, Go, and TypeScript.
5+ years of experience participating in an SRE on-call rotation.
Analytical mindset for designing, diagnosing, and optimizing infrastructure.
Skilled in managing scalable, highly available, cloud-based applications.
Hands-on experience with Kubernetes for orchestrating, scaling, and managing containerized applications in the cloud.
Strong database management skills, particularly with PostgreSQL.
Experience with infrastructure as code, using tools like Terraform.
Proficient in building and maintaining CI/CD pipelines.
Familiarity with observability tools like Prometheus and similar stacks.
Calm and clear-headed in incident and outage situations, with a thoughtful communication style for high-pressure environments.
Open-minded yet discerning when it comes to exploring new technologies.

💡 Responsibilities

Plan and implement a global platform for delivering our software as a service.
Diagnose and troubleshoot complex distributed systems.
Ensure observability and analyze the behavior of our stack.
Handle orchestration, deployment, monitoring, and automation.
Participate in our on-call rotation.

PostgreSQL, Python, Cloud Computing, Elasticsearch, Kubernetes, TypeScript, Go, Prometheus, CI/CD, Linux, DevOps, Terraform, Microservices

🔥 Senior Site Reliability Engineer

Posted about 2 months ago

📍 Worldwide

🧭 Contract

🔍 Software Development

🏢 Company: Teravision Technologies 👥 251-500 💰 over 13 years ago · Android, iOS, Mobile Apps, Information Technology, Software

🔧 Requirements

Experience managing and maintaining Kubernetes (K8s) infrastructure, including updates, patching, and software configuration management.
Familiarity with CI/CD pipelines, particularly TeamCity, and integrating tools like SonarQube.
Hands-on experience with AWS services such as S3, Route 53, and others.
Strong understanding of backend systems and infrastructure management.
Proficiency in troubleshooting, debugging, and ensuring system reliability in production environments.
Prior experience in an on-call role.
Knowledge of monitoring and alerting tools to support on-call responsibilities.

💡 Responsibilities

NOT STATED

AWS, Kubernetes, CI/CD, Troubleshooting, Debugging

🔥 Site Reliability Engineer

Posted 4 months ago

📍 Europe, South Africa, Egypt, Latin America

🧭 Full-Time

🔍 Online Gaming

🔧 Requirements

4+ years of experience in SRE or DevOps
Extensive experience with AWS technologies
Experience deploying into new regions
Experience managing multiple Kubernetes clusters

💡 Responsibilities

Plan and securely deploy into new regions
Improve all aspects of AWS infrastructure
Monitor all releases for smooth operations
Manage multiple K8s clusters
Research and implement new technology

AWS, Docker, Python, Kubernetes, Grafana, Prometheus

🔥 Site Reliability Engineer (Expert-level)

Posted 6 months ago

📍 France, EU/EEA

🏢 Company: Sinch 👥 1001-5000 💰 $48,845,918 Post-IPO Debt, 6 months ago · Messaging, SaaS, Telecommunications, Mobile, Software

🔧 Requirements

Background in infrastructure, operations, or software engineering.
Experience with cloud providers such as GCP.
Proficiency in infrastructure-as-code and configuration management tools such as Terraform and Ansible.
Hands-on proficiency with modern monitoring tools like Prometheus and Grafana.
Experience with distributed data stores such as Cassandra, PostgreSQL, and Elasticsearch.
Experience with Python and Bash is beneficial.
Strong technical skills across various infrastructure technologies.
Proven ability to break down complex tasks into manageable ones.
Strong communication skills and a history of building solid relationships with peers and leadership.
Experience operating and maintaining production systems in a Linux and public cloud environment.
Demonstrated ability to mentor and guide team members.

💡 Responsibilities

Be a part of the team that builds and operates the infrastructure at the heart of every Sinch Mailjet service.
You’ll be instrumental in the day-to-day management of our global infrastructure.
This includes monitoring and tracking key performance indicators (KPIs), collaborating with engineers to ensure our products and services are appropriately resourced, automating processes, and planning for future growth and scalability.
Partner with product engineering teams to identify systems requirements.
Build and support our cloud-based microservices infrastructure.
Automate routine processes and remediation tasks.
Develop, monitor, and track Service Level Objectives (SLOs) for the systems under management.
Proactively troubleshoot, resolve, and plan for issues that typically come from support staff, other engineering teams, and our automated monitoring system.
Ensure our datastores are healthy and operate at optimal performance levels.
Contribute to the growth and culture of our engineering team.

Leadership, PostgreSQL, Python, Bash, Elasticsearch, GCP, Cassandra, Grafana, Prometheus, Communication Skills