Senior Site Reliability Engineer

Posted about 2 months ago

💎 Seniority level: Senior, 5 years

📍 Location: Poland, Germany, United Kingdom

🔍 Industry: Artificial Intelligence and Data Science

🏢 Company: Mozn

🗣️ Languages: English

⏳ Experience: 5 years

🪄 Skills: AWS, Docker, Python, SQL, Bash, Hadoop, Kafka, Kubernetes, Spark, CI/CD, Terraform, Ansible

Requirements:

BSc/BA in Computer Engineering, Computer Science, or related discipline.
5 years of experience in a similar position (SRE, DevOps, or infrastructure engineering).
Professional certifications are appreciated.
Solid experience with container runtimes and orchestrators: Docker and Kubernetes.
Experience with at least one major cloud provider: AWS, Azure, GCP, or Oracle.
Preferred programming languages for infrastructure as code: Python and Golang.
Experience with Linux servers and competency in bash scripting.
Experience with Infrastructure as Code.
Experience with automating deployment pipelines.
Solid foundation in networking.
Knowledge of big data platforms like Kafka, Hadoop, and Spark is a plus.
Knowledge of SQL and SQL database management is a plus.
Knowledge of Terraform or Ansible is a plus.

Responsibilities:

Mixture of software engineering, system architecture design, and operations.
Attend morning meetings and sprint planning as an SRE team member.
Help design, build, support, and scale cloud and on-premises infrastructure.
Implement monitoring, alerting, and debugging for infrastructure.
Design and implement CI/CD workflows with best practices.
Maintain data stores including load monitoring and backup plans.
Collaborate with other departments to address their use cases.
Explore new technologies to improve the current stack.
Install and configure servers and network equipment using Infrastructure as Code.
Practice sustainable incident response and blameless postmortems.

Related Jobs

🔥 Senior Site Reliability Engineer

Posted about 22 hours ago

📍 United Kingdom

🔍 Software Development

🏢 Company: StarRez · 👥 251-500 · 💰 Private about 3 years ago · Consulting SaaS Property Management Software

🔧 Requirements

1+ years experience working on a SaaS platform
Proven experience (2+ years) in a Platform Engineering, Site Reliability Engineering, or Software Engineering role.
Proficiency in at least one object-oriented programming language (C# preferred).
Production experience operating containerization technologies (Kubernetes).
Proficiency with one or more public cloud providers such as Azure, AWS or GCP
Proficiency using Infrastructure as Code (IaC) tools such as Terraform (preferred), Ansible, or CloudFormation.
Proficiency in scripting and automation using languages like Bash, PowerShell or Python.
Experience with monitoring, observability and logging tools such as DataDog, Prometheus, Grafana, or similar.
Proven track record of maintaining highly-available and performant production environments.
Ability to identify and implement effective mitigation strategies and operational playbooks.

💡 Responsibilities

Provide technical leadership and mentoring within the team through knowledge sharing sessions, pair programming, code reviews and solution design
Identify and implement solutions to improve platform reliability, including the creation of mitigation strategies and operational playbooks.
Implement and maintain monitoring/alerting/logging systems to identify and respond to incidents
Conduct/participate in Root Cause Analyses (RCAs) and blameless post-mortems
Participate in on-call rotations to ensure system reliability and rapid incident response.
Ensure scalability and efficiency of cloud infrastructure and systems to handle traffic and data growth
Conduct performance tests to identify and remediate bottlenecks
Develop and maintain platform solutions, automate infrastructure provisioning, configuration, and management tasks using Infrastructure as Code.
Monitor, review and tune databases to ensure high availability and performance
Collaborate with product engineering teams to design/build fit-for-purpose and observable software
Contribute and collaborate across teams to define Service Level Indicators (SLIs), Service Level Objectives (SLOs) and Service Level Agreements (SLAs) as required

AWS, Docker, Python, SQL, Bash, GCP, Kubernetes, C#, Azure, Grafana, Prometheus, CI/CD, DevOps, Terraform, Ansible, Software Engineering, SaaS

🔥 Senior Site Reliability Engineer (SRE) - Poland

Posted 5 days ago

📍 Poland

🔍 Software Development

🔧 Requirements

Extensive experience with enterprise scale continuous delivery environments
Development with JavaScript/Node.js/TypeScript in a Linux/Mac environment
Experience with sustainable incident response in a blameless environment
Experience with configuration management tools such as Terraform (preferred), Puppet, Chef, or Ansible
Knowledge of cloud platforms (AWS preferred) and container and orchestration technologies
Experience with APM and observability tools such as New Relic, Splunk, CloudWatch, Prometheus, Grafana/Kibana, and Sentry
Background in Linux Systems Engineering
Experience with incident response tools such as PagerDuty, FireHydrant, and Blameless

💡 Responsibilities

Engage with teams and improve service delivery and reliability across their entire lifecycle
Measure and monitor all production systems with an eye towards availability, latency and overall system health
Seek out the cause of errors and instability in our production cloud services and drive teams towards better operational excellence
Engage with product and platform teams to improve and evolve systems by lobbying for changes that improve reliability, resilience, and observability
Help identify and drive down toil with creative innovation and automation
On-call responsibilities

AWS, Docker, Node.js, Python, Bash, Cloud Computing, Git, JavaScript, Kibana, Kubernetes, TypeScript, Algorithms, Data Structures, Grafana, Prometheus, CI/CD, Agile methodologies, RESTful APIs, Linux, DevOps, Terraform, Microservices, JSON, Ansible, Scripting, Software Engineering, Debugging

🔥 Senior Site Reliability Engineer - Linux

Posted 11 days ago

📍 United Kingdom, Canada

🔍 Software Development

🏢 Company: GoDaddy · 👥 5001-10000 · 💰 $800,000,000 Post-IPO Equity about 3 years ago · 🫂 Last layoff over 1 year ago · Web Hosting Domain Registrar Web Development Online Portals

🔧 Requirements

A track record of delivering capabilities that build customer value and business impact.
Knowledge of principles for building performant and quality REST APIs.
Experience with testing code; the care and feeding of both on-premises and cloud compute systems; Docker and other container-related technologies; Python or similar languages; and HashiCorp Vault or similar tooling.

💡 Responsibilities

Engage with engineers and partners across the organization to solve problems with broad impact, stay ahead of the curve with new technologies, and advocate for modern and effective tech stacks.
Lead by example with a high standard for coding practices, including practical coding standards, modern software development approaches, test automation, and a strong focus on security.
Improve the observability of our production services, allowing the team to quickly highlight gaps, resolve issues, and understand the performance of our systems.
Share your expertise by training and guiding other engineers, encouraging a collaborative and nurturing environment for learning.

Backend Development, Docker, Python, Cloud Computing, Kubernetes, Amazon Web Services, REST API, CI/CD, Linux, Ansible

🔥 Senior Site Reliability Engineer

Posted 20 days ago

📍 United States, European timezones

🧭 Full-Time

🔍 Software Development

🏢 Company: Invert · 👥 11-50 · 💰 $20,149,993 Seed 8 months ago · Data Management SaaS Application Performance Management

🔧 Requirements

NOT STATED

💡 Responsibilities

Design, build, and maintain scalable and secure cloud infrastructure as code
Develop and enforce Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to ensure software reliability
Enable cost transparency and optimize infrastructure spending
Reduce cognitive load for product engineers by creating streamlined, efficient development workflows
Build and maintain robust CI/CD pipelines that accelerate time from code to customer
Create and maintain intuitive, comprehensive observability solutions for end-to-end system monitoring
Lead and continuously improve our Incident Management process
Participate in the on-call rotation, serving as a First Responder to quickly address and resolve system issues
Develop and maintain incident response playbooks and post-mortem practices

AWS, Docker, CI/CD, Linux, Terraform

🔥 Senior Site Reliability Engineer

Posted 20 days ago

📍 Europe

🧭 Full-Time

🔍 Software Development

🏢 Company: Sanity · 👥 51-200 · 💰 Corporate over 2 years ago · Software Development

🔧 Requirements

Proven experience with SRE/DevOps tools, processes, and culture.
Proficient in programming languages like Python, Go, and TypeScript.
5+ years of experience participating in an SRE on-call rotation.
Analytical mindset for designing, diagnosing, and optimizing infrastructure.
Skilled in managing scalable, highly available, cloud-based applications.
Hands-on experience with Kubernetes for orchestrating, scaling, and managing containerized applications in the cloud.
Strong database management skills, particularly with PostgreSQL.
Experience with infrastructure as code, using tools like Terraform.
Proficient in building and maintaining CI/CD pipelines.
Familiarity with observability tools like Prometheus and similar stacks.
Calm and clear-headed in incident and outage situations, with a thoughtful communication style for high-pressure environments.
Open-minded yet discerning when it comes to exploring new technologies.

💡 Responsibilities

Plan and implement a global platform for delivering our software as a service.
Diagnose and troubleshoot complex distributed systems.
Ensure observability and analyze the behavior of our stack.
Handle orchestration, deployment, monitoring, and automation.
Participate in our on-call rotation.

PostgreSQL, Python, Cloud Computing, ElasticSearch, Kubernetes, TypeScript, Go, Prometheus, CI/CD, Linux, DevOps, Terraform, Microservices

🔥 Senior Site Reliability Engineer [United Kingdom]

Posted about 2 months ago

📍 United Kingdom

🧭 Contract

🔍 SaaS

🔧 Requirements

NOT STATED

💡 Responsibilities

Partner with Engineering and Product Managers to learn, improve system availability, and sharpen our execution skills to provide an amazing experience for our customers.

AWS, Docker, Python, SQL, CI/CD, DevOps, Microservices

🔥 Senior Site Reliability Engineer

Posted about 2 months ago

📍 Worldwide

🧭 Contract

🔍 Software Development

🏢 Company: Teravision Technologies · 👥 251-500 · 💰 over 13 years ago · Android iOS Mobile Apps Information Technology Software

🔧 Requirements

Experience managing and maintaining Kubernetes (K8s) infrastructure, including updates, patching, and software configuration management.
Familiarity with CI/CD pipelines, particularly TeamCity, and integrating tools like SonarQube.
Hands-on experience with AWS services such as S3, Route 53, and others.
Strong understanding of backend systems and infrastructure management.
Proficiency in troubleshooting, debugging, and ensuring system reliability in production environments.
Prior experience in an on-call role.
Knowledge of monitoring and alerting tools to support on-call responsibilities.

💡 Responsibilities

NOT STATED

AWS, Kubernetes, CI/CD, Troubleshooting, Debugging
