Site Reliability Engineer

Posted 2024-11-07

💎 Seniority level: Senior, Minimum of 3 years of experience

📍 Location: Slovakia

🔍 Industry: iGaming

🏢 Company: GoReel

🗣️ Languages: English

⏳ Experience: Minimum of 3 years of experience

🪄 Skills: AWS, Docker, Project Management, Elasticsearch, Git, Jenkins, Kibana, Kubernetes, Jira, Grafana, Prometheus, Collaboration, CI/CD, Problem Solving

Requirements:

Bachelor's degree in Computer Science, Engineering, or a related field.
Minimum of 3 years of experience in a similar SRE role.
Strong proficiency in monitoring, logging, alerting, cloud, platform, OS, CI/CD, repo storage, and management tools.
Solid understanding of DevOps principles and practices.
Excellent problem-solving and troubleshooting skills.
Strong communication and collaboration skills.

Responsibilities:

Implement and maintain monitoring solutions using Prometheus, VictoriaMetrics, and Grafana.
Manage logging infrastructure using Fluentd, Fluent Bit, Elasticsearch, and Kibana.
Configure and manage alerting systems like Alertmanager and Opsgenie.
Control utilization of AWS cloud services, and design and manage infrastructure.
Deploy and manage containerized applications using Kubernetes, Docker, and Helm.
Implement and manage CI/CD pipelines using Jenkins and ArgoCD.
Manage code repositories using GitLab and Git.
Collaborate with cross-functional teams using Jira and Confluence.

Related Jobs

🔥 Site Reliability Engineer

Posted 2024-11-26

📍 Slovakia, Poland

🔍 iGaming

🏢 Company: GoReel

🔧 Requirements

Bachelor's degree in Computer Science, Engineering, or a related field.
Minimum of 3 years of experience in a similar SRE role.
Strong proficiency in monitoring, logging, alerting, cloud services, and management tools.
Solid understanding of DevOps principles and practices.
Excellent problem-solving and troubleshooting skills.
Strong communication and collaboration skills.

💡 Responsibilities

Implement and maintain monitoring solutions using Prometheus, VictoriaMetrics, and Grafana.
Manage logging infrastructure with Fluentd, Fluent Bit, Elasticsearch, and Kibana.
Configure alerting systems like Alertmanager and Opsgenie.
Control utilization of AWS Cloud services and manage scalable infrastructure.
Deploy and manage containerized applications using Kubernetes, Docker, and Helm.
Implement CI/CD pipelines using Jenkins and ArgoCD.
Manage code repositories using GitLab and Git.
Collaborate with cross-functional teams using Jira and Confluence.

🪄 Skills: AWS, Docker, Project Management, Elasticsearch, Git, Jenkins, Kibana, Kubernetes, Jira, Grafana, Prometheus, Collaboration, CI/CD, DevOps

🔥 Site Reliability Engineer (SRE) (m/f/d)

Posted 2024-11-07

📍 Germany and within Europe

🧭 Full-Time

🔍 Technology / Employee Communication

🏢 Company: Flip App

🔧 Requirements

Experience in operating and scaling cloud infrastructures (Azure, AWS, GCP).
Deep knowledge of Kubernetes and container solutions.
Interest in observability tools such as Prometheus, VictoriaMetrics, Mimir, Loki, ELK.
Familiarity with SLO, error budget, and Apdex.
Good knowledge of software development languages like Go, Python, Kotlin.
Business fluent in English; German is a plus.
Experience with infrastructure as code tools (e.g., Pulumi, OpenTofu) and automation tools (e.g., Ansible, Chef).

💡 Responsibilities

Ensure the availability, performance, and scalability of the infrastructure.
Promote practices like CI/CD, observability, and developer experience.
Shape goals for scalable systems and observability.
Expand cloud infrastructure and Kubernetes cluster.
Ensure resilience and safety through zero-downtime rollouts.
Create observability through the further development of the LGTM stack.
Design, develop, and optimize infrastructure as code using Pulumi in Go.

🪄 Skills: AWS, Python, Software Development, GCP, Kotlin, Kubernetes, Azure, Go, Grafana, Prometheus, CI/CD

🔥 Senior Site Reliability Engineer, Developer Productivity

Posted 2024-11-07

📍 US, Europe

🧭 Full-Time

💸 175000 - 210000 USD per year

🔍 Cloud computing, AI

🏢 Company: CoreWeave

💰 $642.0m Secondary Market on 2023-12-04

Cloud Computing Machine Learning Information Technology Cloud Infrastructure

🔧 Requirements

You have 5+ years of experience in the software or infrastructure engineering industry.
Experience with Python, Go or another scripting language.
Experience containerizing applications and/or using Kubernetes to manage deployments.
Experience with Git.
Experience with Linux shell scripting and/or the ability to navigate a *nix-based operating system.
Experience creating and maintaining GitHub Actions to automate workflows.
You have experience deploying services in production and are interested in learning reliability-at-scale engineering concepts.
You have experience refining SDLC, doing code reviews, and providing technical support.

💡 Responsibilities

Design and implement services and tools to reduce friction and toil for our engineering and operations teams.
Streamline repetitive tasks and eliminate bottlenecks to improve development velocity with automated workflows and processes.
Partner with developers to understand their pain points and develop tailored solutions that enhance their productivity.
Champion best practices and advocate for new tools and technologies to drive ongoing productivity gains.
Tackle complex issues related to build systems, testing frameworks, code analysis, and other developer tooling.
Enable and evangelize the practice of reliability engineering across CoreWeave's engineering teams.

🪄 Skills: Python, Software Development, Git, Kubernetes, *Nix, Go, Collaboration

🔥 Site Reliability Engineer (SRE)

Posted 2024-10-29

📍 Europe

🧭 Full-Time

🔍 Technology

🏢 Company: Flip GmbH

🔧 Requirements

Experience in operating and scaling cloud infrastructures (Azure, AWS, GCP).
Deep knowledge of Kubernetes and container solutions.
Interest in observability tools and concepts like SLO, error budget.
Good knowledge of software development (e.g., Go, Python, Kotlin).
Business fluent in English.

💡 Responsibilities

Help scale the cloud infrastructure and Kubernetes clusters.
Ensure zero downtime with effective rollout, redundancy, migration strategies, and rollback mechanisms.
Develop and optimize LGTM stack and analyze SLOs.
Enhance operational safety and resilience of systems.
Design, develop and optimize production, development, and cloud infrastructure with Pulumi in Go.
Increase development efficiency and optimize code deployments through effective tools and processes.
Improve the CI/CD pipeline for faster feedback cycles and secure rollouts.

🪄 Skills: AWS, Docker, Python, Software Development, Agile, GCP, Kubernetes, SCRUM, Azure, CI/CD

🔥 Intermediate Site Reliability Engineer, Database Operations

Posted 2024-10-16

📍 EMEA, APAC, AMER

🔍 DevSecOps

🏢 Company: GitLab

🔧 Requirements

Advanced datastore platform management experience, preferably using Postgres at scale.
Advanced Cloud Infrastructure management, preferably using GCP.
Advanced experience with Linux.
Solid experience with infrastructure and database automation using Terraform.
Experience with orchestration tools like Chef and/or Ansible.
Experience implementing monitoring at scale using Prometheus and Grafana.
Ability to promote GitLab's CREDIT values in work.
Superior verbal and written communication skills.
Comfortable working asynchronously across timezones.

💡 Responsibilities

Build: Automate operational tasks like package updates and configuration changes.
Maintain: Develop systems for reliable maintenance tasks like library upgrades.
Plan: Create monitoring systems to predict capacity needs.
Respond: Address user emergencies and support requests.
Enhance: Update security measures for GitLab's infrastructure.
Partner: Collaborate with internal teams on compliance assessments and improvements.
Collaborate: Work with software teams to resolve architectural issues.

🪄 Skills: PostgreSQL, Software Development, GCP, Grafana, Prometheus, Communication Skills, Collaboration

🔥 Site Reliability Engineer (Expert-level)

Posted 2024-09-20

📍 France, EU/EEA

🏢 Company: Sinch

👥 1001-5000

💰 $48.8m Post-IPO Debt on 2024-09-13

Messaging SaaS Telecommunications Mobile Software

🔧 Requirements

Background in infrastructure, operations, or software engineering.
Experience with cloud providers such as GCP.
Proficiency in configuration management tools such as Terraform and Ansible.
Hands-on proficiency with modern monitoring tools like Prometheus and Grafana.
Experience with distributed data stores such as Cassandra, PostgreSQL, and ElasticSearch.
Experience with Python and Bash is beneficial.
Strong technical skills across various infrastructure technologies.
Proven ability to break down complex tasks into manageable ones.
Strong communication skills and a history of building solid relationships with peers and leadership.
Experience operating and maintaining production systems in a Linux and public cloud environment.
Demonstrated ability to mentor and guide team members.

💡 Responsibilities

Be a part of the team that builds and operates the infrastructure at the heart of every Sinch Mailjet service.
You’ll be instrumental for the day-to-day management of our global infrastructure.
This includes monitoring and tracking key performance indicators (KPIs), collaborating with engineers to ensure our products and services are appropriately resourced, automating processes, and planning for future growth and scalability.
Partner with product engineering teams to identify systems requirements.
Build and support our cloud-based microservices infrastructure.
Automate routine processes and remediation tasks.
Develop, monitor and track Service Level Objectives (SLOs) for the systems under management.
Proactively troubleshoot, resolve, and plan for issues that typically come from support staff, other engineering teams, and our automated monitoring system.
Ensure our datastores are healthy and operate at optimal performance levels.
Contribute to the growth and culture of our engineering team.

🪄 Skills: Leadership, PostgreSQL, Python, Bash, Elasticsearch, GCP, Cassandra, Grafana, Prometheus, Communication Skills

🔥 OnCall Site Reliability Engineer

Posted 2024-09-20

📍 Slovakia

🔍 iGaming

🏢 Company: GoReel

🔧 Requirements

Understanding of cloud infrastructure and container orchestration.
Familiarity with monitoring and logging tools.
Strong problem-solving skills and attention to detail.
Ability to work effectively in a team environment.
Excellent communication skills.

💡 Responsibilities

Monitor and maintain the health of systems, ensuring high availability and performance.
Respond to incidents and troubleshoot issues in a timely manner.
Collaborate with development and operations teams to implement improvements and optimize system performance.
Create and maintain documentation for incident response and system maintenance procedures.
Participate in on-call rotations to provide 24/7 support.

🪄 Skills: AWS, PostgreSQL, Elasticsearch, Kafka, Kubernetes, Cassandra, Grafana, Communication Skills
