Site Reliability Engineer

Posted 2024-11-07

💎 Seniority level: Senior, Minimum of 3 years of experience

📍 Location: Slovakia

🔍 Industry: iGaming

🏢 Company: GoReel

🗣️ Languages: English

⏳ Experience: Minimum of 3 years of experience

🪄 Skills: AWS, Docker, Project Management, Elasticsearch, Git, Jenkins, Kibana, Kubernetes, Jira, Grafana, Prometheus, Collaboration, CI/CD, Problem Solving

Requirements:
  • Bachelor's degree in Computer Science, Engineering, or a related field.
  • Minimum of 3 years of experience in a similar SRE role.
  • Strong proficiency with monitoring, logging, alerting, cloud, platform, operating-system, CI/CD, repository storage, and management tools.
  • Solid understanding of DevOps principles and practices.
  • Excellent problem-solving and troubleshooting skills.
  • Strong communication and collaboration skills.
Responsibilities:
  • Implement and maintain monitoring solutions using Prometheus, VictoriaMetrics, and Grafana.
  • Manage logging infrastructure using Fluentd, Fluent Bit, Elasticsearch, and Kibana.
  • Configure and manage alerting systems such as Alertmanager and Opsgenie.
  • Control utilization of AWS Cloud services and design and manage the supporting infrastructure.
  • Deploy and manage containerized applications using Kubernetes, Docker, and Helm.
  • Implement and manage CI/CD pipelines using Jenkins and ArgoCD.
  • Manage code repositories using GitLab and Git.
  • Collaborate with cross-functional teams using Jira and Confluence.

Related Jobs

📍 Slovakia, Poland

🔍 iGaming

🏢 Company: GoReel

  • Bachelor's degree in Computer Science, Engineering, or a related field.
  • Minimum of 3 years of experience in a similar SRE role.
  • Strong proficiency in monitoring, logging, alerting, cloud services, and management tools.
  • Solid understanding of DevOps principles and practices.
  • Excellent problem-solving and troubleshooting skills.
  • Strong communication and collaboration skills.

  • Implement and maintain monitoring solutions using Prometheus, VictoriaMetrics, and Grafana.
  • Manage logging infrastructure with Fluentd, Fluent Bit, Elasticsearch, and Kibana.
  • Configure alerting systems such as Alertmanager and Opsgenie.
  • Control utilization of AWS Cloud services and manage scalable infrastructure.
  • Deploy and manage containerized applications using Kubernetes, Docker, and Helm.
  • Implement CI/CD pipelines using Jenkins and ArgoCD.
  • Manage code repositories using GitLab and Git.
  • Collaborate with cross-functional teams using Jira and Confluence.

AWS, Docker, Project Management, Elasticsearch, Git, Jenkins, Kibana, Kubernetes, Jira, Grafana, Prometheus, Collaboration, CI/CD, DevOps

Posted 2024-11-26

📍 Germany and elsewhere in Europe

🧭 Full-Time

🔍 Technology / Employee Communication

🏢 Company: Flip App

  • Experience in operating and scaling cloud infrastructures (Azure, AWS, GCP).
  • Deep knowledge of Kubernetes and container solutions.
  • Interest in observability tools such as Prometheus, VictoriaMetrics, Mimir, Loki, and ELK.
  • Familiarity with SLOs, error budgets, and Apdex.
  • Good knowledge of programming languages such as Go, Python, and Kotlin.
  • Business fluent in English; German is a plus.
  • Experience with infrastructure as code tools (e.g., Pulumi, OpenTofu) and automation tools (e.g., Ansible, Chef).

  • Ensure the availability, performance, and scalability of the infrastructure.
  • Promote practices like CI/CD, observability, and developer experience.
  • Shape goals for scalable systems and observability.
  • Expand the cloud infrastructure and Kubernetes clusters.
  • Ensure resilience and safety through zero-downtime rollouts.
  • Improve observability by further developing the LGTM stack (Loki, Grafana, Tempo, Mimir).
  • Design, develop, and optimize infrastructure as code using Pulumi in Go.

AWS, Python, Software Development, GCP, Kotlin, Kubernetes, Azure, Go, Grafana, Prometheus, CI/CD

Posted 2024-11-07

📍 US, Europe

🧭 Full-Time

💸 175000 - 210000 USD per year

🔍 Cloud computing, AI

🏢 Company: CoreWeave (Cloud Computing, Machine Learning, Information Technology, Cloud Infrastructure)

💰 $642.0m Secondary Market on 2023-12-04

  • You have 5+ years of experience in the software or infrastructure engineering industry.
  • Experience with Python, Go, or a similar language.
  • Experience containerizing applications and/or using Kubernetes to manage deployments.
  • Experience with Git.
  • Experience with Linux shell scripting and/or navigating a *nix-based operating system.
  • Experience creating and maintaining GitHub Actions to automate workflows.
  • You have experience deploying services in production and are interested in learning reliability-at-scale engineering concepts.
  • You have experience refining the SDLC, doing code reviews, and providing technical support.

  • Design and implement services and tools to reduce friction and toil for our engineering and operations teams.
  • Streamline repetitive tasks and eliminate bottlenecks to improve development velocity with automated workflows and processes.
  • Partner with developers to understand their pain points and develop tailored solutions that enhance their productivity.
  • Champion best practices and advocate for new tools and technologies to drive ongoing productivity gains.
  • Tackle complex issues related to build systems, testing frameworks, code analysis, and other developer tooling.
  • Enable and evangelize the practice of reliability engineering across CoreWeave's engineering teams.

Python, Software Development, Git, Kubernetes, *Nix, Go, Collaboration

Posted 2024-11-07

📍 Europe

🧭 Full-Time

🔍 Technology

🏢 Company: Flip GmbH

  • Experience in operating and scaling cloud infrastructures (Azure, AWS, GCP).
  • Deep knowledge of Kubernetes and container solutions.
  • Interest in observability tools and concepts such as SLOs and error budgets.
  • Good knowledge of software development (e.g., Go, Python, Kotlin).
  • Business fluent in English.

  • Help scale the cloud infrastructure and Kubernetes clusters.
  • Ensure zero downtime with effective rollout, redundancy, migration strategies, and rollback mechanisms.
  • Develop and optimize the LGTM stack and analyze SLOs.
  • Enhance operational safety and resilience of systems.
  • Design, develop and optimize production, development, and cloud infrastructure with Pulumi in Go.
  • Increase development efficiency and optimize code deployments through effective tools and processes.
  • Improve the CI/CD pipeline for faster feedback cycles and secure rollouts.

AWS, Docker, Python, Software Development, Agile, GCP, Kubernetes, SCRUM, Azure, CI/CD

Posted 2024-10-29

📍 EMEA, APAC, AMER

🔍 DevSecOps

🏢 Company: GitLab

  • Advanced datastore platform management experience, preferably using Postgres at scale.
  • Advanced Cloud Infrastructure management, preferably using GCP.
  • Advanced experience with Linux.
  • Solid experience with infrastructure and database automation using Terraform.
  • Experience with orchestration tools like Chef and/or Ansible.
  • Experience implementing monitoring at scale using Prometheus and Grafana.
  • Ability to promote GitLab's CREDIT values through your work.
  • Superior verbal and written communication skills.
  • Comfortable working asynchronously across timezones.

  • Build: Automate operational tasks such as package updates and configuration changes.
  • Maintain: Develop systems for reliable maintenance tasks like library upgrades.
  • Plan: Create monitoring systems to predict capacity needs.
  • Respond: Address user emergencies and support requests.
  • Enhance: Update security measures for GitLab's infrastructure.
  • Partner: Collaborate with internal teams on compliance assessments and improvements.
  • Collaborate: Work with software teams to resolve architectural issues.

PostgreSQL, Software Development, GCP, Grafana, Prometheus, Communication Skills, Collaboration

Posted 2024-10-16

📍 France, EU/EEA

🏢 Company: Sinch (Messaging, SaaS, Telecommunications, Mobile, Software)

👥 Company size: 1001-5000

💰 $48.8m Post-IPO Debt on 2024-09-13

  • Background in infrastructure, operations, or software engineering.
  • Experience with cloud providers such as GCP.
  • Proficiency in configuration management tools such as Terraform and Ansible.
  • Hands-on proficiency with modern monitoring tools like Prometheus and Grafana.
  • Experience with distributed data stores such as Cassandra, PostgreSQL, and Elasticsearch.
  • Experience with Python and Bash is beneficial.
  • Strong technical skills across various infrastructure technologies.
  • Proven ability to break down complex tasks into manageable ones.
  • Strong communication skills and a history of building solid relationships with peers and leadership.
  • Experience operating and maintaining production systems in a Linux and public cloud environment.
  • Demonstrated ability to mentor and guide team members.

  • Be a part of the team that builds and operates the infrastructure at the heart of every Sinch Mailjet service.
  • You’ll be instrumental in the day-to-day management of our global infrastructure.
  • This includes monitoring and tracking key performance indicators (KPIs), collaborating with engineers to ensure our products and services are appropriately resourced, automating processes, and planning for future growth and scalability.
  • Partner with product engineering teams to identify systems requirements.
  • Build and support our cloud-based microservices infrastructure.
  • Automate routine processes and remediation tasks.
  • Develop, monitor and track Service Level Objectives (SLOs) for the systems under management.
  • Proactively troubleshoot, resolve, and plan for issues that typically come from support staff, other engineering teams, and our automated monitoring system.
  • Ensure our datastores are healthy and operate at optimal performance levels.
  • Contribute to the growth and culture of our engineering team.

Leadership, PostgreSQL, Python, Bash, Elasticsearch, GCP, Cassandra, Grafana, Prometheus, Communication Skills

Posted 2024-09-20

📍 Slovakia

🔍 iGaming

🏢 Company: GoReel

  • Understanding of cloud infrastructure and container orchestration.
  • Familiarity with monitoring and logging tools.
  • Strong problem-solving skills and attention to detail.
  • Ability to work effectively in a team environment.
  • Excellent communication skills.

  • Monitor and maintain the health of systems, ensuring high availability and performance.
  • Respond to incidents and troubleshoot issues in a timely manner.
  • Collaborate with development and operations teams to implement improvements and optimize system performance.
  • Create and maintain documentation for incident response and system maintenance procedures.
  • Participate in on-call rotations to provide 24/7 support.

AWS, PostgreSQL, Elasticsearch, Kafka, Kubernetes, Cassandra, Grafana, Communication Skills

Posted 2024-09-20