Senior Site Reliability Engineer

Posted 2024-11-07

💎 Seniority level: Senior, Proven experience as a Senior Site Reliability Engineer or in a similar role

📍 Location: United Kingdom

💸 Salary: 65000 - 80000 GBP per year

🔍 Industry: Online marketplace

🏢 Company: OnBuy

🗣️ Languages: English

⏳ Experience: Proven experience as a Senior Site Reliability Engineer or in a similar role

🪄 Skills: AWS, Docker, Python, Software Development, GCP, Java, Kubernetes, Azure, Go, Grafana, Prometheus, DevOps, Terraform, Documentation, Microservices

Requirements:
  • Proven experience as a Senior Site Reliability Engineer or in a similar role.
  • Strong proficiency in programming languages such as Python, Go, or Java.
  • Experience with cloud service providers (AWS, Azure, Google Cloud) and container orchestration tools (Kubernetes, Docker).
  • Solid understanding of networking, distributed systems, and microservices architecture.
  • Familiarity with monitoring and logging tools (New Relic, Prometheus, Grafana, ELK stack, GCP logging).
  • Excellent problem-solving skills and ability to work effectively in a team.
  • Strong communication and interpersonal skills for collaboration with cross-functional teams.
Responsibilities:
  • Design and implement scalable systems to ensure high availability and performance.
  • Develop automated solutions for monitoring, scaling, and system health management.
  • Collaborate with software development teams to identify and resolve reliability issues.
  • Create and maintain documentation related to system architecture, processes, and configurations.
  • Perform incident response and postmortem analysis to improve site reliability and performance.
  • Monitor system performance and make necessary adjustments to ensure optimal functionality.
  • Implement and manage infrastructure as code using tools like Terraform or Ansible.

Related Jobs

📍 Germany, Sweden, United Kingdom, Spain, Poland, Austria

🧭 Full-Time

🔍 Video Games

  • Experience in online operations support
  • Ability to work closely with production and architecture teams
  • Strong collaboration and communication skills

  • Serve as liaison between various development teams and the network operations team
  • Collaborate closely with the production team and system architect
  • Ensure that projects related to Hunt: Showdown are well planned, documented, and implemented
  • Handle operational and project duties.

Leadership, Project Management, Project Coordination, Cross-functional Team Leadership, Operations Management

Posted 2024-11-07

📍 US, Europe

🧭 Full-Time

💸 175000 - 210000 USD per year

🔍 Cloud computing, AI

🏢 Company: CoreWeave

  • You have 5+ years of experience in the software or infrastructure engineering industry.
  • Experience with Python, Go or another scripting language.
  • Experience containerizing applications and/or using Kubernetes to manage deployments.
  • Experience with Git.
  • Experience with Linux shell scripting and/or navigating a *nix-based operating system.
  • Experience creating and maintaining GitHub Actions to automate workflows.
  • You have experience deploying services in production and are interested in learning reliability-at-scale engineering concepts.
  • You have experience refining SDLC, doing code reviews, and providing technical support.

  • Design and implement services and tools to reduce friction and toil for our engineering and operations teams.
  • Streamline repetitive tasks and eliminate bottlenecks to improve development velocity with automated workflows and processes.
  • Partner with developers to understand their pain points and develop tailored solutions that enhance their productivity.
  • Champion best practices and advocate for new tools and technologies to drive ongoing productivity gains.
  • Tackle complex issues related to build systems, testing frameworks, code analysis, and other developer tooling.
  • Enable and evangelize the practice of reliability engineering across CoreWeave's engineering teams.

Python, Software Development, Git, Kubernetes, *Nix, Go, Collaboration

Posted 2024-11-07

📍 Australia, Austria, Bangladesh, Belgium, Brazil, Canada, Colombia, Costa Rica, Croatia, Czech Republic, Denmark, Egypt, Estonia, Finland, France, Germany, Ghana, Greece, India, Indonesia, Ireland, Israel, Italy, Kenya, Mexico, Netherlands, Nigeria, Peru, Poland, Singapore, South Africa, Spain, Sweden, Switzerland, Uganda, United Arab Emirates, United Kingdom, United States of America, Uruguay

🧭 Full-Time

💸 109000 - 169000 USD per year

🔍 Nonprofit, Technology

  • Proficient in automation, programming, and scripting.
  • Experience with Open Source configuration management and orchestration tools (Puppet, Ansible, Chef, SaltStack, etc.) as well as modern observability infrastructure (Prometheus, Grafana, Logstash/Kibana, Icinga/Nagios, etc.).
  • Advanced knowledge of Linux and IO/data storage concepts, internals and troubleshooting.
  • Experience remotely managing both bare-metal servers and virtualized environments.
  • 5+ years of experience in an SRE/Operations/DevOps role as part of a team.
  • Experience with high traffic and highly available website architectures and operations.
  • Strong English language skills.
  • Ability to work independently in a fast-paced environment and as an effective part of a globally distributed team, using ticket tracking systems and asynchronous communication tools.
  • B.Sc. or M.Sc. in Computer Science or equivalent work experience.

  • Operation, maintenance, troubleshooting and automation of relational database systems in production and staging environments.
  • Handling configuration management, (Debian) package maintenance, patching and building, working with upstream on bug identification and resolution.
  • Improving observability (alerting, metrics, monitoring) of database infrastructure.
  • Multi-datacenter systems design, capacity and infrastructure planning.
  • Taking part in incident response, diagnosis and follow-up on system outages or alerts across Wikimedia's production infrastructure and participating in an on call rotation.

SQL, Kibana, C (Programming language), Cassandra, Grafana, Prometheus, Redis

Posted 2024-08-28

📍 Australia, Austria, Bangladesh, Belgium, Brazil, Canada, Colombia, Costa Rica, Croatia, Czech Republic, Denmark, Egypt, Estonia, Finland, France, Germany, Ghana, Greece, India, Indonesia, Ireland, Israel, Italy, Kenya, Mexico, Netherlands, Nigeria, Peru, Poland, Singapore, South Africa, Spain, Sweden, Switzerland, Uganda, United Arab Emirates, United Kingdom, United States of America, Uruguay

🧭 Full-Time

💸 109000 - 169000 USD per year

🔍 Nonprofit, knowledge sharing

  • Proficient in automation, programming, and scripting
  • Experience with Open Source configuration management and orchestration tools (Puppet, Ansible, Chef, SaltStack, etc.)
  • Advanced knowledge of Linux and IO/data storage concepts
  • Experience remotely managing both bare-metal servers and virtualized environments
  • 5+ years of experience in an SRE/Operations/DevOps role
  • Experience with high traffic and highly available website architectures
  • Strong English language skills
  • Ability to work independently in a fast-paced environment

  • Operation, maintenance, troubleshooting and automation of relational database systems in production and staging environments
  • Handling configuration management, (Debian) package maintenance, patching and building, working with upstream on bug identification and resolution
  • Improving observability of database infrastructure
  • Designing multi-datacenter systems, capacity planning, and infrastructure planning
  • Participating in incident response and on-call rotation for system outages or alerts

SQL, Kibana, C (Programming language), Cassandra, Grafana, Prometheus, Redis, Linux

Posted 2024-08-28

📍 Australia, Austria, Bangladesh, Belgium, Brazil, Canada, Colombia, Costa Rica, Croatia, Czech Republic, Denmark, Egypt, Estonia, Finland, France, Germany, Ghana, Greece, India, Indonesia, Ireland, Israel, Italy, Kenya, Mexico, Netherlands, Nigeria, Peru, Poland, Singapore, South Africa, Spain, Sweden, Switzerland, Uganda, United Arab Emirates, United Kingdom, United States of America, Uruguay

🧭 Full-Time

💸 109047 - 169455 USD per year

🔍 Nonprofit / Technology

  • At least two years of experience in an SRE/Operations/DevOps role as part of a team.
  • Experience supporting high availability distributed production systems.
  • Experience with database administration and support.
  • Comfortable with configuration management and orchestration tools (e.g., Puppet, Ansible, Chef, SaltStack).
  • Knowledge of modern observability infrastructure (monitoring, metrics, and logging).
  • Proficient in shell and scripting languages such as Python, Go, Bash, Ruby.
  • Good understanding of Linux/Unix fundamentals and debugging skills.
  • Excellent written and verbal communication skills.
  • BS or MS degree in Computer Science or equivalent work experience.

  • Deploy, configure, and maintain the distributed data systems that comprise our data and analytics platform.
  • Implement data quality monitoring that alerts the team of possible data issues.
  • Collaborate closely with the Fundraising team to integrate and use data from self-hosted and third-party sources.
  • Provide engineering support during high-traffic or critical campaigns.
  • Write and update internal documentation of systems and processes.
  • Ensure compliance with regulations like the Donor Privacy Policy, GDPR, and PCI DSS.
  • Create and manage users and permissions for data access control.
  • Advise on data input best practices and develop processes for data entry consistency.
  • Work closely with Fundraising Analytics to gather and prioritize data enhancement requests.

Python, Bash, Ruby, C (Programming language), Data engineering, Go, Communication Skills, Collaboration

Posted 2024-08-22