Senior Site Reliability Engineer

Posted 2024-11-07

💎 Seniority level: Senior, Proven experience as a Senior Site Reliability Engineer or in a similar role

📍 Location: United Kingdom

💸 Salary: 65000 - 80000 GBP per year

🔍 Industry: Online marketplace

🏢 Company: OnBuy

🗣️ Languages: English

⏳ Experience: Proven experience as a Senior Site Reliability Engineer or in a similar role

🪄 Skills: AWS, Docker, Python, Software Development, GCP, Java, Kubernetes, Azure, Go, Grafana, Prometheus, DevOps, Terraform, Documentation, Microservices

Requirements:

Proven experience as a Senior Site Reliability Engineer or in a similar role.
Strong proficiency in programming languages such as Python, Go, or Java.
Experience with cloud service providers (AWS, Azure, Google Cloud) and container orchestration tools (Kubernetes, Docker).
Solid understanding of networking, distributed systems, and microservices architecture.
Familiarity with monitoring and logging tools (New Relic, Prometheus, Grafana, ELK stack, GCP logging).
Excellent problem-solving skills and ability to work effectively in a team.
Strong communication and interpersonal skills for collaboration with cross-functional teams.

Responsibilities:

Design and implement scalable systems to ensure high availability and performance.
Develop automated solutions for monitoring, scaling, and system health management.
Collaborate with software development teams to identify and resolve reliability issues.
Create and maintain documentation related to system architecture, processes, and configurations.
Perform incident response and postmortem analysis to improve site reliability and performance.
Monitor system performance and make necessary adjustments to ensure optimal functionality.
Implement and manage infrastructure as code using tools like Terraform or Ansible.

Related Jobs

🔥 Senior Site Reliability Engineer

Posted 2024-11-07

📍 Germany, Sweden, United Kingdom, Spain, Poland, Austria

🧭 Full-Time

🔍 Video Games

Experience in online operations support
Ability to work closely with production and architecture teams
Strong collaboration and communication skills

Serve as liaison between various development teams and the network operations team
Collaborate closely with the production team and system architect
Ensure that projects related to Hunt: Showdown are well planned, documented, and implemented
Handle operational and project duties.

Leadership, Project Management, Project Coordination, Cross-functional Team Leadership, Operations Management

🔥 Senior Site Reliability Engineer, Developer Productivity

Posted 2024-11-07

📍 US, Europe

🧭 Full-Time

💸 175000 - 210000 USD per year

🔍 Cloud computing, AI

🏢 Company: CoreWeave

You have 5+ years of experience in the software or infrastructure engineering industry.
Experience with Python, Go or another scripting language.
Experience containerizing applications and/or using Kubernetes to manage deployments.
Experience with Git.
Experience with Linux shell scripting and/or the ability to navigate a *nix-based operating system.
Experience creating and maintaining GitHub Actions to automate workflows.
You have experience deploying services in production and are interested in learning reliability-at-scale engineering concepts.
You have experience refining SDLC, doing code reviews, and providing technical support.

Design and implement services and tools to reduce friction and toil for our engineering and operations teams.
Streamline repetitive tasks and eliminate bottlenecks to improve development velocity with automated workflows and processes.
Partner with developers to understand their pain points and develop tailored solutions that enhance their productivity.
Champion best practices and advocate for new tools and technologies to drive ongoing productivity gains.
Tackle complex issues related to build systems, testing frameworks, code analysis, and other developer tooling.
Enable and evangelize the practice of reliability engineering across CoreWeave's engineering teams.

Python, Software Development, Git, Kubernetes, *Nix, Go, Collaboration

🔥 Senior Site Reliability Engineer, Databases

Posted 2024-08-28

📍 Australia, Austria, Bangladesh, Belgium, Brazil, Canada, Colombia, Costa Rica, Croatia, Czech Republic, Denmark, Egypt, Estonia, Finland, France, Germany, Ghana, Greece, India, Indonesia, Ireland, Israel, Italy, Kenya, Mexico, Netherlands, Nigeria, Peru, Poland, Singapore, South Africa, Spain, Sweden, Switzerland, Uganda, United Arab Emirates, United Kingdom, United States of America, Uruguay

🧭 Full-Time

💸 109000 - 169000 USD per year

🔍 Nonprofit, Technology

Proficient in automation, programming, and scripting.
Experience with Open Source configuration management and orchestration tools (Puppet, Ansible, Chef, SaltStack, etc.) as well as modern observability infrastructure (Prometheus, Grafana, Logstash/Kibana, Icinga/Nagios, etc.).
Advanced knowledge of Linux and IO/data storage concepts, internals and troubleshooting.
Experience remotely managing both bare-metal servers and virtualized environments.
5+ years of experience in an SRE/Operations/DevOps role as part of a team.
Experience with high traffic and highly available website architectures and operations.
Strong English language skills.
Ability to work independently in a fast-paced environment and as an effective part of a globally distributed team, using ticket tracking systems and asynchronous communication tools.
B.Sc. or M.Sc. in Computer Science or equivalent work experience.

Operation, maintenance, troubleshooting and automation of relational database systems in production and staging environments.
Handling configuration management, (Debian) package maintenance, patching and building, working with upstream on bug identification and resolution.
Improving observability (alerting, metrics, monitoring) of database infrastructure.
Multi-datacenter systems design, capacity and infrastructure planning.
Taking part in incident response, diagnosis, and follow-up on system outages or alerts across Wikimedia's production infrastructure, and participating in an on-call rotation.

SQL, Kibana, C (Programming language), Cassandra, Grafana, Prometheus, Redis

🔥 Senior Site Reliability Engineer, Databases

Posted 2024-08-28

🧭 Full-Time

💸 109000 - 169000 USD per year

🔍 Nonprofit, knowledge sharing

Proficient in automation, programming, and scripting
Experience with Open Source configuration management and orchestration tools (Puppet, Ansible, Chef, SaltStack, etc.)
Advanced knowledge of Linux and IO/data storage concepts
Experience remotely managing both bare-metal servers and virtualized environments
5+ years of experience in an SRE/Operations/DevOps role
Experience with high traffic and highly available website architectures
Strong English language skills
Ability to work independently in a fast-paced environment

Operation, maintenance, troubleshooting and automation of relational database systems in production and staging environments
Handling configuration management, (Debian) package maintenance, patching and building, working with upstream on bug identification and resolution
Improving observability of database infrastructure
Designing multi-datacenter systems, capacity planning, and infrastructure planning
Participating in incident response and on-call rotation for system outages or alerts

SQL, Kibana, C (Programming language), Cassandra, Grafana, Prometheus, Redis, Linux

🔥 Senior Site Reliability Engineer, Data Engineering

Posted 2024-08-22

🧭 Full-Time

💸 109047 - 169455 USD per year

🔍 Nonprofit / Technology

At least two years of experience in an SRE/Operations/DevOps role as part of a team.
Experience supporting high availability distributed production systems.
Experience with database administration and support.
Comfortable with configuration management and orchestration tools (e.g., Puppet, Ansible, Chef, SaltStack).
Knowledge of modern observability infrastructure (monitoring, metrics, and logging).
Proficient in shell and scripting languages such as Python, Go, Bash, Ruby.
Good understanding of Linux/Unix fundamentals and debugging skills.
Excellent written and verbal communication skills.
BS or MS degree in Computer Science or equivalent work experience.

Deploy, configure, and maintain the distributed data systems that comprise our data and analytics platform.
Implement data quality monitoring that alerts the team of possible data issues.
Collaborate closely with the Fundraising team to integrate and use data from self-hosted and third-party sources.
Provide engineering support during high-traffic or critical campaigns.
Write and update internal documentation of systems and processes.
Ensure compliance with regulations like the Donor Privacy Policy, GDPR, and PCI DSS.
Create and manage users and permissions for data access control.
Advise on data input best practices and develop processes for data entry consistency.
Work closely with Fundraising Analytics to gather and prioritize data enhancement requests.

Python, Bash, Ruby, C (Programming language), Data engineering, Go, Communication Skills, Collaboration
