Senior Site Reliability Engineer

Posted 2024-09-05

💎 Seniority level: Senior, 5+ years

🔍 Industry: Fintech

🏢 Company: Upgrade · 👥 1001-5000 · Consulting

⏳ Experience: 5+ years

🪄 Skills: DevOps, Terraform

Requirements:

5+ years of production-level SRE/DevOps experience in a cloud-based environment.
In-depth knowledge and hands-on experience with AWS services.
Proficiency in programming/scripting languages such as PowerShell, Python, or Bash.
Experience with SQL Server databases and Windows Server environments.
Knowledge of Ansible, Chef, Puppet, or Terraform.
Strong understanding of systems, networks, and troubleshooting techniques, and experience automating build pipelines.
Ability to operate in an agile, fast-paced, entrepreneurial start-up environment.
Experience providing SRE/DevOps support to development teams for debugging Java applications is a plus.

Responsibilities:

Build a resilient, secure, and efficient cloud-based platform.
Automate deployment, monitoring, management, and incident response.
Monitor and troubleshoot platform issues.
Build and scale technology infrastructure to meet increasing demand.
Manage cross-functional requirements and collaborate with various stakeholders.
Work with Development and QA to deploy new features and services.
Develop and improve operational practices and procedures.

Related Jobs

🔥 Senior Site Reliability Engineer

Posted 2024-11-21

📍 United States

🧭 Full-Time

🔍 Legal technology

🏢 Company: Ramp Talent

Requirements:

Curiosity, willingness to learn, and passion for continuous improvement.
Proficiency in all skills expected of SRE IIs.
Bachelor's degree in computer science, information systems, or a related field; comparable certifications; or equivalent direct work experience.
A minimum of 8 years of experience in hands-on technical roles.
A minimum of 2 years of Site Reliability Engineering experience.
Experience building autonomous systems that manage software operational details without human intervention.

Responsibilities:

Developing autonomous systems that manage the details necessary to build, deploy, test, and operate all Filevine Inc. products.
Being the voice of Reliability on your team throughout the SDLC.
Collecting, monitoring, aggregating, dashboarding, and alerting on software and server events.
Improving the CI/CD pipeline.
Developing playbooks, tools, and scripts to streamline processes and shorten problem resolution time.
Identifying and fixing gaps in the availability of systems.
Improving and defending the security of software and systems.
Documenting and diagramming processes, procedures, and best practices.
Finding, learning, improving, or creating new tools that are reliable, usable, and helpful.
Mentoring, training, and reviewing more junior engineers.
Participating in an on-call rotation for 24/7 production reliability support.

Leadership, CI/CD, Mentoring

🔥 Senior Site Reliability Engineer (Poland remote)

Posted 2024-11-21

📍 Poland

🔍 Software

🔥 Senior Site Reliability Engineer (SRE) - Disaster Recovery Specialist (m/f/x)

Posted 2024-11-21

🧭 Full-Time

🔍 Software / SaaS

Requirements:

Degree in Computer Science, Information Technology, or a related field.
5+ years of hands-on experience in site reliability engineering, ideally with a focus on disaster recovery.
Experience in a cloud-based SaaS environment.
Strong expertise in designing and implementing disaster recovery solutions using industry-leading technologies and methodologies.
Proficiency in cloud platforms such as AWS, Azure, or Google Cloud Platform.
Experience with infrastructure as code (IaC) tools such as Terraform or CloudFormation.
Excellent communication skills with the ability to effectively collaborate with cross-functional teams and communicate technical concepts to non-technical stakeholders.

Responsibilities:

Design, implement, and maintain disaster recovery solutions for cloud-based SaaS environments.
Develop and document comprehensive disaster recovery plans, procedures, and runbooks.
Conduct drills and exercises to test and validate the effectiveness of these plans.
Collaborate with engineering, operations, and security teams to identify and mitigate potential risks to system availability and data integrity.
Monitor system performance and health metrics; proactively identify areas for improvement.
Implement preventive measures to enhance system reliability and resilience.
Participate in incident response and post-incident reviews; analyze root causes of failures.
Implement corrective actions to prevent recurrence.

🔥 Senior Site Reliability Engineer, Databases

Posted 2024-11-20

📍 USA

💸 170000 - 190000 USD per year

🔍 Email Security

🏢 Company: Valimail

Requirements:

5+ years of experience building and maintaining highly available relational databases.
Work collaboratively with cross-functional teams.
Value team success over individual success.
Apply industry and engineering best practices and promote them to others.
Passion for reliable, scalable, and performant datastores, with a strong sense of ownership.
Experience building and supporting highly performant and highly reliable datastores.
Deep experience working with Postgres.
Expertise in database fundamentals, SQL, and PL/pgSQL (or similar).
Experience with NoSQL datastores and caching solutions.
Working knowledge of AWS or Azure cloud providers.
Experience with Infrastructure-as-Code tools, such as Terraform.

Responsibilities:

Evangelizing standard methodologies for building and operating highly reliable data storage systems.
Serving as the subject matter expert in datastore design and performance.
Building and supporting Valimail’s mission-critical datastores.
Conducting timely post mortems of production datastore incidents.
Collaboratively designing systems with other engineers to meet reliability, scalability, and performance requirements.
Providing assistance to teams working with datastores.
Automating routine database tasks.
Participating in on-call rotation and incident response.
Upgrading data storage systems as necessary.

AWS, SQL, Azure, Postgres, NoSQL, Terraform

🔥 Senior Site Reliability Engineer (SRE) - Disaster Recovery Specialist (m/f/x)

Posted 2024-11-20

🧭 Full-Time

🔍 Software Development

Requirements:

Degree in Computer Science, Information Technology, or a related field.
5+ years of hands-on experience in site reliability engineering, ideally with a focus on disaster recovery.
Strong expertise in designing and implementing disaster recovery solutions using leading technologies.
Proficiency in cloud platforms such as AWS, Azure, or Google Cloud Platform.
Experience with infrastructure as code (IaC) tools like Terraform or CloudFormation.
Excellent communication skills for collaboration with cross-functional teams and non-technical stakeholders.

Responsibilities:

Design, implement, and maintain disaster recovery solutions for a cloud-based SaaS environment.
Develop and document comprehensive disaster recovery plans, procedures, and runbooks.
Conduct drills and exercises to validate the effectiveness of disaster recovery plans.
Collaborate with engineering, operations, and security teams to identify and mitigate risks.
Proactively monitor system performance and health metrics, and implement preventive measures.
Participate in incident response and post-incident reviews to analyze root causes and implement corrective actions.

🔥 Senior Site Reliability Engineer

Posted 2024-11-17

📍 Canada

🔍 Software Supply Chain Management

🏢 Company: FOSSA

Requirements:

Strong, demonstrated experience as a technical lead designing, building, and maintaining scalable infrastructure and tooling.
Strong knowledge of at least one cloud platform and of maintaining managed services (we use AWS).
Strong experience implementing Infrastructure as Code using Terraform, Helm, and Kubernetes.
Experience building and maintaining build pipelines and deploying new services, plus familiarity with CI/CD tools such as Buildkite, CircleCI, and GitHub Actions.
Experience with logging and monitoring tools such as Datadog, StatsD, Prometheus, and Grafana.
Experience with packaging and deploying services using Docker on Linux.
Ability to break down complex problems, troubleshoot, drive towards a solution, and communicate it with the team and stakeholders.
Willingness to accept feedback and incorporate it into work.
Experience with source control tooling and processes, including branching, merging, and rebasing (we use git).
Willingness to take part in an on-call rotation.

Responsibilities:

Scale cloud infrastructure to meet increasing demand.
Assist development teams in deploying new services.
Ensure platform security and adherence to best practices.
Improve development tools, CI/CD pipelines, monitoring, and release processes.
Help teams use Helm and Kubernetes, and shape best practices.
Build access control and secret management solutions.
Maintain deployments for on-premise customers.

AWS, Docker, Git, Kubernetes, Grafana, Prometheus, CI/CD, Linux, Terraform

🔥 Senior Site Reliability Engineer

Posted 2024-11-16

📍 Netherlands

🔍 Creative Technology

🏢 Company: Creative Fabrica

Requirements:

4+ years of experience operating and supporting a high-volume, high-performance, cloud-native distributed computing environment.
Proven experience with Terraform, containers, and monitoring solutions.
Experience with a wide array of AWS-based services (EC2, ECS/EKS, S3, RDS, ALB, MSK, DynamoDB, Redshift, etc.).
Experience supporting and deploying applications and microservices written in Go, Python, and PHP.
Experience with driving DevOps practices and developing automation solutions in a continuous deployment environment.
Experience with Kubernetes and Kafka is highly preferred.

Responsibilities:

Improve our site infrastructure to keep up with the company’s fast growth and technology evolution.
Proactively monitor the infrastructure and propose improvements.
Lead the design and building of a fully automated, developer self-service platform.
Research, develop and implement infrastructure management standards across our cloud accounts (AWS).
Participate in pre-production and production site releases.
Participate in the on-call rotation and in the debugging of issues.

AWS, PHP, Python, DynamoDB, Kafka, Kubernetes, TypeScript, Go, DevOps, Terraform, Microservices

🔥 Senior Site Reliability Engineer

Posted 2024-11-16

📍 U.S.

🧭 Full-Time

💸 140000 - 160000 USD per year

🔍 Cybersecurity / Open source software

Requirements:

Sense of curiosity, resourcefulness, and pragmatism.
Expertise with multi-region deployments in public cloud environments.
Demonstrable production Kubernetes experience (managed Kubernetes, Helm, kubectl, kOps, etc.).
Strong background in reliability engineering, DevOps, or software engineering.
Fluency in at least one programming language, such as C#, Python, or Go.
Experience with cloud deployment and automation tools and methodologies (e.g., GitOps, Terraform, Pulumi).
Proficiency using source control such as Git.
Ability to maintain discretion and handle sensitive information.
Staying current with trends and new technologies.
Collaborative and adaptable mindset.
Excellent communication skills.
Strong problem-solving skills.

Responsibilities:

Take ownership of the Bitwarden cloud infrastructure, focusing on quality.
Evaluate infrastructure regularly, making recommendations for reliability, security, availability, scalability, and cost management.
Implement site reliability tools and observability systems.
Respond to outages and participate in a 24x7 support strategy.
Contribute to architectural designs and engineering operations at scale.
Engage in code reviews and spread technical knowledge.
Contribute to incident management processes.
Collaborate with teams to refine priorities and deliverables.
Align SLIs, SLOs, and SLAs with product owners.
Identify opportunities for new initiatives.
Influence the SDLC as Bitwarden scales.
Mentor team members.

Python, Git, Kubernetes, C#, Strategy, Go, Communication Skills, DevOps, Terraform

🔥 Senior Site Reliability Engineer (Remote First)

Posted 2024-11-15

📍 Canada

🧭 Full-Time

🔍 InsurTech

Requirements:

Extensive experience in infrastructure security, monitoring, release engineering, and developer tooling.
Ability to coach and mentor less experienced professionals.
Demonstrated leadership skills in guiding teams and improving capabilities.

Responsibilities:

Work with the Engineering Department to develop and provide infrastructure security, monitoring, release engineering, and developer tooling based on group-level and department-level requirements.
Provide guidance and leadership to DevOps chapter representatives from teams across the Engineering Department.
Suggest, plan, guide, and assist with the development and implementation of infrastructure to support goals.
Coach and mentor lower-level professionals.
Assist the Engineering Leadership Team in continuously improving craft capabilities.

Leadership, Mentoring, DevOps, Coaching

🔥 Senior Site Reliability Engineer (SRE)

Posted 2024-11-12

🧭 Contract

Requirements:

A minimum of 5-7 years of experience in Site Reliability Engineering or related fields.
Proven experience designing and implementing fault-tolerant, scalable systems.
Deep understanding of reliability methodologies and metrics such as Design for Reliability (DFR), Failure Mode and Effects Analysis (FMEA), and Mean Time Between Failures (MTBF).
Proficiency with tools such as Datadog, PagerDuty, Marvin, and Backstage.
Strong coding skills in one or more programming languages relevant to SRE.
Exceptional analytical skills for complex issue investigation.
Willingness to learn new products and tools.
Excellent communication skills for a distributed team environment.

Responsibilities:

Identify and resolve complex bugs within the codebase.
Enhance system reliability, scalability, and performance through code maintenance.
Restart services and implement necessary code changes.
Investigate complex system issues and develop resolutions.
Design and build fault-tolerant, scalable systems for high availability.
Apply methodologies like DFR, FMEA, and MTBF.
Develop and maintain reliability standards and documentation.
