Site Reliability Engineer (Expert-level)
Posted 4 months ago
Requirements:
- Background in infrastructure, operations, or software engineering.
- Experience with cloud providers such as GCP.
- Proficiency in infrastructure-as-code and configuration management tools such as Terraform and Ansible.
- Hands-on proficiency with modern monitoring tools like Prometheus and Grafana.
- Experience with distributed data stores such as Cassandra, PostgreSQL, and Elasticsearch.
- Experience with Python and Bash is beneficial.
- Strong technical skills across various infrastructure technologies.
- Proven ability to break down complex tasks into manageable ones.
- Strong communication skills and a history of building solid relationships with peers and leadership.
- Experience operating and maintaining production systems in a Linux and public cloud environment.
- Demonstrated ability to mentor and guide team members.
Responsibilities:
- Be a part of the team that builds and operates the infrastructure at the heart of every Sinch Mailjet service.
- You’ll be instrumental in the day-to-day management of our global infrastructure. This includes monitoring and tracking key performance indicators (KPIs), collaborating with engineers to ensure our products and services are appropriately resourced, automating processes, and planning for future growth and scalability.
- Partner with product engineering teams to identify systems requirements.
- Build and support our cloud-based microservices infrastructure.
- Automate routine processes and remediation tasks.
- Develop, monitor, and track Service Level Objectives (SLOs) for the systems under management.
- Proactively troubleshoot, resolve, and plan for issues that typically come from support staff, other engineering teams, and our automated monitoring system.
- Ensure our datastores are healthy and operate at optimal performance levels.
- Contribute to the growth and culture of our engineering team.