Staff Infrastructure Engineer (Site Reliability Engineer)
Posted 2 months ago
Requirements:
- 7–10 years' experience in SRE, DevOps, or Software Engineering roles.
- Extensive Kubernetes expertise at scale, with strong containerization knowledge.
- Hands-on proficiency in Infrastructure as Code (IaC) tools such as Ansible, Puppet, and Terraform.
- Strong coding skills in an OOP language plus the ability to develop effective scripting solutions.
- Deep security knowledge and best practices across all aspects of infrastructure and services.
- Expert-level cloud experience with AWS and/or GCP.
- Advanced monitoring and alerting skills using tools like Prometheus, Grafana, or similar.
- Solid understanding of networking fundamentals.
- Robust Linux or Windows administration experience.
- Experience with software delivery automation (CI/CD, SDLC) and familiarity with static and dynamic application security testing (SAST/DAST).
- Comprehensive knowledge of SRE principles (SLIs, SLOs, SLAs, toil, uptime, observability).
- Elasticsearch: experience managing and scaling large Elasticsearch clusters is strongly preferred for this role.
Responsibilities:
- Oversee the reliability, scalability, performance, and security of key production services across various technical disciplines, from initial design to final implementation.
- Collaborate with cross-functional teams to develop and maintain resilient infrastructure.
- Provide expert mentorship and guidance on best practices to engineers throughout the organization.
- Contribute to our 24×7 on-call rotation, ensuring uninterrupted availability of critical services.
- Drive standardization and documentation efforts to promote efficiency, consistency, and knowledge sharing.