Senior Site Reliability Engineer I
Posted about 1 month ago
Requirements:
- Minimum 5 years of experience with monitoring systems such as Prometheus, New Relic, or AppDynamics.
- Experience developing and debugging in programming and scripting languages such as Java, Python, Bash, or Go.
- Expert knowledge of Kubernetes.
- Experience with Cloud Infrastructure, preferably AWS.
- Experience in infrastructure automation (Infrastructure as Code).
- Experience in architecture/design, development, operation, and troubleshooting of highly available systems at scale.
- Experience building and owning tools for medium to large engineering teams.
- Experience in building systems, dashboards, and metrics for problem resolution.
- Strong Unix/Linux background with knowledge of the network stack and scripting.
- Focus on cost control while developing solutions.
Responsibilities:
- Develop a distributed monitoring system that meets functional, scalability, and reliability requirements.
- Design and architect scalable, testable, and maintainable solutions.
- Coach and mentor colleagues on the team.
- Facilitate collaboration with engineers, product owners, and designers to solve problems.
- Build and ship new features with a focus on code quality.
- Develop, maintain, and extend various systems, including open-source and in-house applications.
- Emphasize quality at every stage of code delivery.