SRE Manager
Posted about 19 hours ago
Requirements:
- 5+ years of experience in leadership/management roles
- 5+ years of experience in Software Development
- Strong understanding of SDLC, microservice architecture
- Observability tooling: New Relic, Elastic, Grafana, PagerDuty, OpenTelemetry (OTEL)
- Knowledge of Kubernetes clusters in a production setting, AWS, and Infrastructure as Code (IaC)
- Knowledge of CI/CD tooling: Jenkins, GitLab, GitHub, ArgoCD, or similar
- 5+ years of experience with relational databases (MySQL, PostgreSQL) is a plus
Responsibilities:
- Understand how the Altium Cloud Platform works
- Pioneer improvements in observability, including logging, monitoring, and application performance management (APM), ensuring system reliability and proactive issue detection.
- Lead incident response and management, ensuring rapid resolution, clear stakeholder communication, and post-incident analysis for continuous improvement.
- Plan and oversee infrastructure upgrades, patching, and maintenance activities while consistently meeting agreed SLA targets.
- Recruit, mentor, and develop a high-performing SRE team, fostering professional growth and a collaborative culture.
- Participate in system design consulting, platform management, and capacity planning.
- Improve the reliability, quality, and time-to-market of our software solutions, including through hands-on software development.
- Partner closely with engineering and development teams to enhance product stability, observability, and manageability through best practices in reliability engineering.
- Partner closely with DevOps/Operations, drive automation initiatives, promote Infrastructure as Code (IaC), and streamline deployment processes to improve operational efficiency and scalability.