Senior Site Reliability Engineer - APAC

Posted about 2 months ago

Singapore, Full-Time, API Management

Company: Tyk Technologies

Location: Singapore, UTC

Languages: English

Seniority level: Senior

Skills:

AWS, Leadership, Python, Software Development, AWS EKS, Kubernetes, MongoDB, Go, Grafana, Prometheus, Redis, CI/CD, Linux, DevOps, Terraform

Requirements:

Experience in an SRE role
Strong knowledge of cloud technologies and SLA, SLO, and SLI management
Excellent communication and leadership skills
Ability to analyze and improve operational processes and performance metrics
Experience in software design, automation, and root cause analysis
On-call support experience and customer-focused mindset
Collaborative attitude with commercial and technical teams
Launching and operating production Kubernetes clusters
Designing and operating infrastructure on AWS and other providers
Operating MongoDB (or other document database) clusters
Operating Redis (or other key-value storage) clusters
Administering Linux servers
Operating Prometheus and Grafana
Operating log collection and analysis systems
Participating in the on-call rotation (04:00 - 16:00 UTC)

Responsibilities:

Lead hands-on maintenance and optimization of the global Cloud platform
Shape SRE strategy and translate it into actionable technical plans
Identify reliability issues, drive root cause analysis, and implement solutions
Lead performance tuning and fault finding
Design and implement automation for operational tasks
Develop proactive alerting, a monitoring roadmap, and dashboards
Participate in the on-call rotation
Conduct blame-free postmortems and maintain operational runbooks
Drive multi-region and multi-cloud platform expansion
Optimize infrastructure performance and cost efficiency
Engage with commercial teams on growth plans
Coordinate penetration testing
Champion continuous improvement
Model excellence in software design and knowledge sharing
Plan and execute software upgrades