Senior Site Reliability Engineer - APAC

Posted about 1 month ago
Singapore, Full-Time, API Management
Company: Tyk Technologies
Location: Singapore, UTC
Languages: English
Seniority level: Senior
Skills:
AWS, Leadership, Python, Software Development, AWS EKS, Kubernetes, MongoDB, Go, Grafana, Prometheus, Redis, CI/CD, Linux, DevOps, Terraform
Requirements:
Experience in an SRE role
Strong knowledge of cloud technologies and SLA, SLO, and SLI management
Excellent communication and leadership skills
Ability to analyze and improve operational processes and performance metrics
Experience in software design, automation, and root cause analysis
On-call support experience and a customer-focused mindset
Collaborative attitude with commercial and technical teams
Launching and operating production Kubernetes clusters
Designing and operating infrastructure on AWS and other providers
Operating MongoDB (or other document database) clusters
Operating Redis (or other key-value storage) clusters
Administering Linux servers
Operating Prometheus and Grafana
Operating a log collection and analysis system
Participating in the on-call rotation (04:00 - 16:00 UTC)
Kubernetes (administrator)
Go and/or Python (advanced)
AWS/EKS (advanced)
Linux (advanced)
Terraform and IaC in general (proficient)
Helm (proficient)
MongoDB (or similar)
Redis (or similar)
Monitoring: Prometheus, Grafana, Thanos (familiar)
Grasp of networking concepts (subnets, routing, peering, load balancing, NAT, etc.)
Common networking protocols (DNS, TCP/IP, HTTP, TLS, UDP)
Proactive, energetic, innovative, and change-oriented
A desire to lead and mentor a team
Responsibilities:
Lead hands-on maintenance and optimization of the global Cloud platform.
Collaborate to shape SRE strategy and translate it into actionable technical plans.
Identify reliability issues, drive root cause analysis, and implement solutions.
Lead performance tuning and fault finding through analysis of OS and application metrics.
Design and implement automation for common operational tasks and cloud-operations workflows.
Develop proactive alerting, a monitoring roadmap, and relevant dashboards.
Define and track KPIs.
Participate in the on-call rotation.
Conduct blame-free postmortems and document findings.
Maintain operational runbooks.
Drive multi-region and multi-cloud platform expansion.
Optimize infrastructure performance and cost efficiency.
Engage with commercial teams on growth plans.
Coordinate penetration testing.
Champion continuous improvement across processes, communication, and team practices.
Model excellence in software design and knowledge sharing.
Plan and execute software upgrades to enhance cloud services.