Site Reliability Engineer

Posted about 1 month ago
APAC, Contract, Software Development
Company: Rocket.Chat
Location: APAC
Languages: English
Seniority level: Senior
Skills:
AWS, Python, Bash, GCP, Kubernetes, MongoDB, Azure, Go, Grafana, Prometheus, Redis, CI/CD, Linux, DevOps, Terraform, Ansible, Software Engineering, Networking
Requirements:
Strong background in software engineering with expertise in large-scale distributed systems.
Expertise in Kubernetes, including operator development.
Expertise in cloud platforms (AWS, GCP, Azure, OVH).
Proficiency in Go, Python, or Bash for tooling and operator development.
Deep, hands-on experience with monitoring, logging, and alerting systems (Prometheus, Grafana, Loki).
Experience with Infrastructure as Code (IaC) tools (Terraform, Pulumi, Ansible).
Experience with CI/CD pipelines (ArgoCD).
Solid understanding of networking fundamentals (TCP/IP, DNS, routing).
Solid understanding of security principles.
Familiarity with database technologies (MongoDB, Redis).
Responsibilities:
Design, develop, and maintain Kubernetes Operators for managed hosting.
Oversee reliability and performance of foundational infrastructure (Kubernetes clusters, ArgoCD, Traefik, monitoring stack).
Define, monitor, and enforce SLOs for critical services.
Develop and maintain automation frameworks for deployment and operational tasks.
Respond to critical alerts, lead post-mortems, and improve runbooks.
Collaborate with Engineering, Security, and QA to integrate reliability best practices.
Conduct load testing, performance analysis, and chaos engineering.
About the Company
Rocket.Chat
101-250 employees, Developer Tools