Site Reliability Engineer - Remote - Canada

Posted 6 days ago

💎 Seniority level: Mid-level, 4+ years

📍 Location: Canada

🔍 Industry: Software Development

🏢 Company: Jobgether | 👥 11-50 | 💰 $1,493,585 Seed (almost 2 years ago) | Internet

🗣️ Languages: English

⏳ Experience: 4+ years

🪄 Skills: AWS, Amazon RDS, Kafka, Kubernetes, Grafana, Prometheus, Redis, CI/CD, Problem Solving, Terraform

Requirements:
  • 4+ years of experience in Site Reliability Engineering or a similar role with a strong focus on cloud infrastructure.
  • Expertise in Infrastructure as Code (IaC) using Terraform and Terragrunt.
  • Deep knowledge of AWS cloud services and best practices for scalable and secure architectures.
  • Hands-on experience with Confluent Cloud and Kafka for distributed data streaming.
  • Strong experience with Redis for caching and RDS for data storage.
  • Proficiency with OpenSearch, Elasticsearch, or ChaosSearch for search and analytics.
  • Advanced knowledge of monitoring tools such as Prometheus, Grafana, Alertmanager, and OpsGenie.
  • Experience with LaunchDarkly for feature flag management.
  • Extensive experience managing Kubernetes clusters, including Helm for package management, ArgoCD for deployments, and Istio for service mesh configurations.
  • Familiarity with Kustomize for Kubernetes resource configuration.
  • Strong problem-solving skills and ability to troubleshoot complex systems in production environments.
  • Excellent communication and collaboration skills within agile teams.
Responsibilities:
  • Design, build, and maintain highly scalable cloud infrastructure using Terraform and Terragrunt for automated resource provisioning.
  • Manage and optimize AWS cloud environments, ensuring security, cost efficiency, and high availability.
  • Oversee data streaming platforms using Confluent Cloud and Kafka, ensuring reliable data pipelines.
  • Deploy and manage Redis instances for caching and real-time data processing.
  • Implement and maintain monitoring and alerting solutions using Prometheus, Grafana, Alertmanager, and OpsGenie.
  • Enable feature flag management and controlled rollouts with LaunchDarkly.
  • Manage Kubernetes clusters, using Helm, ArgoCD, Istio, and Kustomize for continuous deployment and infrastructure-as-code practices.
  • Collaborate with development teams to integrate new services into the infrastructure seamlessly.
  • Troubleshoot complex system issues to maintain high availability and performance.
  • Continuously improve automation tools, processes, and methodologies to enhance system scalability.

Related Jobs

📍 Canada

🧭 Full-Time

🔍 Site Reliability Engineering

🏢 Company: Jobgether | 👥 11-50 | 💰 $1,493,585 Seed (almost 2 years ago) | Internet

Requirements:
  • 4+ years of experience in Site Reliability Engineering or a similar role
  • Expertise in Infrastructure as Code with Terraform and Terragrunt
  • Deep knowledge of AWS cloud services
  • Experience with Confluent Cloud and Kafka for data streaming
  • Strong experience with Redis and RDS
Responsibilities:
  • Design, build, and maintain scalable cloud infrastructure using Terraform and Terragrunt
  • Manage AWS cloud environments for security and high availability
  • Oversee data streaming platforms with Confluent Cloud and Kafka
  • Maintain monitoring and alerting solutions using Prometheus and Grafana
  • Manage Kubernetes clusters with Helm, ArgoCD, and Istio

🪄 Skills: AWS, Elasticsearch, Kafka, Kubernetes, Grafana, Prometheus, Redis, CI/CD, Terraform

Posted about 5 hours ago