Site Reliability Engineer (Remote - Canada)

Posted 1 day ago

💎 Seniority level: Mid-level, 4+ years

📍 Location: Canada

🔍 Industry: Site Reliability Engineering

🏢 Company: Jobgether | 👥 11-50 | 💰 $1,493,585 Seed, almost 2 years ago | Internet

🗣️ Languages: English

⏳ Experience: 4+ years

🪄 Skills: AWS, Elasticsearch, Kafka, Kubernetes, Grafana, Prometheus, Redis, CI/CD, Terraform

Design, build, and maintain scalable cloud infrastructure using Terraform and Terragrunt
Manage AWS cloud environments for security and high availability
Oversee data streaming platforms with Confluent Cloud and Kafka
Maintain monitoring and alerting solutions using Prometheus and Grafana
Manage Kubernetes clusters with Helm, ArgoCD, and Istio

Posted 7 days ago

📍 Canada

🔍 Software Development

🔧 Requirements

4+ years of experience in Site Reliability Engineering or a similar role with a strong focus on cloud infrastructure.
Expertise in Infrastructure as Code (IaC) using Terraform and Terragrunt.
Deep knowledge of AWS cloud services and best practices for scalable and secure architectures.
Hands-on experience with Confluent Cloud and Kafka for distributed data streaming.
Strong experience with Redis for caching and RDS for data storage.
Proficiency with OpenSearch/Elasticsearch/ChaosSearch for search and analytics.
Advanced knowledge of monitoring tools like Prometheus, Grafana, Alertmanager, and OpsGenie.
Experience with LaunchDarkly for feature flag management.
Extensive experience managing Kubernetes clusters, including Helm for package management, ArgoCD for deployments, and Istio for service mesh configurations.
Familiarity with Kustomize for Kubernetes resource configuration.
Strong problem-solving skills and ability to troubleshoot complex systems in production environments.
Excellent communication and collaboration skills within agile teams.

💡 Responsibilities

Design, build, and maintain highly scalable cloud infrastructure using Terraform and Terragrunt for automated resource provisioning.
Manage and optimize AWS cloud environments, ensuring security, cost efficiency, and high availability.
Oversee data streaming platforms using Confluent Cloud and Kafka, ensuring reliable data pipelines.
Deploy and manage Redis instances for caching and real-time data processing.
Implement and maintain monitoring and alerting solutions using Prometheus, Grafana, Alertmanager, and OpsGenie.
Enable feature flag management and controlled rollouts with LaunchDarkly.
Manage Kubernetes clusters, utilizing Helm, ArgoCD, Istio, and Kustomize for continuous deployment and infrastructure-as-code practices.
Collaborate with development teams to integrate new services into the infrastructure seamlessly.
Troubleshoot complex system issues to maintain high availability and performance.
Continuously improve automation tools, processes, and methodologies to enhance system scalability.

AWS, Amazon RDS, Kafka, Kubernetes, Grafana, Prometheus, Redis, CI/CD, Problem Solving, Terraform
