Network Operations Support Engineer

Posted 22 days ago

💎 Seniority level: Middle, 3+ years

📍 Location: India

🔍 Industry: Software Development

🏢 Company: Atlan | 👥 251-500 | 💰 $105,000,000 Series C about 1 year ago | Big Data, Information Technology, Data Governance, Software

⏳ Experience: 3+ years

🪄 Skills: AWS, Python, SQL, Cloud Computing, GCP, Kubernetes, Azure, Go, CI/CD, RESTful APIs, Linux, Terraform, Networking, Troubleshooting, Scripting

Requirements:
  • 3+ years of experience working with Kubernetes in production environments. Deep understanding of cluster operations, networking, storage, and security within Kubernetes.
  • Strong knowledge of AWS, Azure, and GCP, including core services, networking concepts, and security best practices.
  • Proven experience implementing GitOps workflows with ArgoCD and managing infrastructure using Terraform.
  • Fluency in at least one programming language (Python, Go, Java) for automation, scripting, and tool development.
  • Familiarity with SRE practices like SLOs (Service Level Objectives), error budgeting, and blameless postmortems.
  • Excellent analytical and troubleshooting skills to identify and resolve issues in complex cloud environments.
  • Ability to communicate effectively with development, operations, and security teams to drive cross-functional initiatives.
  • Ability to work from 8:30 PM to 5:30 AM IST to provide coverage for US time zones.
Responsibilities:
  • Design, deploy, and operate Kubernetes clusters across AWS, Azure, and GCP.
  • Optimize cluster performance, ensure high availability, and implement robust security practices.
  • Build and maintain cloud-native infrastructure components (load balancers, networking, storage, etc.) to support applications running on Kubernetes.
  • Leverage Infrastructure as Code (IaC) with Terraform to automate and manage infrastructure provisioning and configuration.
  • Embrace GitOps principles using ArgoCD to automate deployments and configuration changes and ensure consistency between the desired and actual system state.
  • Establish comprehensive monitoring, logging, and alerting systems to gain insights into platform health and performance.
  • Troubleshoot incidents swiftly and apply SRE principles to improve reliability and resilience.
  • Develop automation scripts and tools (Python, Go, or other languages) to streamline workflows, eliminate manual tasks, and reduce operational overhead.
  • Partner closely with development teams to understand their needs, provide guidance on platform best practices, and enable smooth integration and deployment of their applications.
  • Implement and maintain stringent security measures for Kubernetes and cloud environments, ensuring compliance with industry standards and data protection regulations.
  • Analyze resource usage and implement optimization strategies to maximize performance while controlling cloud costs.
  • Participate in an on-call rotation, troubleshooting and resolving production issues promptly.