Principal Site Reliability Engineer KRWFH1541

Posted 2024-10-23

💎 Seniority level: Principal, 7-10 years

📍 Location: United States

🔍 Industry: Cybersecurity and Advanced Technology

🏢 Company: Global InfoTek, Inc.

🗣️ Languages: English

⏳ Experience: 7-10 years

🪄 Skills: PostgreSQL, Python, Software Development, Agile, Elasticsearch, Git, Java, Kubernetes, MongoDB, MySQL, Azure, Go, Grafana, Prometheus

Requirements:
  • Bachelor's degree in Computer Science, Mathematics, or an equivalent technical field, or equivalent industry experience.
  • Three-plus years developing production software in modern languages (Java, Python, Go, Node.js, etc.).
  • One-plus year developing containerized services on orchestration platforms (Kubernetes, Mesos, Swarm, etc.).
  • Three-plus years of experience with agile and lean software development philosophies.
  • One-plus year working with relational and/or non-relational databases (PostgreSQL, MySQL, MongoDB, Elasticsearch, etc.).
  • Two-plus years with version control systems (Git, Subversion, Mercurial, etc.).
  • Five-plus years building and maintaining Kubernetes clusters across hybrid-cloud infrastructure.
  • Eight-plus years in Operations, DevOps, or Site Reliability Engineering.
  • Five-plus years in configuration/package management using tools like Terraform and Helm.
  • Five-plus years of experience with cloud service monitoring (Prometheus, Grafana, Fluentd, Elastic Stack, etc.).
  • Proficient in Linux system administration.
  • Experience with GitLab CI pipelines.
  • Experience creating automation using APIs from Azure or Google Cloud.

Responsibilities:
  • Build and maintain infrastructure as code on large-scale, multi-site deployments.
  • Evaluate and assess new ways to scale platform capabilities.
  • Automate workflows to enable continuous delivery on hybrid infrastructure.
  • Troubleshoot issues on high-traffic production systems until root causes are understood.
  • Participate in design and code review processes.
  • Coordinate infrastructure changes with product owners.
  • Identify bottlenecks and improve platform performance.