📍 United States
🧭 Full-Time
💸 $145,000 - $185,000 USD per year
🔍 Software Development
- 7+ years of experience in Linux systems engineering roles supporting bare metal servers and virtualization/container platforms
- 3+ years’ Kubernetes administration experience on Red Hat OpenShift.
- Experience building and managing infrastructure in both public cloud and physical data center environments using IaC tools
- 5+ years’ experience with enterprise monitoring and logging solutions like Prometheus, ELK, or similar
- Proven ability to automate the right things in the simplest way possible (scripts, config management tools, CI pipelines, RHOS Operators, etc.)
- Solid understanding of networking fundamentals and storage technologies
- Competency in at least one high-level programming language (e.g., Go or Python)
- Experience supporting customer-facing SaaS products
- Translates high-level platform design into low-level technical design; responsible for implementing, administering, supporting, and patching the corresponding platforms.
- Installs, configures, and monitors applications and services in the OpenShift cluster.
- Continually assesses technical components to recommend platform improvements, translating high-level design and RHOS best practices into low-level technical configuration.
- Ensures the ongoing stability, availability, performance, and security compliance of the platform to meet customer SLAs; authors and executes test cases to validate
- Collaborates with software delivery teams and architects to build and support self-service mechanisms, CI/CD pipelines, and k8s operators that simplify and accelerate service delivery, in accordance with DevOps and Agile frameworks
- Maintains the catalog of services for the platform in collaboration with Engineering.
- Instruments and optimizes application, system, and cluster performance.
- Forecasts and plans capacity increases to ensure resource availability for engineering teams while meeting budget targets.
- Helps build and implement Disaster Recovery / Business Continuity plans; conducts related testing of recovery procedures.
- Helps determine the platform roadmap and manage projects and ticket-based work; ensures these are clearly communicated to stakeholders at all levels.
- Provides thought leadership on DevOps and Platform Engineering-centric system and process design, giving constructive input to engineers and leaders on proposals and best practices.
- Builds internal documentation and artifacts describing the mechanisms used for deployment, monitoring, and operators.
- Leads by example: mentors and helps develop engineers in a hands-on, collaborative way
- Participates in an on-call rotation with fellow team members
AWS, Python, Bash, Cloud Computing, Jenkins, Kafka, Kubernetes, Go, Prometheus, CI/CD, Linux, DevOps, Terraform, Networking, Ansible, Scripting, SaaS
Posted about 22 hours ago