Infrastructure Operations Engineer
Nscale
GPU Cloud AI
Join our thriving remote-first team. Geography is no barrier to impact or connection.
Full-Time
Middle
Salary not disclosed
Job Details
Required Skills
- Python, Bash, Git, Kubernetes, Linux, Terraform, Networking, Ansible
Requirements
- Platform and data centre (DC) fundamentals: servers, networks, storage, and virtualisation.
- Linux fundamentals: CLI, systemd, filesystems, permissions, and networking tools.
- Networking basics: IP addressing, subnets, VLANs, routing, DNS, and firewalls.
- Kubernetes exposure: nodes, pods, services, and logs.
- GPU awareness: basic diagnostics such as nvidia-smi.
- Observability foundations: dashboards and alerts.
- Scripting and automation: Bash or Python snippets.
- Git for version control.
- Cloud and virtualisation basics.
Responsibilities
- Join the Support duty rotation and handle day‑to‑day tickets and alerts.
- Manage and resolve tickets whilst keeping all parties informed.
- Follow established runbooks to resolve common issues.
- Learn platform fundamentals to support customers.
- Participate in monitoring, troubleshooting, and triage.
- Deliver assigned tasks and project work.
- Share knowledge via documentation and training materials.
- Take part in incident reviews.
- Identify areas for automation.
- Participate in on‑call or out‑of‑hours work.