Site Reliability Engineer
Veeam Software
Data and AI
Remote, United States
Full-Time
Middle
Salary: 109,800 - 252,500 USD per year OTE
Job Details
- Experience: 3+ years in Software Engineering, with at least 1 year in SRE, Platform Engineering, or DevOps
- Required Skills: Kubernetes, TypeScript, Azure, Go, Grafana, Prometheus, CI/CD, DevOps, Terraform
Requirements
- 3+ years in Software Engineering, with at least 1 year in SRE, Platform Engineering, or DevOps
- Experience with cloud infrastructure on Azure or a comparable cloud platform
- Familiarity with regulated or compliance-oriented environments (FedRAMP, CMMC, PCI-DSS, HIPAA)
- Ability to read and understand code to investigate system behavior
- Experience with monitoring and observability tools (e.g., Prometheus, Grafana, OpenTelemetry, ELK stack)
- Experience with IaC tools (Terraform, Terragrunt, or Pulumi)
- Experience with container orchestration (Kubernetes)
- Experience with CI/CD tooling (GitHub Actions, Azure DevOps, GitLab CI, or ArgoCD)
- Strong programming skills in TypeScript/JS, Go, Java, C#, or similar
- Solid understanding of distributed systems fundamentals and networking basics
Responsibilities
- Get up to speed on VDC workloads, dependencies, and operational workflows
- Write and maintain runbooks, incident guides, and operational documentation
- Participate in incident response including triage, investigation, mitigation, and postmortems
- Help implement and maintain SLIs, SLOs, and error budgets
- Close monitoring gaps by implementing instrumentation, alerting, and dashboards
- Contribute to toil reduction through automation and tooling improvements
- Work with IaC, CI/CD pipelines, and deployment tooling in compliance-restricted environments
- Work with engineering, security, compliance, and operations teams