Senior DevOps Engineer

ISHIR | Digital Innovation, Enterprise AI Services

India / United States, EST Overlap | Full-Time | Senior

Salary not disclosed

Job Details

Experience: 8+ years
Required Skills: AWS, Docker, Python, Bash, Java, Kubernetes, Jira, Go, Prometheus, Linux, Terraform, ServiceNow, Helm

8+ years in a platform/SRE/DevOps or infrastructure role, with a strong bias toward automation and support.
Experience operating Kubernetes (or similar) and core ecosystem tools (Helm, Docker, Ingress NGINX, Argo Rollouts basics).
Hands-on CI/CD experience (preferably GitLab CI): writing/modifying jobs, artifacts, environments, and basic deployment strategies.
Scripting ability in Bash or Python (Go a plus) to automate repetitive tasks and improve runbooks.
Familiarity with AWS fundamentals (e.g., IAM, EC2/EKS, S3, CloudWatch/CloudTrail, Parameter Store/Secrets Manager).
Practical understanding of monitoring/observability (dashboards, logs, alerts) and how to use them for triage and remediation, including Prometheus/Alertmanager/Thanos and OpenTelemetry basics.
Comfortable working from tickets (Jira/ServiceNow), following change-management practices, and communicating clearly with stakeholders.
Terraform experience for infrastructure as code (highly preferred).
API integration experience (Java, Python, or Go) to build small internal tools or glue code (highly preferred).
Deeper Linux fundamentals and container runtime basics for effective debugging and performance tuning (highly preferred).
Exposure to insurance/financial services environments, including awareness of compliance and operational controls (highly preferred).

Operate and improve the platform tools that teams rely on to ship products reliably: triage tickets, fix build issues, and handle routine service requests.
Maintain and extend self-service workflows (templates, golden paths) by updating docs, examples, and guardrails.
Perform day-to-day Kubernetes operations: deploy/update Helm charts, manage namespaces, diagnose rollout issues, and follow runbooks for incident response.
Support CI/CD pipelines (e.g., GitLab CI): keep pipelines green, add/adjust jobs, implement basic quality gates, and help teams adopt safer deploy strategies.
Monitor and operate the observability stack using Prometheus, Alertmanager, and Thanos; maintain alert rules, dashboards, and SLO/SLA indicators; help reduce alert noise and improve signal quality.
Assist with instrumenting services for tracing, logging, and metrics using OpenTelemetry and related telemetry tooling.
Contribute to and improve documentation: runbooks, FAQs, onboarding guides, and standard operating procedures.
Participate in an on-call rotation as needed with a well-defined escalation path; assist during incidents, post small fixes, and capture learnings in docs.
Help with cost- and performance-minded housekeeping: right-size workloads, prune unused resources, and automate routine tasks where appropriate.
