Senior Site Reliability Engineer, Observability

Chainlink Labs
Blockchain, DeFi
United States; Secondary Locations: Brazil, Buenos Aires, Vancouver, Toronto, Colombia, Mexico. Try to overlap some working hours with Eastern Standard Time (EST).
Full-Time, Senior
Salary: 129,000 - 304,000 USD per year

Job Details

Experience
7+ years
Required Skills
AWS, Python, Kubernetes, Go, Grafana, Prometheus, DevOps, Terraform

Requirements

  • 7+ years of relevant professional experience on DevOps, infrastructure, SRE, and/or platform teams
  • Ability to develop software outside of the scope of typical infrastructure requirements and configurations
  • Proficiency in programming in C, C++, Java, Python, Go, Perl, or Ruby
  • Expert knowledge in designing, developing, and managing large real-time systems
  • Experience with monitoring and logging (exporting metrics using Prometheus, building Grafana dashboards, centralized logging solutions like ELK Stack or Splunk)
  • Experience with distributed systems and container orchestration (Kubernetes)
  • Strong communication skills for giving/receiving feedback, planning meetings, and code reviews
  • Comfortable with AWS, Terraform/Terragrunt, Calico, ArgoCD, and GitHub Actions

Responsibilities

  • Build and orchestrate a modern OpenTelemetry (OTel)-based observability platform
  • Support multiple telemetry types, such as metrics, logs, and traces
  • Define and support modern observability governance and address problems at scale
  • Ensure reliability, security, and performance exceed our defined SLAs
  • Work with engineers from across the company to help troubleshoot issues, deploy new products and services, and increase velocity while decreasing cognitive load
  • Lead the design and deployment of monitoring/observability services that detect issues and alert the team when action is needed
  • Ingest, aggregate, transform, and utilize data from a multitude of sources in our real-time data pipeline
  • Oversee the availability, performance, and supportability of our observability infrastructure
  • Create processes around alert response operations and support the team to ensure the reliable delivery of oracle data
  • Make recommendations to ensure sufficient metrics are collected to create alerts with every new feature release