Senior Site Reliability Engineer, Observability
Chainlink Labs
Blockchain, DeFi
United States; Secondary Locations: Brazil, Buenos Aires, Vancouver, Toronto, Colombia, Mexico. Try to overlap some working hours with Eastern Standard Time (EST).
Full-Time
Senior
Salary: 129,000 - 304,000 USD per year
Job Details
- Experience: 7+ years
- Required Skills: AWS, Python, Kubernetes, Go, Grafana, Prometheus, DevOps, Terraform
Requirements
- 7+ years of relevant professional experience on DevOps, infrastructure, SRE, and/or platform teams
- Ability to develop software outside of the scope of typical infrastructure requirements and configurations
- Proficiency in programming in C, C++, Java, Python, Go, Perl, or Ruby
- Expert knowledge in designing, developing, and managing large real-time systems
- Experience with monitoring and logging (exporting metrics using Prometheus, building Grafana dashboards, centralized logging solutions like ELK Stack or Splunk)
- Experience with distributed systems and container orchestration (Kubernetes)
- Strong communication skills for giving/receiving feedback, planning meetings, and code reviews
- Comfortable with AWS, Terraform/Terragrunt, Calico, ArgoCD, and GitHub Actions
Responsibilities
- Build and orchestrate a modern OTEL-based observability platform
- Support multiple telemetry types, such as metrics, logs, and traces
- Define and support modern observability governance and address problems at scale
- Ensure reliability, security, and performance exceed our defined SLAs
- Work with engineers from across the company to help troubleshoot issues, deploy new products and services, and increase velocity while decreasing cognitive load
- Lead the design and deployment of monitoring/observability services that detect issues and alert the team when action is needed
- Ingest, aggregate, transform, and utilize data from a multitude of sources in our real-time data pipeline
- Oversee the availability, performance, and supportability of our observability infrastructure
- Create processes around alert response operations and support the team to ensure the reliable delivery of oracle data
- Make recommendations to ensure sufficient metrics are collected to create alerts with every new feature release