Staff Site Reliability Engineer (SRE)

Posted 6 months ago

💎 Seniority level: Senior, 5+ years

💸 Salary: 152,000 - 230,500 USD per year

🔍 Industry: IT and Security

🏢 Company: Cribl | 👥 251-500 | 💰 $150,000,000 Series D, over 2 years ago | Real Time, Big Data, Information Technology, Software

🗣️ Languages: English

⏳ Experience: 5+ years

🪄 Skills: AWS, Node.js, Design Patterns, JavaScript, Kibana, TypeScript, Grafana, Prometheus, DevOps

Requirements:
  • Extensive experience with enterprise-scale continuous delivery environments.
  • 5+ years of experience with a DevOps or SRE job title.
  • Development with JavaScript/Node.js/TypeScript in a Linux/Mac environment.
  • Experience with configuration management tools such as Terraform (preferred), Puppet, Chef, or Ansible.
  • Experience with sustainable incident response in a blameless environment.
  • Knowledge of cloud platforms (AWS preferred) and container and orchestration technologies.
  • Experience with APM and observability tools such as New Relic, Splunk, CloudWatch, Prometheus, Grafana/Kibana, and Sentry.
  • Background in Linux Systems Engineering.
  • Experience with incident response tools such as PagerDuty, FireHydrant, and Blameless.
  • Comfortable with a high level of autonomy and working with a distributed team.
Responsibilities:
  • Engage with teams and improve service delivery and reliability across their entire lifecycle.
  • Measure and monitor all production systems with an eye towards availability, latency and overall system health.
  • Seek out the cause of errors and instability in our production cloud services and drive teams towards better operational excellence.
  • Engage with product and platform teams to improve and evolve systems by lobbying for changes that improve reliability, resilience, and observability.
  • Help identify and drive down toil with creative innovation and automation.
  • On-call responsibilities.

Related Jobs

📍 Canada

🧭 Full-Time

🔍 Observability and data management

🏢 Company: Cribl | 👥 251-500 | 💰 $150,000,000 Series D, over 2 years ago | Real Time, Big Data, Information Technology, Software

Requirements:
  • Extensive experience with enterprise-scale continuous delivery environments.
  • Development with JavaScript/Node.js/TypeScript in a Linux/Mac environment.
  • Experience with configuration management tools such as Terraform (preferred), Puppet, Chef, or Ansible.
  • Knowledge of cloud platforms (AWS and Azure preferred; GCP is nice to have) and container and orchestration technologies.
  • Extensive experience designing and implementing observability platforms based on open-source tools such as Grafana, Prometheus, and OpenSearch.
  • Experience mentoring engineers and acting as a subject matter expert in monitoring and observability.
  • Experience with native monitoring services in AWS, Azure, and other popular cloud platforms.
  • Background in Linux Systems Engineering.
  • Experience with incident response tools, e.g., PagerDuty, FireHydrant.
  • Experience with sustainable incident response in a blameless environment.
  • Comfortable with a high level of autonomy and working with a distributed team.
Responsibilities:
  • Engage with teams and improve service delivery and reliability across their entire lifecycle.
  • Measure and monitor all production systems with an eye towards availability, latency, and overall system health.
  • Design observability systems for different types of applications, using Cribl products and other open-source tools.
  • Seek out the cause of errors and instability in production cloud services and drive teams towards better operational excellence.
  • Engage with product and platform teams to evolve systems by lobbying for changes that improve reliability, resilience, and observability.
  • Lead efforts enabling shift-left monitoring in the organization.
  • Help identify and drive down toil with creative innovation and automation.
  • On-call responsibilities.

AWS, Docker, Node.js, GCP, JavaScript, TypeScript, Azure, Grafana, Prometheus, Linux, Terraform

Posted about 1 month ago