
Staff Site Reliability Engineer (SRE)

Posted 2024-08-28


💎 Seniority level: Staff, 8+ years

💸 Salary: 144000 - 278000 USD per year

🔍 Industry: IT and Security

🏢 Company: Cribl · 👥 251-500 · 💰 $150.0m Series D on 2022-05-24 · Real Time, Big Data, Information Technology, Software

🗣️ Languages: English

⏳ Experience: 8+ years

🪄 Skills: AWS, Node.js, Design Patterns, JavaScript, Kibana, TypeScript, Grafana, Prometheus, Linux, DevOps, Terraform

Requirements:
  • Extensive experience with enterprise-scale continuous delivery environments.
  • 8+ years of experience holding a DevOps or SRE job title.
  • Development experience with JavaScript/Node.js/TypeScript in a Linux/Mac environment.
  • Experience with configuration management tools such as Terraform (preferred), Puppet, Chef, or Ansible.
  • Experience with sustainable incident response in a blameless environment.
  • Knowledge of cloud platforms (AWS preferred) and container and orchestration technologies.
  • Experience with APM and observability tools such as New Relic, Splunk, CloudWatch, Prometheus, Grafana/Kibana, and Sentry.
  • Background in Linux systems engineering.
  • Experience with incident response tools such as PagerDuty, FireHydrant, and Blameless.
  • Comfortable working with a high level of autonomy and as part of a distributed team.

Responsibilities:
  • Engage with teams to improve service delivery and reliability across the entire service lifecycle.
  • Measure and monitor all production systems with an eye toward availability, latency, and overall system health.
  • Identify the causes of errors and instability in our production cloud services and drive teams toward operational excellence.
  • Work with product and platform teams to improve and evolve systems by advocating for changes that improve reliability, resilience, and observability.
  • Help identify and reduce toil through creative innovation and automation.
  • On-call responsibilities.

Related Jobs


📍 Poland

🔍 IT and Security

🏢 Company: Cribl · 👥 251-500 · 💰 $150.0m Series D on 2022-05-24 · Real Time, Big Data, Information Technology, Software

  • Extensive experience with enterprise-scale continuous delivery environments.
  • Development experience with JavaScript/Node.js/TypeScript in a Linux/Mac environment.
  • Experience with sustainable incident response in a blameless environment.
  • Experience with configuration management tools such as Terraform (preferred), Puppet, Chef, or Ansible.
  • Knowledge of cloud platforms (AWS preferred) and container and orchestration technologies.
  • Experience with APM and observability tools such as New Relic, Splunk, CloudWatch, Prometheus, Grafana/Kibana, and Sentry.
  • Background in Linux systems engineering.
  • Experience with incident response tools such as PagerDuty, FireHydrant, and Blameless.
  • Comfortable working with a high level of autonomy and as part of a distributed team.

  • Engage with teams to improve service delivery and reliability across the entire service lifecycle.
  • Measure and monitor all production systems with an eye toward availability, latency, and overall system health.
  • Identify the causes of errors and instability in our production cloud services and drive teams toward operational excellence.
  • Work with product and platform teams to improve and evolve systems by advocating for changes that improve reliability, resilience, and observability.
  • Help identify and reduce toil through creative innovation and automation.
  • On-call responsibilities.

AWS, Node.js, Design Patterns, JavaScript, Kibana, TypeScript, Grafana, Prometheus, Linux, Terraform

Posted 2024-10-03

🧭 Full-Time

💸 152000 - 230500 USD per year

🔍 IT and Security

🏢 Company: Cribl · 👥 251-500 · 💰 $150.0m Series D on 2022-05-24 · Real Time, Big Data, Information Technology, Software

  • Extensive experience with enterprise-scale continuous delivery environments.
  • 5+ years of experience in a DevOps or SRE role.
  • Development skills in JavaScript/Node.js/TypeScript within a Linux/Mac environment.
  • Familiarity with configuration management tools such as Terraform, Puppet, Chef, or Ansible.
  • Experience with sustainable incident response in a blameless environment.
  • Knowledge of cloud platforms, preferably AWS, and familiarity with container and orchestration technologies.
  • Experience with APM and observability tools, including New Relic, Splunk, CloudWatch, Prometheus, Grafana/Kibana, and Sentry.
  • Background in Linux systems engineering.
  • Familiarity with incident response tools such as PagerDuty, FireHydrant, and Blameless.
  • Ability to work autonomously and within a distributed team.

  • Engage with teams to enhance service delivery and reliability throughout the service lifecycle.
  • Measure and monitor all production systems, focusing on availability, latency, and overall system health.
  • Investigate causes of errors and instability in production cloud services and guide teams towards improved operational excellence.
  • Collaborate with product and platform teams to advocate for changes that enhance reliability, resilience, and observability.
  • Identify and minimize toil through creative innovation and automation.
  • Fulfill on-call responsibilities.

AWS, Node.js, Design Patterns, JavaScript, Kibana, TypeScript, Grafana, Prometheus, DevOps

Posted 2024-08-10