Staff Site Reliability Engineer (SRE)

Posted 2024-08-28

💎 Seniority level: Staff, 8+ years

💸 Salary: 144000 - 278000 USD per year

🔍 Industry: IT and Security

🗣️ Languages: English

⏳ Experience: 8+ years

🪄 Skills: AWS, Node.js, Design Patterns, JavaScript, Kibana, TypeScript, Grafana, Prometheus, Linux, DevOps, Terraform

Extensive experience with enterprise-scale continuous delivery environments.
8+ years of experience with a DevOps or SRE job title.
Development with JavaScript/Node.js/TypeScript in a Linux/Mac environment.
Experience with configuration management tools such as Terraform (preferred), Puppet, Chef, or Ansible.
Experience with sustainable incident response in a blameless environment.
Knowledge of cloud platforms (AWS preferred) and of container and orchestration technologies.
Experience with APM and observability tools such as New Relic, Splunk, CloudWatch, Prometheus, Grafana/Kibana, and Sentry.
Background in Linux Systems Engineering.
Experience with incident response tools such as PagerDuty, FireHydrant, and Blameless.
Comfortable with a high level of autonomy and working with a distributed team.

Engage with teams and improve service delivery and reliability across their entire lifecycle.
Measure and monitor all production systems with an eye towards availability, latency and overall system health.
Seek out the cause of errors and instability in our production cloud services and drive teams towards better operational excellence.
Engage with product and platform teams to improve and evolve systems by lobbying for changes that improve reliability, resilience, and observability.
Help identify and drive down toil with creative innovation and automation.
On-call responsibilities.

Posted 2024-10-03

📍 Poland

🔍 IT and Security

Extensive experience with enterprise-scale continuous delivery environments.
Development with JavaScript/Node.js/TypeScript in a Linux/Mac environment.
Experience with sustainable incident response in a blameless environment.
Experience with configuration management tools such as Terraform (preferred), Puppet, Chef, or Ansible.
Knowledge of cloud platforms (AWS preferred) and of container and orchestration technologies.
Experience with APM and observability tools such as New Relic, Splunk, CloudWatch, Prometheus, Grafana/Kibana, and Sentry.
Background in Linux Systems Engineering.
Experience with incident response tools such as PagerDuty, FireHydrant, and Blameless.
Comfortable with a high level of autonomy and working with a distributed team.

Engage with teams and improve service delivery and reliability across their entire lifecycle.
Measure and monitor all production systems with an eye towards availability, latency and overall system health.
Seek out the cause of errors and instability in our production cloud services and drive teams towards better operational excellence.
Engage with product and platform teams to improve and evolve systems by lobbying for changes that improve reliability, resilience, and observability.
Help identify and drive down toil with creative innovation and automation.
On-call responsibilities.

AWS, Node.js, Design Patterns, JavaScript, Kibana, TypeScript, Grafana, Prometheus, Linux, Terraform

Posted 2024-08-10

🧭 Full-Time

💸 152000 - 230500 USD per year

🔍 IT and Security

Extensive experience with enterprise-scale continuous delivery environments.
5+ years of experience in a DevOps or SRE role.
Development skills in JavaScript/Node.js/TypeScript within a Linux/Mac environment.
Familiarity with Configuration Management Tools such as Terraform, Puppet, Chef, or Ansible.
Experience with sustainable incident response in a blameless environment.
Knowledge of cloud platforms, preferably AWS, and familiarity with container and orchestration technologies.
Experience with APM and observability tools, including New Relic, Splunk, CloudWatch, Prometheus, Grafana/Kibana, Sentry, etc.
Background in Linux Systems Engineering.
Familiarity with incident response tools such as PagerDuty, FireHydrant, Blameless, etc.
Ability to work autonomously and within a distributed team.

Engage with teams to enhance service delivery and reliability throughout their lifecycle.
Measure and monitor all production systems focusing on availability, latency, and overall system health.
Investigate causes of errors and instability in production cloud services and guide teams towards improved operational excellence.
Collaborate with product and platform teams to advocate for changes that enhance reliability, resilience, and observability.
Identify and minimize toil through creative innovation and automation.
Fulfill on-call responsibilities.

AWS, Node.js, Design Patterns, JavaScript, Kibana, TypeScript, Grafana, Prometheus, DevOps

🔧 Requirements