Principal Site Reliability Engineer

Posted 2024-11-15

💎 Seniority level: Principal, 15+ years

📍 Location: United States

💸 Salary: 204,000 - 281,000 USD per year

🔍 Industry: Cybersecurity

🏢 Company: SentinelOne

🗣️ Languages: English

⏳ Experience: 15+ years

🪄 Skills: AWS, Leadership, Python, Data Analysis, GCP, Java, Kubernetes, Machine Learning, Azure, Go, Collaboration, Terraform, Microservices

Requirements:
  • Extensive SRE Experience: Proven experience in architecting and implementing SRE solutions at scale within a microservices or distributed systems environment.
  • 15+ years of progressive professional experience, with 5+ years of recent experience supporting enterprise SaaS environments.
  • Technical Expertise: Deep knowledge of incident management, alert correlation, automated triage, and SLO frameworks.
  • Proficiency in one or more programming languages (e.g., Python, Go, Java) with experience in automation and scripting.
  • Experience with machine learning and data analytics for real-time alert systems.
  • Expertise in cloud platforms (e.g., AWS, GCP, Azure) and container orchestration (e.g., Kubernetes).
  • Ability to make critical architectural decisions focused on business impact and system performance.
Responsibilities:
  • Design and guide the implementation of end-to-end alert correlation, auto-triage, and auto-remediation frameworks for a microservices SaaS architecture.
  • Ensure solutions align with business priorities and customer impact goals.
  • Define, implement, and monitor SLOs in collaboration with product and engineering teams.
  • Establish reliability standards to drive accountability around service performance.
  • Partner with software engineers, SREs, and data scientists to implement monitoring, alerting, and SLO solutions.
  • Lead initiatives promoting best practices across SentinelOne engineering.
  • Mentor engineers and contribute to a culture of reliability engineering excellence.

Related Jobs

📍 United States

🧭 Full-Time

🔍 Cybersecurity and Advanced Technology

🏢 Company: Global InfoTek, Inc.

Requirements:
  • Bachelor's degree in Computer Science, Mathematics, or an equivalent technical field, or equivalent industry experience.
  • Three-plus years developing production software in modern languages (Java, Python, Go, Node.js, etc.).
  • One-plus year developing containerized services on orchestration platforms (Kubernetes, Mesos, Swarm, etc.).
  • Three-plus years of experience with agile and lean software development philosophies.
  • One-plus year working with relational and/or non-relational databases (PostgreSQL, MySQL, MongoDB, Elasticsearch, etc.).
  • Two-plus years with version control systems (Git, Subversion, Mercurial, etc.).
  • Five-plus years building and maintaining Kubernetes clusters across hybrid-cloud infrastructure.
  • Eight-plus years in Operations, DevOps, or Site Reliability Engineering.
  • Five-plus years in configuration/package management using tools like Terraform and Helm.
  • Five-plus years of experience with cloud service monitoring (Prometheus, Grafana, Fluentd, Elastic Stack, etc.).
  • Proficient in Linux system administration.
  • Experience with GitLab CI pipelines.
  • Experience creating automation using APIs from Azure or Google Cloud.

Responsibilities:
  • Build and maintain infrastructure as code on large-scale, multi-site deployments.
  • Evaluate and assess new ways to scale platform capabilities.
  • Automate workflows to enable continuous delivery on hybrid infrastructure.
  • Troubleshoot issues on high-traffic production systems until root causes are understood.
  • Participate in design and code review processes.
  • Coordinate infrastructure changes with product owners.
  • Identify bottlenecks and improve platform performance.

🪄 Skills: PostgreSQL, Python, Software Development, Agile, Elasticsearch, Git, Java, Kubernetes, MongoDB, MySQL, Azure, Go, Grafana, Prometheus

Posted 2024-10-23