Staff Site Reliability Engineer

Posted 2024-11-09

💎 Seniority level: Staff

📍 Location: CA, CO, CT, FL, GA, HI, IL, IN, IA, MD, MA, MI, MO, NJ, NM, NY, NC, OH, PA, TN, TX, UT, VA, WA

💸 Salary: 135520 - 178060 USD per year

🔍 Industry: Non-profit mental health support

🗣️ Languages: English

⏳ Experience: Proven experience as a Staff SRE or in a similar SRE role.

🪄 Skills: AWS, Docker, GraphQL, PHP, Python, GCP, Kubernetes, Azure, Data Structures, Go, Next.js, Communication Skills, Collaboration, CI/CD, DevOps, Terraform, Compliance

Bachelor's degree in Computer Science, Engineering, or related field; Master’s preferred.
Proven experience as a Staff SRE or in a similar role.
Experience maintaining the reliability of online SaaS/PaaS offerings.
Proficiency in AWS and infrastructure as code (Terraform, CloudFormation).
Strong scripting skills (Python) and knowledge of containerization (Docker, Kubernetes).
Experience with CI/CD pipelines and observability tools (GitHub Actions, Datadog).
Understanding of network protocols and security principles.

Posted 2024-10-16

📍 USA

🧭 Full-Time

💸 211650 - 249000 USD per year

🔍 Cryptocurrency and blockchain technology

At least 7 years of experience in software engineering.
Experience in designing, building, scaling, and maintaining production services.
Ability to write high-quality, well-tested code.
Passion for open financial systems.
Strong technical skills for system design and coding.
Excellent written and verbal communication skills.
Strong skills in observability, debugging, and performance tuning.
Strong interpersonal skills for collaboration with engineers of all levels.
Demonstrated critical thinking skills under pressure.
Willingness to understand and improve any layer of the stack.
On-call availability for issue resolution.

Improve observability, reliability, and availability by defining and measuring key metrics.
Build automation and improve systems to eliminate toil and operations work.
Collaborate with core infrastructure team for performance tuning and optimization of cloud deployments.
Work with product teams to reduce service disruptions and automate incident responses.
Proactively find and analyze reliability issues, implementing software solutions for improvements.
Educate and mentor the engineering team on reliability as a core value.
Write high-quality, well-tested code.
Debug complex technical problems and enhance system deployability.
Review feature designs across the company.
Ensure security, operational integrity, and architectural clarity of designs.
Integrate with third-party vendors through pipelines.
Participate in on-call support for urgent issues.

Blockchain, Communication Skills

Posted 2024-09-20

📍 United States of America

🧭 Full-Time

💸 $176,400 - $201,600 per year

🔍 Family history and personal DNA testing

7+ years of experience in site reliability.
5+ years software development experience.
7+ years cloud automation experience using Go, Python, Bash.
5+ years debugging Node.js, Java, and a variety of DB technologies.
5+ years of experience working with AWS Cloud, including services, CLI, SDKs, and AWS Console.
7+ years using cloud APM and logging tools, such as New Relic, Prometheus, and AWS monitoring.
5+ years experience in auto scaling, resilience, fault tolerance, AWS infrastructure, cloud networking, and container management.
5+ years experience analyzing production within a cloud environment.
5+ years of Terraform or CloudFormation experience for infrastructure management with CI/CD pipelines.

Own site reliability for a product vertical in collaboration with engineering.
Define SLOs, SLIs, and error budgets, and ensure they remain in compliance with standards.
Develop improved monitoring, auto scaling and resiliency patterns and capabilities.
Debug complex issues across multiple services in AWS, including outward-facing infrastructure.
Collaborate on and develop cloud automation and new best practices in support of the vertical and the organization.
Train, mentor, and support others in AWS, infrastructure, and cloud best practices.
Serve as a member of the Site Reliability Engineering team, which reports up to the Site Reliability and Performance Organization.

AWS, Node.js, Python, Software Development, Bash, Java, Go, Prometheus, Collaboration, CI/CD

🔧 Requirements