Site Reliability Engineer

Posted 2024-10-23

📍 Location: United States

🏢 Company: Jahnel Group

🗣️ Languages: English

🪄 Skills: AWS, Docker, Python, Cybersecurity, Communication Skills

Requirements:

Strong written and verbal communication skills.
Exceptional planning, organizational, and problem-solving abilities.
Ability to thrive in a fast-paced environment.
Advanced understanding of Windows (Windows 11, Windows Server 2022) and Linux systems (Red Hat, SSH).
Strong scripting skills (Python) and Infrastructure as Code (IaC) experience with Terraform and Docker.
Deep knowledge of AWS infrastructure, including CloudFormation, AWS CDK, and Terraform.
Advanced understanding of enterprise networking, VPN, 802.1x authentication, and cybersecurity tools like NGAV and EDR.
Knowledge of security standards such as NIST and STIG.
Certifications like MCSE, CompTIA Server+, or Red Hat Certified System Administrator are a plus.

Responsibilities:

Monitor the health of servers, databases, networks, and security.
Optimize cybersecurity tools like antivirus and spam filtering.
Manage security patches and updates, including sandbox testing.
Plan and execute upgrades and security compliance projects.
Oversee vendor relationships and manage software licenses.
Scale servers/services for changing loads.
Analyze and resolve security and vulnerability threats.
Respond to and resolve server monitoring alerts.
Create hardened images for servers.
Install and update servers, cloud services, and applications.
Review configurations for services like antivirus, VPN, and MFA.
Manage server and email certificates.
Resolve security issues from penetration tests.
Conduct fire drills for incidents and ensure compliance with PCI-DSS v4 and NIST standards.

Related Jobs

🔥 Senior Site Reliability Engineer

Posted 2024-11-21

📍 United States

🧭 Full-Time

🔍 Legal technology

🏢 Company: Ramp Talent

🔧 Requirements

Curiosity, willingness to learn, and passion for continuous improvement.
Proficiency in all skills expected of SRE IIs.
Bachelor's degree in computer science, information systems, or a related field; comparable certifications; or equivalent direct work experience.
A minimum of 8 years of experience in hands-on technical roles.
A minimum of 2 years of Site Reliability Engineering experience.
Experience building autonomous systems that manage software operational details without human intervention.

💡 Responsibilities

Developing autonomous systems that manage the details necessary to build, deploy, test, and operate all Filevine Inc. products.
Being the voice of Reliability on your team throughout the SDLC.
Collecting, monitoring, aggregating, dashboarding, and alerting on software and server events.
Improving the CI/CD pipeline.
Developing playbooks, tools, and scripts to streamline processes and shorten problem resolution time.
Identifying and fixing gaps in the availability of systems.
Improving and defending the security of software and systems.
Documenting and diagramming processes, procedures, and best practices.
Finding, learning, improving, or creating new tools that are reliable, usable, and helpful.
Mentoring, training, and reviewing more junior engineers.
Participating in an on-call rotation for 24/7 production reliability support.

🪄 Skills: Leadership, CI/CD, Mentoring

🔥 Site Reliability Engineer - Embedded

Posted 2024-11-21

📍 United States

🧭 Full-Time

💸 102600 - 120323 USD per year

🔍 Recycling technology

🏢 Company: AMP Sortation

🔧 Requirements

Strong technical communication skills for ticket escalations.
Strong interpersonal skills for communicating with individuals impacted by downtime.
Experience troubleshooting Linux systems.
Demonstrated coding experience in C++ or Rust.
Desire to learn professional software engineering practices.
Proficiency in managing tasks under sprint or kanban methodology.
Passion for green technology and emissions reduction.

💡 Responsibilities

Triage and respond to tickets during core working hours.
Troubleshoot operating system, hardware, networking, and application issues.
Maintain documentation for engineering support.
Define improvements to the Jira ticketing system.
Develop and support AMP's observability stack.

🪄 Skills: C++, Jira, Grafana, Prometheus, Rust, Communication Skills, Linux, Documentation

🔥 Senior Site Reliability Engineer, Databases

Posted 2024-11-20

📍 USA

💸 170000 - 190000 USD per year

🔍 Email Security

🏢 Company: Valimail

🔧 Requirements

5+ years experience building and maintaining highly available relational databases.
Work collaboratively with cross-functional teams.
Value team success over individual success.
Put industry and engineering best practices into practice and promote them to others.
Passion for reliable, scalable, and performant datastores, with a strong sense of ownership.
Experience building and supporting highly performant and highly reliable datastores.
Deep experience working with Postgres.
Expert in database fundamentals, SQL, and PL/pgSQL (or other).
Experience with NoSQL datastores and caching solutions.
Working knowledge of AWS or Azure cloud providers.
Experience with Infrastructure-as-Code tools, such as Terraform.

💡 Responsibilities

Evangelizing standard methodologies for building and operating highly reliable data storage systems.
Serving as the subject matter expert in datastore design and performance.
Building and supporting Valimail’s mission-critical datastores.
Conducting timely post mortems of production datastore incidents.
Collaboratively designing systems with other engineers to meet reliability, scalability, and performance requirements.
Providing assistance to teams working with datastores.
Automating routine database tasks.
Participating in on-call rotation and incident response.
Upgrading data storage systems as necessary.

🪄 Skills: AWS, SQL, Azure, Postgres, NoSQL, Terraform

🔥 Senior Site Reliability Engineer

Posted 2024-11-16

📍 U.S.

🧭 Full-Time

💸 140000 - 160000 USD per year

🔍 Cybersecurity / Open source software

🔧 Requirements

Sense of curiosity, resourcefulness, and pragmatism.
Expertise with multi-region deployments in public cloud environments.
Demonstrable production Kubernetes experience (Managed Kubernetes, Helm, kubectl, kOps, etc.).
Strong background in Reliability Engineering, DevOps, Software Engineering.
Fluency with at least one programming language, such as C#, Python, or Go.
Experience with cloud deployment and automation tools/methodologies (e.g., GitOps, Terraform, Pulumi).
Proficiency using source control such as Git.
Ability to maintain discretion and handle sensitive information.
Staying current with trends and new technologies.
Collaborative and adaptable mindset.
Excellent communication skills.
Strong problem-solving skills.

💡 Responsibilities

Take ownership of the Bitwarden cloud infrastructure, focusing on quality.
Evaluate infrastructure regularly, making recommendations for reliability, security, availability, scalability, and cost management.
Implement site reliability tools and observability systems.
Respond to outages and participate in a 24x7 support strategy.
Contribute to architectural designs and engineering operations at scale.
Engage in code reviews and spread technical knowledge.
Contribute to incident management processes.
Collaborate with teams to refine priorities and deliverables.
Align SLIs, SLOs, and SLAs with product owners.
Identify opportunities for new initiatives.
Influence the SDLC as Bitwarden scales.
Mentor team members.

🪄 Skills: Python, Git, Kubernetes, C#, Strategy, Go, Communication Skills, DevOps, Terraform

🔥 Principal Site Reliability Engineer

Posted 2024-11-15

📍 United States

🧭 Full-Time

💸 204000 - 281000 USD per year

🔍 Cybersecurity

🏢 Company: SentinelOne

🔧 Requirements

Extensive SRE Experience: Proven experience in architecting and implementing SRE solutions at scale within a microservices or distributed systems environment.
15+ years of progressive professional experience, with 5+ years of recent experience supporting enterprise SaaS environments.
Technical Expertise: Deep knowledge of incident management, alert correlation, automated triage, and SLO frameworks.
Proficiency in one or more programming languages (e.g., Python, Go, Java) with experience in automation and scripting.
Experience with machine learning and data analytics for real-time alert systems.
Expertise in cloud platforms (e.g., AWS, GCP, Azure) and container orchestration (e.g., Kubernetes).
Ability to make critical architectural decisions focused on business impact and system performance.

💡 Responsibilities

Design and guide the implementation of end-to-end alert correlation, auto-triage, and auto-remediation frameworks for a microservices SaaS architecture.
Ensure solutions align with business priorities and customer impact goals.
Define, implement, and monitor SLOs in collaboration with product and engineering teams.
Establish reliability standards to drive accountability around service performance.
Partner with software engineers, SREs, and data scientists to implement monitoring, alerting, and SLO solutions.
Lead initiatives promoting best practices across SentinelOne engineering.
Mentor engineers and contribute to a culture of reliability engineering excellence.

🪄 Skills: AWS, Leadership, Python, Data Analysis, GCP, Java, Kubernetes, Machine Learning, Azure, Go, Collaboration, Terraform, Microservices

🔥 Site Reliability Engineer, Edge

Posted 2024-11-13

📍 United States

💸 192000 - 288000 USD per year

🔍 Frontend Cloud and web services

🏢 Company: Vercel

🔧 Requirements

At least 3 years of experience in an SRE role, or at least 5 years of experience in an adjacent role (e.g., platform engineering), operating in a scaled environment.
Firm grasp of the SRE philosophy and mindset, with practical experience working on or directly with SRE teams that have proactively engaged in system design and improvement.
Strong sense of accountability and commitment to problem-solving, backed by curiosity to dig deep and identify root causes.
Willingness to proactively engage with development teams to influence the course of software design and operational practices.
Capability to manage risk, make decisions, and exhibit sound judgment.
Demonstrated ability to plan and deliver long-term projects.
Familiarity with networking protocols and application serving.
Experience deploying and operating systems on AWS infrastructure at scale.
Bonus: Experience working with Terraform, Kubernetes, Golang, and/or Lua.

💡 Responsibilities

Ensure that our products are built for reliability and scale by engaging in the end-to-end design, development, and deployment of new software.
Drive continuous risk mitigation and reduction through direct involvement in incident management, blameless postmortems, and follow-ups.
Drive measurable improvements to the reliability, performance, and efficiency of our production systems through instrumentation, analysis, and implementation of engineering improvements.
Devise repeatable, low-toil operational practices through the development of automated systems for software delivery, system failover, and capacity management.

🪄 Skills: AWS, Problem Solving

🔥 Sr. Site Reliability Engineer, Incident Excellence

Posted 2024-11-12

📍 United States

🧭 Full-Time

💸 147100 - 207600 USD per year

🔍 Cloud Infrastructure and Software Engineering

🏢 Company: HashiCorp

🔧 Requirements

Professional experience designing or operating disaster recovery processes in a distributed cloud environment.
Professional experience with incident management in cloud environments.
Enjoy working on various scopes spanning software engineering, cloud infrastructure, and SRE.
Experience contributing to efficiency improvements of software at scale.
Experience collaborating cross-functionally to deliver engineering culture change.
Worked on infrastructure teams in customer-centric and agile organizations with empathy and compassion.
Worked with SaaS or other managed software offerings.
Experience in one or more of the major public clouds.

💡 Responsibilities

Utilize software engineering experience to solve problems and build automation for incident lifecycle management.
Coordinate disaster recovery processes and identify strategic process improvements.
Drive incident management capabilities and culture.
Participate in incident command on-call rotation.
Support incident management tooling.
Build technical skills and relationships within a team of engineers and SREs.
Learn, teach, and collaborate cross-functionally.

🪄 Skills: Agile, Product Development, Strategy, Communication Skills, Collaboration

🔥 Senior Site Reliability Engineer - US/Canada

Posted 2024-11-09

📍 United States, Canada

🧭 Full-Time

🔍 Security and fraud detection

🏢 Company: DataVisor

🔧 Requirements

5+ years of experience with production environments running Linux.
3+ years of experience with cloud solutions such as AWS, Azure, or Aliyun.
Familiarity with big data technologies such as Spark and/or Flink.
Passion for automating tasks through coding and scripting.
Experience with algorithms, data structures, complexity analysis, and software design.
Proficient coding skills in Python, Java, and Bash.

💡 Responsibilities

Design, implement, and maintain release automation pipelines to streamline the deployment process.
Develop systems for proactive monitoring, auto-diagnosis, and incident resolution in production environments.
Work with big data platforms such as Apache Spark or Apache Flink, optimizing and scaling data processing pipelines.
Perform maintenance and troubleshooting for databases, preferably Yugabyte, ClickHouse, and MySQL.
Ensure the reliability of cloud infrastructure using Kubernetes on AWS or GCP.
Participate in on-call rotation for system reliability, focusing on automation to minimize manual intervention.
Collaborate with engineering teams to enhance system performance and manage capacity planning.

🪄 Skills: Linux

🔥 Staff Site Reliability Engineer

Posted 2024-11-09

📍 CA, CO, CT, FL, GA, HI, IL, IN, IA, MD, MA, MI, MO, NJ, NM, NY, NC, OH, PA, TN, TX, UT, VA, WA

🧭 Full-Time

💸 135520 - 178060 USD per year

🔍 Non-profit mental health support

🏢 Company: Crisis Text Line

🔧 Requirements

Bachelor's degree in Computer Science, Engineering, or related field; Master’s preferred.
Proven experience as a Staff SRE or in a similar role.
Experience maintaining the reliability of online SaaS/PaaS services.
Proficiency in AWS and infrastructure as code (Terraform, CloudFormation).
Strong scripting skills (Python) and knowledge of containerization (Docker, Kubernetes).
Experience in CI/CD pipelines and observability tools (GitHub Actions, Datadog).
Understanding of network protocols and security principles.

💡 Responsibilities

Helping lead and mentor a team of 5 SREs.
Designing, implementing, and maintaining AWS infrastructure.
Collaborating with developers for performance optimization.
Developing monitoring, logging, and alerting systems.
Automating repetitive tasks to improve efficiency.
Responding to incidents to minimize downtime.
Supporting diversity on the engineering team.
Communicating expectations and progress clearly.
Providing mentorship and promoting technical best practices.
Participating in retrospectives to improve processes.
Conducting regular security audits.

🪄 Skills: AWS, Docker, GraphQL, PHP, Python, GCP, Kubernetes, Azure, Data Structures, Go, Next.js, Communication Skills, Collaboration, CI/CD, DevOps, Terraform, Compliance

🔥 Senior Site Reliability Engineer (SRE)

Posted 2024-11-07

📍 US

🧭 Full-Time

💸 198000 - 220000 USD per year

🔍 Blockchain, Cryptocurrency

🏢 Company: Uniswap Labs

🔧 Requirements

Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
5+ years of experience in site reliability engineering, DevOps, or related fields.
Strong understanding of reliability engineering principles and tools.
Proficiency in monitoring tools like Prometheus, Grafana, Nagios.
Experience with cloud platforms (AWS, Azure, GCP) and container orchestration systems (Kubernetes, Docker).
Proficiency in scripting tools such as Python, Bash, Ansible, or Terraform.

💡 Responsibilities

Design, implement, and maintain systems for reliability, availability, and performance of services.
Develop and manage monitoring, alerting, and incident response strategies.
Conduct root cause analysis of failures.
Collaborate with cross-functional teams on reliability practices.
Drive improvements and innovations in systems and processes.

🪄 Skills: AWS, Docker, Python, Bash, GCP, Kubernetes, Azure, Grafana, Prometheus, Collaboration, CI/CD, DevOps
