Senior Site Reliability Engineer

Posted about 2 months ago

💎 Seniority level: Senior

📍 Location: United States

💸 Salary: 127,000 - 249,000 USD per year

🔍 Industry: Database and Cloud Services

🗣️ Languages: English

🪄 Skills: Linux

Experience running a mission-critical service at scale.
Understanding of information security issues.
Prior experience with critical production systems in a Linux environment.
Proficiency in at least one modern programming language, beyond basic scripting.
Solid understanding of web and network protocols and standards (HTTP, TLS, DNS, etc.).
Bachelor’s degree in Computer Science or equivalent experience.
Experience writing automation tools and eagerness to automate.

Design and build the infrastructure for a global cloud service that comprises hundreds of thousands of MongoDB clusters.
Implement and troubleshoot automation and monitoring of global services spanning several cloud providers.
Optimize infrastructure performance from application level to firmware.
Participate in a weekly on-call rotation.
Improve infrastructure capabilities, focusing on cost, simplicity, and maintainability.

Posted 2 months ago

📍 United States, Canada

🧭 Full-Time

🔍 Security and fraud detection

🏢 Company: DataVisor

5+ years of experience with production environments running Linux.
3+ years of experience with cloud solutions such as AWS, Azure, or Aliyun.
Familiarity with big data technologies such as Spark and/or Flink.
Passion for automating tasks through coding and scripting.
Experience with algorithms, data structures, complexity analysis, and software design.
Proficient coding skills in Python, Java, and Bash.

Design, implement, and maintain release automation pipelines to streamline the deployment process.
Develop systems for proactive monitoring, auto-diagnosis, and incident resolution in production environments.
Work with big data platforms such as Apache Spark or Apache Flink, optimizing and scaling data processing pipelines.
Perform maintenance and troubleshooting for databases, preferably Yugabyte, ClickHouse, and MySQL.
Ensure the reliability of cloud infrastructure using Kubernetes on AWS or GCP.
Participate in on-call rotation for system reliability, focusing on automation to minimize manual intervention.
Collaborate with engineering teams to enhance system performance and manage capacity planning.

Linux

Posted 2 months ago

📍 USA

🧭 Full-Time

🔍 Cryptocurrency

At least 5 years of software engineering experience.
Strong understanding of data structures and algorithms related to performance and reliability.
Fluency in at least one programming language such as Golang, Ruby, Python, or JavaScript.
Strong skills around observability, debugging, and performance tuning.
Ability to debug complex systems and willingness to understand and improve any layer of the stack.
Experience with container orchestration systems (Docker, ECS, EKS) and monitoring tools (DataDog, Graphite, Grafana, Prometheus).
Deep knowledge of UNIX/Linux system internals including system calls, TCP/IP, and debugging tools.
Strong communication skills and ability to explain technical concepts clearly.
Demonstrated critical thinking under pressure.

Build automation and improve systems to eliminate toil and operations work.
Improve observability, reliability, and availability by defining and measuring key metrics.
Collaborate with the core infrastructure team to tune performance and optimize cloud deployments.
Collaborate with product teams to reduce service disruptions and automate incident response.
Proactively find and analyze reliability problems and design software for improvements.
Facilitate incident response, conduct root cause analysis, and run blameless retrospectives.
Educate and mentor the engineering team to enhance system reliability and promote reliability as a core value.

Docker, Python, Blockchain, Ethereum, JavaScript, Kubernetes, Ruby, Algorithms, Data Structures, Go, Communication Skills, Linux, Terraform

Posted 2 months ago

🔧 Requirements