Senior Site Reliability Engineer - Platform
Posted 2 months ago
💎 Seniority level: Senior, 5+ years
📍 Location: USA
🔍 Industry: Cryptocurrency
🏢 Company: Referrals Only Board
🗣️ Languages: English
⏳ Experience: 5+ years
🪄 Skills: Docker, Python, Blockchain, Ethereum, JavaScript, Kubernetes, Ruby, Algorithms, Data Structures, Go, Communication Skills, Linux, Terraform
Requirements:
- At least 5 years of software engineering experience.
- Strong understanding of data structures and algorithms related to performance and reliability.
- Fluency in at least one programming language such as Golang, Ruby, Python, or JavaScript.
- Strong skills around observability, debugging, and performance tuning.
- Ability to debug complex systems and willingness to understand and improve any layer of the stack.
- Experience with containers and container orchestration (Docker, ECS, EKS) and monitoring tools (Datadog, Graphite, Grafana, Prometheus).
- Deep knowledge of UNIX/Linux system internals including system calls, TCP/IP, and debugging tools.
- Strong communication skills and ability to explain technical concepts clearly.
- Demonstrated critical thinking under pressure.
Responsibilities:
- Build automation and improve systems to eliminate toil and operations work.
- Improve observability, reliability, and availability by defining and measuring key metrics.
- Collaborate with the core infrastructure team to tune the performance of cloud deployments and optimize them.
- Collaborate with product teams to reduce service disruptions and automate incident response.
- Proactively find and analyze reliability problems and design software for improvements.
- Facilitate incident response, conduct root cause analysis, and run blameless retrospectives.
- Educate and mentor the engineering team to enhance system reliability and promote reliability as a core value.