Software Engineer, Site Reliability (Senior or Staff)

Posted 5 days ago

💎 Seniority level: Senior, 10-12+ years

📍 Location: United States, Canada

🔍 Industry: Software Development

🏢 Company: BioRender | 👥 101-250 | 💰 $15,319,133 Series A, almost 2 years ago | Life Science, Graphic Design, Software

🗣️ Languages: English

⏳ Experience: 10-12+ years

🪄 Skills: AWS, Docker, Leadership, Python, SQL, Bash, Cloud Computing, JavaScript, Kubernetes, Software Architecture, TypeScript, Go, Grafana, Prometheus, REST API, Communication Skills, Analytical Skills, CI/CD, Problem Solving, Agile methodologies, RESTful APIs, Mentoring, Linux, DevOps, Terraform, Written communication, Microservices, Networking, Adaptability, Teamwork, Troubleshooting, Active listening, Strong work ethic, Strong communication skills, Ansible, Software Engineering, Debugging

Requirements:
  • 10-12+ years of experience in the software/DevOps/SRE realm
  • Strong programming skills in two or more of these languages: JavaScript, TypeScript, Python, Go
  • Ability to troubleshoot complex distributed systems at scale
  • Database performance monitoring and best practices
  • Comfortable innovating and establishing new practices, processes, and tooling
  • Strong analytical skills, system design, and architecture for cloud applications
  • CI/CD, configuration management, monitoring, and automation expertise
  • Advanced knowledge of observability and best practices (ELK, Datadog, OpenTelemetry, Prometheus, Grafana)
  • Deployment and orchestration via AWS ECS, Kubernetes (k8s), Cloud Run, etc.
  • Understanding of Linux, virtualization, networking, VPCs, firewalls, security groups
  • Hands-on knowledge of AWS and resource provisioning via CLI/API/IaC
  • Bachelor's degree in Computer Science, similar technical field of study, or equivalent practical experience.
Responsibilities:
  • Enhance platform resilience by constantly seeking ways to improve the reliability, scalability, and release efficiency of the platform.
  • Develop Robust Observability and Monitoring Solutions: Define, build, deploy, maintain, and extend advanced observability and monitoring tools to bolster system reliability and availability.
  • Define and Monitor Performance Metrics: Play a key role in formulating and tracking Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to establish precise benchmarks for system performance.
  • Solve Complex Issues and Conduct Root Cause Analysis: Swiftly respond to escalated incidents, troubleshoot intricate system and application problems, and conduct thorough root cause analyses to implement corrective measures.
  • Thought Leadership and Innovation: Stay up to date with the latest industry trends and emerging technologies, and iterate on best practices to increase the quality and velocity of development and deliverables.
  • Architect Scalable and Reliable Systems: Lead in the design and architecture of scalable, distributed, fault-tolerant systems that uphold performance and reliability standards.
  • Mentorship and Evangelism: Champion the adoption of new technologies, disseminate best practices, and advocate for architectural patterns. Mentor and guide fellow engineers in the organization.