Senior Site Reliability Engineer

Posted 3 months ago
United States | Full-Time | Healthcare Workforce Management
Company: QGenda
Location: United States
Languages: English
Seniority level: Senior, 7+ years
Experience: 7+ years
Skills:
AWS, Docker, Leadership, Node.js, Python, Software Development, SQL, Agile, Bash, Git, CI/CD, Linux, DevOps, Terraform, Microservices, Mentoring
Requirements:
B.S. in Computer Science, Computer Information Systems, or Computer Engineering from a major U.S. university, or equivalent industry experience.
7+ years of experience as a DevOps, SRE, or Systems Engineer.
Advanced proficiency with at least one scripting or programming language.
Experience with Docker and container orchestration tools such as AWS ECS.
Hands-on experience building infrastructure and supporting applications in AWS using services such as Lambda, EC2, ECS, S3, SNS, SQS, RDS, Redshift, and ElastiCache.
Experience with logging and with creating dashboards and alerts using observability tools such as Datadog and Amazon CloudWatch.
Strong understanding of networking and DNS.
Familiarity with configuration management and infrastructure as code (IaC) tools such as Terraform.
Firm understanding of, and experience with, Agile and Scrum SDLC processes.
Experience using a distributed version control system (Git preferred) for check-ins, branching, merging, pull requests, code review, etc.
Knowledge of CI/CD best practices and tools such as AWS CodeBuild, Jenkins, and/or TeamCity.
Experience designing and delivering secure, high-performance, and highly available cloud services.
Responsibilities:
Design, implement, and manage scalable systems that ensure high availability, fault tolerance, and optimal performance.
Continuously monitor and enhance system health and performance through data analysis and metrics.
Embed observability (metrics, logs, traces, alerts) with actionable thresholds and up-to-date runbooks.
Eliminate toil by building automation and self-service tools for common operational workflows.
Own CI/CD pipelines (build, test, security scans) and enable progressive delivery (blue/green, canary).
Manage infrastructure as code via Terraform and configuration management with Git-backed workflows.
Participate in on-call; triage, mitigate, and resolve incidents within defined SLAs.
Lead incident response and blameless post-incident reviews; document RCAs and drive corrective actions to closure.
Maintain runbooks/playbooks and regularly perform disaster recovery scenarios.
Operate and secure AWS environments (IAM, VPC, EC2/ECS, RDS, S3, Lambda, etc.) with a focus on resilience and compliance.
Optimize cost, performance, and reliability (rightsizing, autoscaling, reservations/savings plans, tagging, spend monitoring, etc.).
Serve as a technical advisor to engineering teams on infrastructure and operations best practices.
Mentor peers on SRE practices; promote observability, continuous improvement, and a blameless culture.
Contribute to roadmaps and capacity planning to align reliability goals with product objectives.
About the Company
QGenda
251-500 employees | Service Industry