Site Reliability Engineer
Working remotely from BC (Pacific time zone) | Full-Time | Mid-level
Salary: 80,000 - 100,000 CAD per year
Job Details
- Experience: 3+ years
- Required Skills: Docker, Python, Bash, Git, Kubernetes, Prometheus, Linux, Terraform, Ansible
Requirements
- 3+ years of software and/or operational experience in building and maintaining internet-facing production environments.
- Strong experience with Linux/Unix systems administration.
- Strong scripting abilities in Bash and Python.
- Knowledge of source control tools (Git preferred).
- Experience with Configuration Management and Infrastructure as Code tools (Ansible, Puppet, Terraform preferred).
- Good understanding of container technology (Docker, Kubernetes preferred).
- Experience with monitoring tools (Prometheus, Grafana, Nagios, or similar) and alerting systems.
- Experience running a large-scale 24/7 production environment.
- Experience with incident management, troubleshooting, and root cause analysis.
- Bachelor's degree in information systems, computer science, technology, or a related field preferred; 2+ years of relevant experience accepted in lieu of a degree.
Responsibilities
- Ensure the reliability of critical products and services by meeting or exceeding SRE objectives.
- Instantiate and maintain production infrastructure using Infrastructure as Code and Configuration Management tools.
- Build and maintain proper monitoring of services using centralized logging and time series databases.
- Automate deployments, administration, and monitoring by following CI/CD practices.
- Work with engineering and information security teams to improve the operability and security of services.
- Participate in the team's on-call rotation.