Junior Site Reliability Engineer
Must be located in the United States
Full-Time
Junior
Salary not disclosed
Job Details
- Experience: 2+ years experience in 24x7x365 production operations; 2+ years experience installing, managing, and troubleshooting Linux and/or Windows Server operating systems in a production environment; 2+ years experience supporting cloud operations and automation in AWS, Azure or GCP; 2+ years experience with Infrastructure-as-Code and orchestration/automation tools such as Terraform and Ansible
- Required Skills: AWS, Python, Bash, GCP, Jira, Azure, Linux, Terraform, Networking, Ansible, ServiceNow
Requirements
- BS or above in a related Information Technology field, or an equivalent combination of education and experience
- 2+ years experience in 24x7x365 production operations
- Fundamental understanding of networking and networking troubleshooting
- 2+ years experience installing, managing, and troubleshooting Linux and/or Windows Server operating systems in a production environment
- 2+ years experience supporting cloud operations and automation in AWS, Azure or GCP (and aligned certifications)
- 2+ years experience with Infrastructure-as-Code and orchestration/automation tools such as Terraform and Ansible
- Experience with IaaS platform capabilities and services (cloud certifications expected)
- Experience with ticketing tools such as Jira and ServiceNow
- Experience using environmental analytics tools such as Splunk and Elastic Stack for querying, monitoring and alerting
- Experience in at least one primary scripting language (Bash, Python, PowerShell)
- Excellent communication, organizational, and problem-solving skills in a dynamic environment
- Effective documentation skills, to include technical diagrams and written descriptions
- Ability to work as part of a team with a professional attitude and demeanor
Responsibilities
- Become a member of a highly collaborative engineering team offering a unique blend of Cloud Infrastructure Administration, Site Reliability Engineering, Security Operations, and Vulnerability Management across multiple clients
- Coordinate with client product teams, engineering team members, and other stakeholders to monitor and maintain a secure and resilient cloud-hosted infrastructure to established SLAs in both production and non-production environments
- Innovate and implement using automated orchestration and configuration management techniques
- Understand the design, deployment, and management of secure and compliant enterprise servers, network infrastructure, boundary protection, and cloud architectures using Infrastructure-as-Code
- Create, maintain, and peer review automated orchestration and configuration management codebases, as well as Infrastructure-as-Code codebases
- Maintain IaC tooling and versioning within Client environments
- Implement and upgrade client environments with CI/CD infrastructure code and provide internal feedback to development teams for environment requirements and necessary alterations
- Work across AWS, Azure and GCP, understanding and utilizing their unique native services in client environments
- Configure, tune, and troubleshoot cloud-based tools, and manage cost, security, and compliance for the Client’s environments
- Monitor and resolve site stability and performance issues related to functionality and availability
- Work closely with client DevOps and product teams to provide 24x7x365 support to environments through Client ticketing systems
- Support definition, testing, and validation of incident response and disaster recovery documentation and exercises
- Participate in on-call rotations as needed to support Client critical events and operational needs that may lie outside of business hours
- Support testing and data reviews to collect and report on the effectiveness of current security and operational measures, in addition to remediating deviations from current security and operational measures
- Maintain detailed diagrams representative of the Client’s cloud architecture
- Maintain, optimize, and peer review standard operating procedures, operational runbooks, technical documents, and troubleshooting guidelines