Sr. Site Reliability Engineer (Remote, Mexico)

Posted 3 days ago

Requirements:
  • Bachelor’s degree (or equivalent) in computer science or a related discipline
  • Knowledge of IaC technologies such as Terraform, Ansible, Puppet, and Chef
  • Knowledge of cluster creation and management with Kubernetes
  • Knowledge of Microsoft Azure, AWS, and Google Cloud, including Azure services, Azure Virtual Machines, and Virtual Network configuration
  • Knowledge of cloud service models such as IaaS, PaaS, and SaaS
  • Knowledge of CI/CD
  • Scripting knowledge in PowerShell
  • Knowledge of IP addressing and subnet masks
  • Ability to program (structured and object-oriented) in one or more high-level languages such as Python, Java, C/C++, Ruby, or JavaScript
  • Experience with distributed storage technologies such as NFS, HDFS, Ceph, and Amazon S3, as well as dynamic resource management frameworks (Apache Mesos, Kubernetes, YARN)

Responsibilities:
  • Design, build, maintain, and scale production services and server farms across multiple data centers for complex, data-intensive cloud services.
  • Design and enhance software architecture to improve scalability, service reliability, capacity, and performance.
  • Write automation code for provisioning and operating infrastructure at massive scale. You are not an operator; you are an experienced software engineer focused on operations.
  • Work with development teams to ensure that applications fit within the infrastructure and that scalability and reliability are designed and implemented from the ground up. Work with QA to build pipelines and automation for delivering and deploying applications to production.
  • Roll up your sleeves to troubleshoot incidents: formulate theories, test your hypotheses, and narrow down the possibilities to find the root cause.
  • Write postmortem reviews and remediation recommendations.
  • Identify bad trends before they become problems; respond to automated system alerts, troubleshoot system errors effectively, and work incidents to return systems to normal operating conditions.
  • Author and update high-quality documentation of all relevant specifications, systems, and procedures.
  • Support and comply with the company’s Quality Management System policies and procedures.