System Reliability Engineer - Contract

Posted 2024-10-16

📍 Location: Mexico

🔍 Industry: Consulting

🏢 Company: Tech Holding

🪄 Skills: AWS, Python, Bash, Cloud Computing, Git, *Nix, Amazon Web Services

Requirements:
  • Proficiency in managing and troubleshooting Linux (e.g., Amazon Linux, CentOS) and Windows Server operating systems.
  • Experience with system configuration, management, and maintenance.
  • Experience with automation tools such as Ansible, Puppet, or Chef.
  • Familiarity with monitoring solutions such as AWS CloudWatch, Dynatrace, or Datadog.
  • Ability to analyze system performance metrics and implement optimizations.
  • Experience with patch management, vulnerability assessment, and remediation.
  • Proficiency in scripting languages such as Bash, Python, or PowerShell for automating administrative tasks.
  • Experience with version control systems like Git.
  • Familiarity with AWS, specifically managing EC2 instances, Lambda functions, and containers.
  • Familiarity with incident response, troubleshooting, and root cause analysis.
  • Familiarity with infrastructure as code (IaC) tools like Terraform or AWS CloudFormation.
Responsibilities:
  • Manage, configure, and maintain Linux and Windows Server environments.
  • Perform regular system updates, patches, and security configurations.
  • Implement and maintain monitoring tools to track system performance, availability, and reliability.
  • Analyze performance metrics and logs to identify and resolve issues proactively.
  • Collaborate with stakeholders to create dashboards and alerts for proactive performance monitoring.
  • Develop and maintain automation scripts for routine tasks, deployments, and incident responses.
  • Use configuration management tools to ensure consistent and repeatable system setups.
  • Implement and enforce security best practices for system configurations and network setups.
  • Conduct regular vulnerability assessments and apply necessary patches to mitigate risks.
  • Work closely with development, DevSecOps, and cloud engineering teams to support application deployments and infrastructure changes.
  • Provide technical guidance and support for resolving complex system issues.
  • Create and maintain detailed documentation for system configurations, procedures, and incident reports.
  • Identify opportunities for process improvements and implement changes to enhance system reliability and performance.