Senior DevOps Engineer (HPC)
Opportunity to work remotely within Poland
Full-Time
Senior
Salary not disclosed
Job Details
- Languages: English (B2+)
- Experience: 3+ years
- Required skills: Python, Bash, Grafana, Prometheus, Linux, Terraform, Ansible
Requirements
- 3+ years of experience with DevOps processes and automation using Infrastructure as Code tools such as Terraform
- Hands-on experience operating or engineering large-scale HPC or similar computing environments
- Proven expertise in Linux system administration, including TCP/IP networking and storage subsystems
- Experience administering large-scale cluster management software such as Slurm, LSF, or Grid Engine
- Knowledge of configuration management tools like Ansible, Salt, or Puppet
- Experience working in agile DevOps teams
- Ability to develop and maintain monitoring tools such as Grafana and Prometheus
- Experience with scripting languages such as Bash and Python for automation and tool development
- Strong experience managing virtualized private cloud environments such as OpenStack
- A degree in a scientific discipline, or equivalent experience in computationally intensive scientific data analysis
- Proven ability to manage relationships with third-party suppliers
- Upper-intermediate proficiency in English (B2+)
Responsibilities
- Design, implement, and maintain robust platform infrastructure using Infrastructure as Code tools such as Terraform
- Develop, deliver, and operate research computing services and applications
- Apply Site Reliability Engineering principles to manage HPC service deployment, monitoring, and incident response
- Solve complex technical problems related to HPC services and user applications
- Manage large-scale HPC, HTC, or BC computing environments for optimal performance
- Collaborate with scientific users to tailor HPC resources to research needs
- Automate deployment processes to ensure consistency across HPC infrastructure
- Maintain and administer large-scale cluster and server computing software such as Slurm, LSF, or Grid Engine
- Develop and maintain monitoring dashboards using tools like Grafana and Prometheus
- Work within a DevOps team environment following agile methodologies