Senior Site Reliability Engineer

Posted 2024-11-16

💎 Seniority level: Senior, 4+ years

📍 Location: Netherlands

🔍 Industry: Creative Technology

🏢 Company: Creative Fabrica

⏳ Experience: 4+ years

🪄 Skills: AWS, PHP, Python, DynamoDB, Kafka, Kubernetes, TypeScript, Go, DevOps, Terraform, Microservices

Requirements:
  • 4+ years operating and supporting a high-volume, high-performance, cloud-native distributed computing environment.
  • Proven experience with Terraform, containers, and monitoring solutions.
  • Experience with a wide array of AWS-based services (EC2, ECS/EKS, S3, RDS, ALB, MSK, DynamoDB, Redshift, etc.).
  • Experience supporting and deploying applications and microservices written in Go, Python, and PHP.
  • Experience with driving DevOps practices and developing automation solutions in a continuous deployment environment.
  • Experience with Kubernetes and Kafka is highly preferred.
Responsibilities:
  • Improve our site infrastructure to keep up with the company’s fast growth and technology evolution.
  • Proactively monitor the infrastructure and propose improvements.
  • Lead the design and building of a fully automated, developer self-service platform.
  • Research, develop and implement infrastructure management standards across our cloud accounts (AWS).
  • Participate in pre-production and production site releases.
  • Participate in the on-call rotation and in the debugging of issues.

Related Jobs

📍 US, Europe

🧭 Full-Time

💸 175000 - 210000 USD per year

🔍 Cloud computing, AI

🏢 Company: CoreWeave

  • You have 5+ years of experience in the software or infrastructure engineering industry.
  • Experience with Python, Go or another scripting language.
  • Experience containerizing applications and/or using Kubernetes to manage deployments.
  • Experience with Git.
  • Experience with Linux shell scripting and/or the ability to navigate a *nix-based operating system.
  • Experience creating and maintaining GitHub Actions to automate workflows.
  • You have experience deploying services in production and are interested in learning reliability-at-scale engineering concepts.
  • You have experience refining SDLC, doing code reviews, and providing technical support.

  • Design and implement services and tools to reduce friction and toil in the lives of our engineering and operations teams.
  • Streamline repetitive tasks and eliminate bottlenecks to improve development velocity with automated workflows and processes.
  • Partner with developers to understand their pain points and develop tailored solutions that enhance their productivity.
  • Champion best practices and advocate for new tools and technologies to drive ongoing productivity gains.
  • Tackle complex issues related to build systems, testing frameworks, code analysis, and other developer tooling.
  • Enable and evangelize the practice of reliability engineering across CoreWeave's engineering teams.

🪄 Skills: Python, Software Development, Git, Kubernetes, *Nix, Go, Collaboration

Posted 2024-11-07

📍 Australia, Austria, Bangladesh, Belgium, Brazil, Canada, Colombia, Costa Rica, Croatia, Czech Republic, Denmark, Egypt, Estonia, Finland, France, Germany, Ghana, Greece, India, Indonesia, Ireland, Israel, Italy, Kenya, Mexico, Netherlands, Nigeria, Peru, Poland, Singapore, South Africa, Spain, Sweden, Switzerland, Uganda, United Arab Emirates, United Kingdom, United States of America, Uruguay

🧭 Full-Time

💸 109000 - 169000 USD per year

🔍 Nonprofit, Technology

  • Proficient in automation, programming, and scripting.
  • Experience with Open Source configuration management and orchestration tools (Puppet, Ansible, Chef, SaltStack, etc.) as well as modern observability infrastructure (Prometheus, Grafana, Logstash/Kibana, Icinga/Nagios, etc.).
  • Advanced knowledge of Linux and IO/data storage concepts, internals and troubleshooting.
  • Experience remotely managing both bare-metal servers and virtualized environments.
  • 5+ years of experience in an SRE/Operations/DevOps role as part of a team.
  • Experience with high-traffic, highly available website architectures and operations.
  • Strong English language skills.
  • Ability to work independently in a fast-paced environment and as an effective part of a globally distributed team, using ticket tracking systems and asynchronous communication tools.
  • B.Sc. or M.Sc. in Computer Science or equivalent work experience.

  • Operation, maintenance, troubleshooting and automation of relational database systems in production and staging environments.
  • Handling configuration management, (Debian) package maintenance, patching and building, working with upstream on bug identification and resolution.
  • Improving observability (alerting, metrics, monitoring) of database infrastructure.
  • Multi-datacenter systems design, capacity and infrastructure planning.
  • Taking part in incident response, diagnosis and follow-up on system outages or alerts across Wikimedia's production infrastructure and participating in an on call rotation.

🪄 Skills: SQL, Kibana, C (Programming language), Cassandra, Grafana, Prometheus, Redis

Posted 2024-08-28

📍 Australia, Austria, Bangladesh, Belgium, Brazil, Canada, Colombia, Costa Rica, Croatia, Czech Republic, Denmark, Egypt, Estonia, Finland, France, Germany, Ghana, Greece, India, Indonesia, Ireland, Israel, Italy, Kenya, Mexico, Netherlands, Nigeria, Peru, Poland, Singapore, South Africa, Spain, Sweden, Switzerland, Uganda, United Arab Emirates, United Kingdom, United States of America, Uruguay

🧭 Full-Time

💸 109000 - 169000 USD per year

🔍 Nonprofit, knowledge sharing

  • Proficient in automation, programming, and scripting
  • Experience with Open Source configuration management and orchestration tools (Puppet, Ansible, Chef, SaltStack, etc.)
  • Advanced knowledge of Linux and IO/data storage concepts
  • Experience remotely managing both bare-metal servers and virtualized environments
  • 5+ years of experience in an SRE/Operations/DevOps role
  • Experience with high-traffic, highly available website architectures
  • Strong English language skills
  • Ability to work independently in a fast-paced environment

  • Operation, maintenance, troubleshooting and automation of relational database systems in production and staging environments
  • Handling configuration management, (Debian) package maintenance, patching and building, working with upstream on bug identification and resolution
  • Improving observability of database infrastructure
  • Designing multi-datacenter systems, capacity planning, and infrastructure planning
  • Participating in incident response and on-call rotation for system outages or alerts

🪄 Skills: SQL, Kibana, C (Programming language), Cassandra, Grafana, Prometheus, Redis, Linux

Posted 2024-08-28

📍 Australia, Austria, Bangladesh, Belgium, Brazil, Canada, Colombia, Costa Rica, Croatia, Czech Republic, Denmark, Egypt, Estonia, Finland, France, Germany, Ghana, Greece, India, Indonesia, Ireland, Israel, Italy, Kenya, Mexico, Netherlands, Nigeria, Peru, Poland, Singapore, South Africa, Spain, Sweden, Switzerland, Uganda, United Arab Emirates, United Kingdom, United States of America, Uruguay

🧭 Full-Time

💸 109047 - 169455 USD per year

🔍 Nonprofit / Technology

  • At least two years of experience in an SRE/Operations/DevOps role as part of a team.
  • Experience supporting high-availability distributed production systems.
  • Experience with database administration and support.
  • Comfortable with configuration management and orchestration tools (e.g., Puppet, Ansible, Chef, SaltStack).
  • Knowledge of modern observability infrastructure (monitoring, metrics, and logging).
  • Proficient in shell and scripting languages such as Python, Go, Bash, Ruby.
  • Good understanding of Linux/Unix fundamentals and debugging skills.
  • Excellent written and verbal communication skills.
  • BS or MS degree in Computer Science or equivalent work experience.

  • Deploy, configure, and maintain the distributed data systems that comprise our data and analytics platform.
  • Implement data quality monitoring that alerts the team of possible data issues.
  • Collaborate closely with the Fundraising team to integrate and use data from self-hosted and third-party sources.
  • Provide engineering support during high-traffic or critical campaigns.
  • Write and update internal documentation of systems and processes.
  • Ensure compliance with regulations like the Donor Privacy Policy, GDPR, and PCI DSS.
  • Create and manage users and permissions for data access control.
  • Advise on data input best practices and develop processes for data entry consistency.
  • Work closely with Fundraising Analytics to gather and prioritize data enhancement requests.

🪄 Skills: Python, Bash, Ruby, C (Programming language), Data engineering, Go, Communication Skills, Collaboration

Posted 2024-08-22