Senior Database Reliability Engineer (DBRE) & Architect

Cloudlinux | Linux Infrastructure
Worldwide | Full-Time | Senior
Salary not disclosed

Job Details

Experience
5+ years
Required Skills
AWS, PostgreSQL, Python, Apache Airflow, GCP, Jenkins, Kafka, Kubernetes, MongoDB, Azure, ClickHouse, Go, Grafana, Redis, Terraform, Ansible, GitLab

Requirements

  • Deep PostgreSQL expertise (5+ years), including MVCC internals, locking mechanics, Patroni, PgBouncer, and seamless major version upgrades under load.
  • ClickHouse mastery: experience operating large clusters, understanding ZooKeeper/ClickHouse Keeper, sharding, replication internals, and diagnosing performance issues at the data-part level.
  • Engineering Mindset (SRE/DevOps) with experience writing complex Terraform modules and Ansible roles.
  • Programming skills in Python or Go for automation.
  • Experience in hybrid environments: understanding the differences between bare metal, Kubernetes, and cloud, and optimizing TCO and disk subsystem performance (NVMe, network storage).
  • A systems approach, including an understanding of security (FIPS, audit logs) and disaster recovery.
  • Openness to modern workflows and integrating AI into day-to-day operations.
  • Experience building an Internal Developer Platform (IDP) (Nice to Have).
  • Experience operating databases in Kubernetes (CloudNativePG, Altinity Operator) (Nice to Have).
  • Experience working at cloud and hosting providers on similar services (Nice to Have).

Responsibilities

  • Design and implement a self-service platform based on Terraform and Ansible for deploying HA clusters (PostgreSQL, ClickHouse, MongoDB, Redis) across heterogeneous environments (Bare Metal, OpenNebula, Kubernetes, Public Clouds).
  • Manage and scale exponentially growing ClickHouse analytics clusters (12+ clusters, tens of terabytes of data), addressing sharding, table engine optimization, and building reliable S3 backup pipelines.
  • Maintain and scale infrastructure for Apache Airflow and Redash, ensuring reliability of ETL pipelines and visualization tools.
  • Implement SRE practices in data management, replacing manual incident response with automated self-healing mechanisms and defining and implementing SLOs/SLIs for all databases.
  • Lead the migration process from legacy solutions to modern cloud patterns and participate in decisions regarding Kubernetes operators for stateful workloads.
  • Serve as the technical authority for product teams, helping them optimize data schemas and SQL queries for high-load systems.