- Design and implement a self-service platform based on Terraform and Ansible for deploying HA clusters (PostgreSQL, ClickHouse, MongoDB, Redis) across heterogeneous environments (bare metal, OpenNebula, Kubernetes, public clouds).
- Manage and scale rapidly growing ClickHouse analytics clusters (12+ clusters, tens of terabytes of data), addressing sharding, table engine optimization, and building reliable S3 backup pipelines.
- Maintain and scale infrastructure for Apache Airflow and Redash, ensuring reliability of ETL pipelines and visualization tools.
- Implement SRE practices in data management: replace manual incident response with automated self-healing mechanisms, and define and implement SLOs and SLIs for all databases.
- Lead migrations from legacy solutions to modern cloud patterns and contribute to decisions on Kubernetes operators for stateful workloads.
- Serve as the technical authority for product teams, helping them optimize data schemas and SQL queries for high-load systems.
AWS, PostgreSQL, Python