Database Reliability Engineer
Tucows
SaaS, Telecoms
Canada, Full-Time, Middle
Salary: 126,100 - 140,100 CAD per year
Job Details
- Experience: 7+ years
- Required Skills: PostgreSQL, Python, SQL, Bash, Go, Grafana, Prometheus, Linux, Terraform, Ansible, Datadog
Requirements
- 7+ years of hands-on PostgreSQL experience in large-scale, high-volume production environments
- Strong expertise in PostgreSQL internals: WAL, MVCC, vacuum tuning, query planner, indexing, logical replication
- Advanced SQL and strong schema design and query optimization skills
- Solid experience with Linux systems and networking fundamentals
- Experience building automation using Go or Python
- Experience with monitoring tools such as Prometheus, Grafana, Datadog, PMM, pg_stat_statements
- Deep understanding of PostgreSQL internals: MVCC, WAL processing, vacuum behavior, locking, query planning
- Experience designing and operating highly available database clusters with automated failover
- Strong performance tuning skills (query optimization, indexing, workload tuning)
- Ability to diagnose database and system issues: query plans, I/O, memory usage, WAL growth, table/index bloat
- Experience with backup and recovery strategies: point-in-time recovery (PITR), durability planning
- Familiarity with observability and monitoring: metrics, alerting, and performance dashboards (Grafana)
- Understanding of distributed systems concepts: service discovery, consensus (e.g., Consul)
- Strong Linux systems knowledge (performance tuning, resource management)
- Experience with scripting and infrastructure-as-code automation
- Strong troubleshooting and problem-solving skills in production environments
- Knowledge of security, compliance, encryption, auditing, and access control
Responsibilities
- Design, implement, and operate highly available PostgreSQL clusters (physical/logical replication, sharding, partitioning, failover automation)
- Optimize query performance and indexing strategies
- Perform capacity planning, growth forecasting, and workload modeling
- Own high-availability strategies, including automatic failover, multi-region deployments, and disaster recovery
- Build and maintain automation for provisioning, configuration, backups, recovery, failovers, vacuum tuning, and schema management
- Develop monitoring and alerting systems for PostgreSQL clusters
- Lead response during database incidents (e.g., performance regressions, replication lag, deadlocks, bloat, storage failures)
- Conduct root-cause analysis and implement long-term fixes
- Partner with software engineers to review SQL queries, optimize schemas, and ensure effective use of PostgreSQL features
- Provide guidance on database design patterns, migrations, version upgrades, and best practices