Database Reliability Engineer

Tucows · SaaS, Telecoms
Canada · Full-Time · Middle
Salary: 126,100 - 140,100 CAD per year

Job Details

Experience
7+ years
Required Skills
PostgreSQL, Python, SQL, Bash, Go, Grafana, Prometheus, Linux, Terraform, Ansible, Datadog

Requirements

  • 7+ years of hands-on PostgreSQL experience in large-scale, high-volume production environments
  • Strong expertise in PostgreSQL internals: WAL, MVCC, vacuum tuning, query planner, indexing, logical replication
  • Advanced SQL and strong schema design and query optimization skills
  • Solid experience with Linux systems and networking fundamentals
  • Experience building automation using Go or Python
  • Experience with monitoring tools such as Prometheus, Grafana, Datadog, PMM, and pg_stat_statements
  • Deep understanding of PostgreSQL internals: MVCC, WAL processing, vacuum behavior, locking, query planning
  • Experience designing and operating highly available database clusters with automated failover
  • Strong performance tuning skills (query optimization, indexing, workload tuning)
  • Ability to diagnose database and system issues: query plans, I/O, memory usage, WAL growth, and table/index bloat
  • Experience with backup and recovery strategies: point-in-time recovery (PITR) and durability planning
  • Familiarity with observability and monitoring: metrics, alerting, and performance dashboards (Grafana)
  • Understanding of distributed systems concepts: service discovery and consensus (e.g., Consul)
  • Strong Linux systems knowledge (performance tuning, resource management)
  • Experience with scripting and infrastructure-as-code automation
  • Strong troubleshooting and problem-solving skills in production environments
  • Knowledge of security, compliance, encryption, auditing, and access control

Responsibilities

  • Design, implement, and operate highly available PostgreSQL clusters (physical/logical replication, sharding, partitioning, failover automation)
  • Optimize query performance and indexing strategies
  • Perform capacity planning, growth forecasting, and workload modeling
  • Own high-availability strategies, including automatic failover, multi-region deployments, disaster recovery
  • Build and maintain automation for provisioning, configuration, backups, recovery, failovers, vacuum tuning, schema management
  • Develop monitoring and alerting systems for PostgreSQL clusters
  • Lead response during database incidents (e.g., performance regressions, replication lag, deadlocks, bloat, storage failures)
  • Conduct root-cause analysis and implement long-term fixes
  • Partner with software engineers to review SQL queries, optimize schemas, and ensure effective use of PostgreSQL features
  • Provide guidance on database design patterns, migrations, version upgrades, and best practices