Senior Staff Engineer, Online Datastores

Posted 2 months ago

176,500 - 288,500 USD per year

United States, Canada (BC & ON) | Full-Time | Software Development

Company: Webflow

Location: United States, Canada (BC & ON), EST, PST

Languages: English

Seniority level: Staff, 5+ years

Experience: 5+ years

Skills:

Leadership, PostgreSQL, Python, Bash, Java, Kafka, MongoDB, Mentoring, Linux

Requirements:

- Hands-on experience setting up and managing Apache Druid clusters, including upgrades, troubleshooting, and tuning.
- Deep knowledge of Druid operations: indexing, segment management, data partitioning, tiered storage, and query optimization.
- Proficiency in Java (JDK 8+) and Linux/Unix systems.
- Scripting and automation skills (Bash, Python) for deployments, maintenance, and performance tuning.
- Solid understanding of distributed systems concepts such as replication, failover, consensus protocols (ZooKeeper), and multi-region deployment strategies.
- Familiarity with streaming ingestion (Kafka/MSK, Flink CDC), cloud storage (S3), and related data infrastructure.
- Experience establishing and operating against SLAs/SLOs, and defining standards for real-time analytics workloads.
- Experience with observability and incident management: metrics collection, dashboards, and alerting (Datadog, Prometheus, Grafana, CloudWatch, or equivalent).
- Knowledge of security and governance practices, including authentication, role-based access control, encryption, and audit logging.
- Demonstrated technical leadership: mentoring engineers, driving architecture discussions, and collaborating across teams.

Responsibilities:

- Ensure high reliability, uptime, and query performance for Apache Druid clusters running on EC2.
- Lead monitoring, alerting, troubleshooting, and incident response using observability tools.
- Manage the data lifecycle: tiering strategies, retention policies, deep storage, and the metadata store.
- Operate and tune ZooKeeper clusters.
- Optimize performance and cost efficiency by right-sizing clusters and balancing ingestion against query workloads.
- Define and drive standards for online datastore operations, including multi-region deployments and failover strategies.
- Provide technical leadership and mentorship, collaborating with other teams to define governance and strategy.