Hands-on experience setting up and managing Apache Druid clusters, including upgrades, troubleshooting, and tuning. Deep knowledge of Druid operations: indexing, segment management, data partitioning, tiered storage, and query optimization. Proficiency in Java (JDK 8+) and Linux/Unix systems. Scripting/automation skills (Bash, Python) for deployments, maintenance, and performance tuning. Solid understanding of distributed systems concepts such as replication, failover, consensus protocols (Zookeeper), and multi-region deployment strategies. Familiarity with streaming ingestion (Kafka/MSK, Flink CDC), cloud storage (S3), and related data infrastructure. Experience establishing and operating to SLAs/SLOs, and defining standards for real-time analytics workloads. Experience with observability and incident management: metrics collection, dashboards, and alerting (Datadog, Prometheus, Grafana, Cloudwatch or equivalent). Knowledge of security and governance practices including authentication, role-based access, encryption, and audit logging. Demonstrated technical leadership: mentoring engineers, driving architecture discussions, and collaborating with teams.