Staff Software Engineer - Grafana Databases, Managed Services

Spain time zones only, Full-Time, Staff
Salary: 94025 - 112830 EUR per year

Job Details

Experience
8+ years
Required Skills
AWS, PostgreSQL, GCP, Kafka, Kubernetes, Snowflake, Azure, Cassandra, ClickHouse, Go, Linux, Terraform, Helm, Networking

Requirements

  • 8+ years of engineering experience
  • Meaningful time in SRE, platform engineering, production engineering, infrastructure engineering, or distributed systems roles
  • Experience with high-throughput streaming systems (e.g., Kafka, Redpanda, WarpStream)
  • Experience with analytical or storage backends (e.g., Postgres, ClickHouse, Snowflake, Cassandra)
  • Strong Kubernetes experience in AWS, GCP, or Azure
  • Familiarity with infrastructure-as-code tooling (Helm, Terraform, Jsonnet)
  • Experience leading or driving complex technical efforts
  • Ability to influence technical direction and align teams around reliability improvements
  • Strong understanding of distributed systems failure modes in multi-cloud environments
  • Proficiency in at least one systems-oriented language (Go preferred)
  • Working knowledge of Linux internals
  • Working knowledge of networking
  • Working knowledge of cloud storage
  • Working knowledge of performance/scaling behavior
  • Experience participating in blameless incident response
  • Experience writing high-quality post-incident reviews

Responsibilities

  • Operate and evolve 100+ multi-cloud streaming clusters and related database infrastructure
  • Diagnose and eliminate cross-layer failure modes (e.g., object storage latency, noisy neighbors, control-plane bottlenecks, query performance regressions)
  • Design safe upgrade and rollout strategies at scale
  • Improve observability, automation, and operational ergonomics
  • Partner closely with database and platform teams to ensure safe scaling, partitioning, consumer fan-out, and query performance
  • Work directly with areas such as distributed systems behavior, Kubernetes scheduling dynamics, storage engines, and compression trade-offs
  • Serve as a primary escalation point and on-call for relevant incidents
  • Own the relationship with all system vendors, including WarpStream Labs and others
  • Help define and evolve the technical direction for operating WarpStream and adjacent shared database systems at scale
  • Lead complex initiatives such as migrations, rollout improvements, and reliability investments
  • Establish best practices around SLOs, scaling limits, failure isolation, and change safety
  • Investigate and drive resolution of multi-layer incidents spanning storage, compute, networking, and control-plane dependencies
  • Identify systemic risks across 100+ clusters and contribute architectural improvements that reduce recurring issues
  • Reduce systems toil and improve operational ergonomics through automation
  • Partner with database and platform teams to align on strategy and long-term scalability
  • Mentor and support engineers as the team matures