Senior Site Reliability Engineer - Data (REMOTE)

Posted about 1 month ago

💎 Seniority level: Senior, 5+ years

📍 Location: OR, WA, CA, CO, TX, IL

💸 Salary: $130,000 - $140,000 USD per year

🔍 Industry: Software Development

🏢 Company: Discogs 👥 51-100 💰 $2,500,000 over 7 years ago 🏷️ Database, Communities, Music

🗣️ Languages: English

⏳ Experience: 5+ years

🪄 Skills: AWS, Python, ElasticSearch, Git, Kafka, Kubernetes, MySQL, FastAPI, RDBMS, REST API, CI/CD, Mentoring, Terraform, Documentation, Troubleshooting, Scripting

Requirements:
  • 5+ years of experience working with Kafka and relational database management systems (RDBMS)
  • Relational database schema design, query performance optimization, administration (MySQL, Percona Server, AWS RDS)
  • Kafka: Cluster administration (Strimzi), Kafka Connect (Debezium, JDBC)
  • CI/CD (GitHub Actions)
  • GitOps (ArgoCD)
  • Kubernetes (EKS, Kustomize, Karpenter, administration, application manifests)
  • AWS and cloud development (VPC, EKS, RDS, S3)
  • Observability (Datadog, Sentry)
  • Scripting (Shell, Python)
Responsibilities:
  • Stewarding Discogs’ data stores as a key subject matter expert
  • Leading efforts on the reliability and design patterns of our Kafka and Kafka Connect implementations
  • Establishing data contracts and clear communication standards between CDC producers and consumers
  • Working closely with engineering squads to refactor and re-architect MySQL database schemas and indexing for long-term scalability, performance, and cost-effectiveness
  • Mentoring engineering squads on Platform best practices for MySQL, Kafka, and other software development lifecycle areas
  • Writing documentation and runbooks that contribute to the engineering organization’s knowledge base
  • Working in a containerized, orchestrated environment
  • Contributing to the Platform team’s disciplines of site reliability and operations, supporting both our squads and Platform’s central infrastructure
  • Participating in the on-call rotation, responding to incidents, and troubleshooting data and other operational issues