- Design and operate high-throughput, data-intensive ingestion and trace-query systems.
- Build monitoring, alerting, and automated recovery for system resilience.
- Define and enforce standards, tooling, and CI for SDK generation across Python, TypeScript, Go, and Java.
- Build and maintain integrations to ensure framework-agnostic usage.
- Debug performance bottlenecks and optimize database queries.
- Architect solutions for distributed-system challenges.
- Participate in an on-call rotation with a focus on post-incident learning and prevention.
AWS, PostgreSQL, Python, +6 more