Staff Software Engineer
InfinityAI Infrastructure
Overlap with Americas timezones for collaboration | Full-Time | Staff
Salary not disclosed
Job Details
- Experience: 10+ years
- Required Skills: PostgreSQL, Python, Django, GCP, Kubernetes, Datadog, Helm, Distributed Systems
Requirements
- 10+ years building and operating production backend systems at scale
- Deep expertise in Python (Django preferred) and relational databases (PostgreSQL)
- Hands-on experience with Kubernetes, Helm, and cloud infrastructure (GCP preferred)
- Strong background in distributed systems: message queues, event sourcing, workflow orchestration
- Production experience with async task systems (Celery, Dramatiq, or similar)
- Track record of debugging complex production issues across multiple services
- Ability to work autonomously and drive technical initiatives
- Clear technical communication—able to explain tradeoffs and build consensus
Responsibilities
- Drive platform architecture decisions and align the team on scalable patterns and long-term maintainability
- Review a high volume of code, design docs, and architectural proposals for scalability, reliability, security, and operability
- Be a technical mentor and force multiplier: unblock engineers, raise the bar on production readiness, and establish platform best practices
- Own and evolve the performance and correctness of the core backend platform (Django/DRF/ASGI)
- Scale async execution across Celery + Dramatiq + Temporal/Cortex; implement resilient workflow patterns
- Optimize PostgreSQL/pgvector and caching strategies
- Maintain and improve Kubernetes deployment infrastructure (GKE, Helm, Terraform/OpenTofu) and CI/CD
- Own reliability of RabbitMQ, Redis, and PostgreSQL infrastructure; lead incident response and post-mortems
- Extend OpenTelemetry + Datadog instrumentation, dashboards, alerts, and SLOs