Staff Software Engineer

InfinityAI Infrastructure
Overlap with Americas timezones for collaboration | Full-Time | Staff
Salary not disclosed

Job Details

Experience
10+ years
Required Skills
PostgreSQL, Python, Django, GCP, Kubernetes, Datadog, Helm, Distributed Systems

Requirements

  • 10+ years building and operating production backend systems at scale
  • Deep expertise in Python (Django preferred) and relational databases (PostgreSQL)
  • Hands-on experience with Kubernetes, Helm, and cloud infrastructure (GCP preferred)
  • Strong background in distributed systems: message queues, event sourcing, workflow orchestration
  • Production experience with async task systems (Celery, Dramatiq, or similar)
  • Track record of debugging complex production issues across multiple services
  • Ability to work autonomously and drive technical initiatives
  • Clear technical communication—able to explain tradeoffs and build consensus

Responsibilities

  • Drive platform architecture decisions and align the team on scalable patterns and long-term maintainability
  • Review a high volume of code, design docs, and architectural proposals for scalability, reliability, security, and operability
  • Be a technical mentor and force multiplier: unblock engineers, raise the bar on production readiness, and establish platform best practices
  • Own and evolve the performance and correctness of the core backend platform (Django/DRF/ASGI)
  • Scale async execution across Celery + Dramatiq + Temporal/Cortex; implement resilient workflow patterns
  • Optimize PostgreSQL/pgvector and caching strategies
  • Maintain and improve Kubernetes deployment infrastructure (GKE, Helm, Terraform/OpenTofu) and CI/CD
  • Own reliability of RabbitMQ, Redis, and PostgreSQL infrastructure; lead incident response and post-mortems
  • Extend OpenTelemetry + Datadog instrumentation, dashboards, alerts, and SLOs