Strong experience building and operating distributed, real-time backend systems (including C++ and Go services).
Deep understanding of networked, message-driven architectures (TCP/UDP, connection management, backpressure, timeouts, heartbeats, long-lived connections).
Proven track record designing and implementing high-availability and failover patterns.
Ability to design state replication and recovery mechanisms.
Expertise in idempotent, restart-safe operations and APIs.
Strong background in observability and diagnostics: logging, metrics, tracing, SLO definition.
Experience with configuration-driven systems, deployment automation, and infrastructure as code (Kubernetes, Kustomize/Helm/Ansible or equivalent).
Hands-on experience with automated testing for distributed systems, including integration, scenario-based, stress, fault-injection/chaos, and long-running soak tests.
Safety-critical mindset and comfort working in a requirements-driven environment.
Strong ownership and collaboration skills.