Senior AI Systems Quality Engineer


Abacus Insights | Healthcare Technology

Remote US | Full-Time | Senior

Salary not disclosed


Job Details

7+ years of software engineering experience, primarily in backend or platform systems.
Proven experience designing and implementing AI testing automation in production environments.
Demonstrated ability to build custom validation, evaluation, or testing frameworks for complex, distributed systems.
Strong proficiency in Python and/or TypeScript within modern AI engineering stacks.
Hands-on experience with AI-powered systems, including LLM-based or agentic workflows and non-deterministic behavior.
Experience designing or contributing to AI testing at scale, including regression frameworks, long-tail evaluation, and broad test coverage.
Deep understanding of CI/CD integration, including embedding automated tests and quality gates into deployment pipelines.
Solid understanding of AWS cloud-native architectures.
Track record of engineering for quality, reliability, governance, and safety as core system design principles.
Working knowledge of security, privacy, and operational risk in regulated or mission-critical environments.
Experience with AI testing methodologies, including evaluation of non-deterministic outputs, drift detection, bias/fairness testing, and robust regression strategies.
Proven ability to establish measurable trust thresholds for AI systems.
Experience working with domain experts to define correctness and real-world validation scenarios.

Build and ship production-grade, automated validation frameworks, test harnesses, and evaluation pipelines across the AI lifecycle (design → deploy).
Design and evolve an AI testing platform integrated with Databricks and MLflow, enabling repeatable testing, traceability, and auditability.
Create large-scale, scenario-based test suites (hundreds to thousands of cases) to validate agentic workflows end-to-end, including edge cases, long-tail scenarios, and failure modes.
Validate orchestration behavior (tool use, memory, decision logic) and stress-test non-deterministic system behavior before production.
Embed quality by design: define system contracts, guardrails, and safe-degradation patterns at key boundaries.
Define measurable quality signals for LLM systems (grounding, hallucinations, relevance, latency, cost) and integrate them into CI/CD pipelines as automated quality gates.
Ensure AI validation runs automatically on model, prompt, and code changes, enabling continuous quality enforcement.
Build reusable libraries and components so teams can adopt consistent AI quality practices quickly.
Own aspects of AI release readiness, including defining go/no-go criteria based on measurable quality thresholds.
Partner with AI, platform, security, and delivery teams to translate mission needs into clear quality criteria, tradeoffs, and confidence levels.
