Senior AI Systems Quality Engineer

Abacus Insights | Healthcare Technology
Remote (US) | Full-Time | Senior
Salary not disclosed

Job Details

Experience
7+ years
Required Skills
AWS, Python, MLflow, TypeScript, CI/CD, Databricks

Requirements

  • 7+ years of software engineering experience, primarily in backend or platform systems.
  • Proven experience designing and implementing AI testing automation in production environments.
  • Demonstrated ability to build custom validation, evaluation, or testing frameworks for complex, distributed systems.
  • Strong proficiency in Python and/or TypeScript within modern AI engineering stacks.
  • Hands-on experience with AI-powered systems, including LLM-based or agentic workflows and non-deterministic behavior.
  • Experience designing or contributing to AI testing at scale, including regression frameworks, long-tail evaluation, and broad test coverage.
  • Deep understanding of CI/CD integration, including embedding automated tests and quality gates into deployment pipelines.
  • Solid understanding of AWS cloud-native architectures.
  • Track record of engineering for quality, reliability, governance, and safety as core system design principles.
  • Working knowledge of security, privacy, and operational risk in regulated or mission-critical environments.
  • Experience with AI testing methodologies, including evaluation of non-deterministic outputs, drift detection, bias/fairness testing, and robust regression strategies.
  • Proven ability to establish measurable trust thresholds for AI systems.
  • Experience working with domain experts to define correctness and real-world validation scenarios.

Responsibilities

  • Build and ship production-grade, automated validation frameworks, test harnesses, and evaluation pipelines across the AI lifecycle (design → deploy).
  • Design and evolve an AI testing platform integrated with Databricks and MLflow, enabling repeatable testing, traceability, and auditability.
  • Create large-scale, scenario-based test suites (hundreds to thousands of cases) to validate agentic workflows end-to-end, including edge cases, long-tail scenarios, and failure modes.
  • Validate orchestration behavior (tool use, memory, decision logic) and stress-test non-deterministic system behavior before production.
  • Embed quality by design: define system contracts, guardrails, and safe-degradation patterns at key boundaries.
  • Define measurable quality signals for LLM systems (grounding, hallucinations, relevance, latency, cost) and integrate them into CI/CD pipelines as automated quality gates.
  • Ensure AI validation runs automatically on model, prompt, and code changes, enabling continuous quality enforcement.
  • Build reusable libraries and components so teams can adopt consistent AI quality practices quickly.
  • Own aspects of AI release readiness, including defining go/no-go criteria based on measurable quality thresholds.
  • Partner with AI, platform, security, and delivery teams to translate mission needs into clear quality criteria, tradeoffs, and confidence levels.