Principal QA Engineer - AI Systems & Platform

Remote (Latin America), US Eastern time zone overlap required (5+ hours daily) | Contract | Principal
Salary not disclosed

Job Details

Languages
English
Experience
7+ years of QA engineering experience with at least 3 years in a lead or senior role
Required Skills
Node.js, Python, Express.js, Snowflake, TypeScript, API testing, FastAPI, Next.js, React, CI/CD

Requirements

  • 7+ years of QA engineering experience
  • 3+ years in a lead or senior QA role
  • Hands-on experience testing LLM-powered applications
  • Understanding of prompt sensitivity and output variance, and of how to build eval pipelines
  • Proficiency in Python for writing test code
  • Experience building and maintaining CI/CD-integrated test suites
  • Comfortable testing complex API chains, async/streaming responses, and multi-service workflows
  • Track record of building or significantly improving a QA function in an early-stage environment
  • Strong English communication skills (written and verbal)
  • Available during US Eastern business hours with minimum 5 hours of daily overlap
  • Experience with LLM evaluation frameworks (LangSmith, PromptFlow, custom eval pipelines) is a plus
  • Experience testing agent frameworks (LangChain, CrewAI) is a plus
  • Background in enterprise software or regulated industries is a plus
  • Insurance industry background is a strong plus

Responsibilities

  • Build and own the QA function at Peach Pilot
  • Write test code, design eval pipelines, and set the quality bar
  • Establish the testing framework from zero (unit, integration, end-to-end, LLM-specific evaluation pipelines)
  • Define quality standards, test coverage requirements, and documentation practices
  • Audit the existing platform and identify highest-risk surfaces
  • Own quality end to end and be the voice of quality across the engineering team
  • Design evaluation frameworks for non-deterministic LLM outputs (prompt regression testing, model drift detection, output quality scoring)
  • Build automated test suites for the agent orchestration layer
  • Validate the Enterprise Knowledge Graph for data accuracy, retrieval quality, and failure modes
  • Own end-to-end testing of the file ingestion pipeline across document types
  • Validate streaming response handling, latency thresholds, and graceful degradation
  • Test multi-model routing logic for cost-optimized task allocation
  • Partner with the Full-Stack Engineer to define and test trust-layer UX standards
  • Act as the internal advocate for the non-technical enterprise user