Principal QA Engineer - AI Systems & Platform

Location: Remote, Latin America (US Eastern timezone overlap required, 5+ hours daily)
Employment type: Contract
Level: Principal

Salary not disclosed

Job Details

Languages: English
Experience: 7+ years of QA engineering experience with at least 3 years in a lead or senior role
Required Skills: Node.js, Python, Express.js, Snowflake, TypeScript, API testing, FastAPI, Next.js, React, CI/CD

7+ years of QA engineering experience
3+ years in a lead or senior QA role
Hands-on experience testing LLM-powered applications
Understanding of prompt sensitivity, output variance, and how to build eval pipelines
Proficiency in Python for writing test code
Experience building and maintaining CI/CD-integrated test suites
Comfortable testing complex API chains, async/streaming responses, and multi-service workflows
Track record of building or significantly improving a QA function in an early-stage environment
Strong English communication skills (written and verbal)
Available during US Eastern business hours with minimum 5 hours of daily overlap
Experience with LLM evaluation frameworks (LangSmith, PromptFlow, custom eval pipelines) is a plus
Experience testing agent frameworks (LangChain, CrewAI) is a plus
Background in enterprise software or regulated industries is a plus
Insurance industry background is a strong plus

Build and own the QA function at Peach Pilot
Write test code, design eval pipelines, and set the quality bar
Establish the testing framework from zero (unit, integration, end-to-end, LLM-specific evaluation pipelines)
Define quality standards, test coverage requirements, and documentation practices
Audit the existing platform and identify highest-risk surfaces
Own the QA function end to end and be the voice of quality across the engineering team
Design evaluation frameworks for non-deterministic LLM outputs (prompt regression testing, model drift detection, output quality scoring)
Build automated test suites for the agent orchestration layer
Validate the Enterprise Knowledge Graph for data accuracy, retrieval quality, and failure modes
Own end-to-end testing of the file ingestion pipeline across document types
Validate streaming response handling, latency thresholds, and graceful degradation
Test multi-model routing logic for cost-optimized task allocation
Partner with the Full-Stack Engineer to define and test trust-layer UX standards
Act as the internal advocate for the non-technical enterprise user