Principal QA Engineer - AI Systems & Platform
Hive Financial Systems
Enterprise AI
Remote - Latin America, US Eastern Timezone Overlap Required (5+ hours daily)
Contract
Principal
Salary not disclosed
Job Details
- Languages: English
- Experience: 7+ years of QA engineering experience with at least 3 years in a lead or senior role
- Required Skills: Node.js, Python, Express.js, Snowflake, TypeScript, API testing, FastAPI, Next.js, React, CI/CD
Requirements
- 7+ years of QA engineering experience
- 3+ years in a lead or senior QA role
- Hands-on experience testing LLM-powered applications
- Understanding of prompt sensitivity, output variance, and how to build eval pipelines
- Proficiency in Python for writing test code
- Experience building and maintaining CI/CD-integrated test suites
- Comfortable testing complex API chains, async/streaming responses, and multi-service workflows
- Track record of building or significantly improving a QA function in an early-stage environment
- Strong English communication skills (written and verbal)
- Available during US Eastern business hours with minimum 5 hours of daily overlap
- Experience with LLM evaluation frameworks (LangSmith, PromptFlow, custom eval pipelines) is a plus
- Experience testing agent frameworks (LangChain, CrewAI) is a plus
- Background in enterprise software or regulated industries is a plus
- Insurance industry background is a strong plus
Responsibilities
- Build and own the QA function at Peach Pilot
- Write test code, design eval pipelines, and set the quality bar
- Establish the testing framework from zero (unit, integration, end-to-end, LLM-specific evaluation pipelines)
- Define quality standards, test coverage requirements, and documentation practices
- Audit the existing platform and identify highest-risk surfaces
- Own the QA function end to end and be the voice of quality across the engineering team
- Design evaluation frameworks for non-deterministic LLM outputs (prompt regression testing, model drift detection, output quality scoring)
- Build automated test suites for the agent orchestration layer
- Validate the Enterprise Knowledge Graph for data accuracy, retrieval quality, and failure modes
- Own end-to-end testing of the file ingestion pipeline across document types
- Validate streaming response handling, latency thresholds, and graceful degradation
- Test multi-model routing logic for cost-optimized task allocation
- Partner with the Full-Stack Engineer to define and test trust-layer UX standards
- Act as the internal advocate for the non-technical enterprise user