Sr QA Engineer (AI Systems & Platform)

Peach PilotAI, Insurance
Remote — Latin America, US Eastern timezone overlap required (5+ hours daily) · Contract · Senior
Salary not disclosed

Job Details

Languages
Strong English communication skills, written and verbal.
Experience
5+ years of QA engineering experience
Required Skills
PostgreSQL, Python, API testing, CI/CD, LangChain

Requirements

  • 5+ years of QA engineering experience, with meaningful time spent writing test code (not just managing test cases).
  • Hands-on experience testing LLM-powered applications: you understand prompt sensitivity, output variance, and how eval pipelines catch regressions across model updates.
  • You write test code, with Python as your primary tool.
  • Experience contributing to CI/CD-integrated test suites.
  • Comfortable testing complex API chains, async/streaming responses, and multi-service workflows.
  • Collaborative and self-directed: you work well as part of a team, pair well with engineers, and move work forward without hand-holding.
  • Strong English communication skills, written and verbal.
  • Available during US Eastern business hours with a minimum of 5 hours of daily overlap.

Even Better If

  • Experience with LLM evaluation frameworks such as LangSmith, PromptFlow, or custom eval pipelines.
  • Experience testing agent frameworks (LangChain, CrewAI, or similar) and agent orchestration systems.
  • Experience testing graph databases (Memgraph, Neo4j) or vector stores (Qdrant).
  • Background in enterprise software or regulated industries where audit trail integrity is non-negotiable.
  • Insurance industry background is a plus — it is our first vertical.

Responsibilities

  • Pair with full-stack and backend engineers on the features they are shipping — understand what they built, write tests that prove it works, and flag gaps early.
  • Reproduce and triage bugs with enough detail that an engineer can fix them without a round-trip.
  • Contribute to and help evolve our automated test suites (unit, integration, end-to-end) alongside the QA Lead.
  • Help build and run evaluation pipelines for non-deterministic LLM outputs, prompt regression, model drift detection, and output quality scoring across the LiteLLM routing layer.
  • Build and run automated tests for the agent orchestration layer, covering governance audit trail integrity, human-in-the-loop override behavior, and cross-agent handoffs.
  • Test retrieval quality and failure modes against the Company Brain (Memgraph, Neo4j, Qdrant, PostgreSQL) using real enterprise data scenarios.
  • Test the Nango-based integration layer across connectors and the file ingestion pipeline (Word, Excel, PowerPoint, PDF), including encryption, formatting edge cases, and audit trail continuity.
  • Validate streaming response handling, latency thresholds, and graceful degradation when a model is unavailable or slow.
  • Verify multi-model routing logic so cost-optimized task allocation behaves correctly across LLM providers, and outputs remain faithful regardless of which model served the request.
  • Test the trust-layer UX (onboarding flows, progressive disclosure, uncertainty states, agent activity surfacing, and human-in-the-loop governance interfaces) and help shape the standards as we go.