Sr QA Engineer (AI Systems & Platform)

Peach PilotAI, Insurance
Remote — Latin America, US Eastern timezone overlap required (5+ hours daily) · Contract · Senior
Salary not disclosed

Job Details

Languages
Strong English communication skills, written and verbal.
Experience
5+ years of QA engineering experience
Required Skills
PostgreSQL, Python, API testing, CI/CD, LangChain

Requirements

  • 5+ years of QA engineering experience, with meaningful time spent writing test code (not just managing test cases).
  • Hands-on experience testing LLM-powered applications: you understand prompt sensitivity, output variance, and how eval pipelines catch regressions across model updates.
  • You write test code, with Python as your primary tool.
  • Experience contributing to CI/CD-integrated test suites.
  • Comfortable testing complex API chains, async/streaming responses, and multi-service workflows.
  • Collaborative and self-directed: you work well as part of a team, pair well with engineers, and move work forward without hand-holding.
  • Strong English communication skills, written and verbal.
  • Available during US Eastern business hours with a minimum of 5 hours of daily overlap.

Even Better If

  • Experience with LLM evaluation frameworks such as LangSmith, PromptFlow, or custom eval pipelines.
  • Experience testing agent frameworks (LangChain, CrewAI, or similar) and agent orchestration systems.
  • Experience testing graph databases (Memgraph, Neo4j) or vector stores (Qdrant).
  • Background in enterprise software or regulated industries where audit trail integrity is non-negotiable.
  • Insurance industry background is a plus — it is our first vertical.

Responsibilities

  • Pair with full-stack and backend engineers on the features they are shipping — understand what they built, write tests that prove it works, and flag gaps early.
  • Reproduce and triage bugs with enough detail that an engineer can fix them without a round-trip.
  • Contribute to and help evolve our automated test suites (unit, integration, end-to-end) alongside the QA Lead.
  • Help build and run evaluation pipelines for non-deterministic LLM outputs, prompt regression, model drift detection, and output quality scoring across the LiteLLM routing layer.
  • Build and run automated tests for the agent orchestration layer, covering governance audit trail integrity, human-in-the-loop override behavior, and cross-agent handoffs.
  • Test retrieval quality and failure modes against the Company Brain (Memgraph, Neo4j, Qdrant, PostgreSQL) using real enterprise data scenarios.
  • Test the Nango-based integration layer across connectors and the file ingestion pipeline (Word, Excel, PowerPoint, PDF), including encryption, formatting edge cases, and audit trail continuity.
  • Validate streaming response handling, latency thresholds, and graceful degradation when a model is unavailable or slow.
  • Verify multi-model routing logic so cost-optimized task allocation behaves correctly across LLM providers, and outputs remain faithful regardless of which model served the request.
  • Test the trust-layer UX (onboarding flows, progressive disclosure, uncertainty states, agent activity surfacing, and human-in-the-loop governance interfaces) and help shape the standards as we go.