Product QA Engineer - Gen AI
Inactive
Work from anywhere · Full-Time · Senior
Salary: 18,000–24,000 USD per year
Job Details
- Languages: English
- Required Skills: Leadership, Python, Software Development, Agile, Artificial Intelligence, Cloud Computing, Machine Learning, Product Management, QA, QA Automation, User Experience Design, Product Development, API testing, Manual testing, Regression testing, TestRail, CI/CD, Agile methodologies, DevOps, Microservices, Software Engineering
Requirements
- Experience testing GenAI or LLM-driven products, including common failure modes such as hallucinations, unsafe responses, bias, and brittle decision paths.
- Exposure to performance and load testing tools and practices for web applications and APIs.
- Familiarity with structured exploratory testing approaches and test charters, especially for AI behavior and agent decision-making.
- Prior experience in high-velocity environments (e.g., startups) where QA acts as an owner of quality rather than a purely executional function.
- A preference for automation over repetition, balanced by an appreciation for focused exploratory testing.
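The load-testing exposure listed above can be sketched in miniature. This is an illustrative harness only, not part of the role's actual stack: the `call_api` stub stands in for a real HTTP request so the example runs without a network, and the request counts are arbitrary.

```python
# Minimal sketch of a concurrent load probe for a web API.
# `call_api` is a hypothetical stand-in for a real HTTP request.
import time
from concurrent.futures import ThreadPoolExecutor

def call_api(_: int) -> float:
    """Simulate one request; return its latency in seconds."""
    start = time.perf_counter()
    time.sleep(0.01)  # stand-in for ~10 ms of server work
    return time.perf_counter() - start

def run_load(requests: int = 50, concurrency: int = 10) -> dict:
    """Fire `requests` calls with `concurrency` workers; report percentiles."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(call_api, range(requests)))
    return {
        "count": len(latencies),
        "p50": latencies[len(latencies) // 2],
        "p95": latencies[int(len(latencies) * 0.95)],
    }

if __name__ == "__main__":
    stats = run_load()
    print(f"p50={stats['p50'] * 1000:.1f} ms  p95={stats['p95'] * 1000:.1f} ms")
```

In practice a dedicated tool (e.g. Locust or k6) would replace this hand-rolled loop, but the shape (concurrent workers, sorted latencies, percentile reporting) is the same.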
Responsibilities
- Own the full QA lifecycle for Agentic AI products: strategy, design, execution, reporting, and release sign-off.
- Design and run test plans covering functional, regression, smoke, exploratory, and usability testing for AI behavior and decision chains.
- Validate multi-step decision flows and reasoning to catch logic gaps, guardrail failures, or requirement mismatches.
- Perform structured exploratory testing to uncover unexpected behaviors, edge cases, and cascading AI failures.
- Build synthetic test scripts for UI elements, APIs, and end-to-end flows to verify functionality.
- Test across platforms (web, mobile, integrations) for consistency and performance.
- Maintain dashboards tracking test coverage, failures, and quality KPIs for all stakeholders.
- Improve test reliability: fix flakiness, optimize parallel runs, and cut execution time.
- Partner with Product, Design, and Engineering to refine requirements and set clear go/no-go criteria.
- Monitor pre- and post-release quality; use data to enhance AI evaluation and guardrails.
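The guardrail-validation and scripted-test responsibilities above can be sketched as a small automated check. Everything here is illustrative: `generate_reply` is a hypothetical stub for the product's model call, and the banned-phrase list is a placeholder for real safety policy.

```python
# Minimal sketch of an automated guardrail regression check for an
# LLM-backed product. `generate_reply` is a hypothetical stub standing
# in for the real model API; the phrase list is illustrative only.

BANNED_PHRASES = ["i am a human", "your account has been deleted"]

def generate_reply(prompt: str) -> str:
    # Stub; a real suite would call the product's API here.
    return "I'm an AI assistant, and I can help you check your order status."

def violates_guardrails(reply: str) -> bool:
    """Flag replies containing any banned phrase (case-insensitive)."""
    lower = reply.lower()
    return any(phrase in lower for phrase in BANNED_PHRASES)

def test_reply_passes_guardrails():
    reply = generate_reply("Where is my order?")
    assert reply, "model returned an empty reply"
    assert not violates_guardrails(reply), f"guardrail violation: {reply!r}"

if __name__ == "__main__":
    test_reply_passes_guardrails()
    print("guardrail check passed")
```

A real suite would run checks like this in CI against many prompts per release, feeding failures back into the quality dashboards and go/no-go criteria described above.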