Senior QA Engineer (AI Agent Quality & Evaluation)
About the position
As a Senior QA Engineer, you’ll own quality strategy for AI-powered systems where correctness is probabilistic, outputs are structured (JSON), and evaluation requires real measurement (accuracy, cost, latency, edge-case handling, regression detection). You’ll build automated evaluation harnesses and partner closely with Engineering and Product to prevent silent quality regressions as the system evolves. The role offers high autonomy, high leverage, and direct impact on the core product.
Responsibilities
Build automated evaluation harnesses.
Design and implement metrics pipelines and regression suites.
Partner closely with Engineering and Product to prevent silent quality regressions as the system evolves.
Ship quality gates and integrate evaluation into development workflows.
Measure accuracy, cost, and latency, and detect edge-case failures.
Requirements
Senior QA engineer who has moved beyond manual testing into automation, tooling, and quality systems.
Comfortable testing systems where the “expected output” is not always deterministic, and able to create evaluation strategies for such systems.
Strong Python skills and a data mindset for building repeatable harnesses, metrics pipelines, and regression suites.
Product-minded and skeptical; comfortable collaborating with engineers and shipping quality gates, not just filing bugs.
Hands-on experience with AI developer/agent tooling (e.g., Claude Code, GitHub Copilot, or similar) and with building agents that amplify inputs and orchestrate multi-step workflows (prompt engineering, tool integration).