Description

Amazon’s Rufus AI team is building the future of conversational shopping. Rufus helps hundreds of millions of customers find and discover products through natural language, and behind every response is an automated quality measurement system powered by LLM-as-a-Judge (LLMAJ) technology. We are seeking a Sr. Product Manager-Tech to own the quality governance, global scaling, and operational excellence of this judge portfolio.

You will work alongside Language Engineers who build and tune judges, Product Managers who define quality criteria and evaluation standards, Data Scientists who operate evaluation pipelines, and Engineering teams who build the infrastructure that runs evaluations. This is a high-autonomy role: you own your domain end-to-end and are expected to drive decisions, not just track workstreams.

This role sits at the intersection of AI evaluation, product management, and applied tooling. You will own the governance framework for a portfolio of dozens of LLM judges that power critical evaluation metrics used for release decisions, competitive benchmarking, and leadership reporting. You will drive the localization of judges from en-US to 5+ international marketplaces, facilitate model evaluation and debugging workflows, and build purpose-built tools and agents to automate governance operations at scale.

Key job responsibilities

- Own the LLMAJ governance framework: judge registry, versioning standards, quality validation gates, deprecation policies, and agreement rate monitoring across the full judge portfolio
- Own the international LLMAJ expansion: drive judge localization from en-US to global marketplaces, identify coverage gaps, define remediation plans, and validate judge quality per locale
- Facilitate model evaluation and debugging: work with Language Engineers and Scientists to trace response quality issues, inspect production logs, and root-cause judge disagreements or quality regressions
- Build purpose-built tools and agents: code automation using internal agent frameworks to streamline governance workflows, judge monitoring, data extraction, and reporting
- Define and own partner-facing quality metrics powered by LLMAJ, including defect rates, agreement rates, and evaluation dimension reporting across partner teams
- Drive human-in-the-loop validation workflows, coordinating between evaluation platforms and annotation teams to maintain judge calibration
- Drive discipline on evaluation requests by enforcing data-driven problem statements, clear scoping, and definition of done before work begins
- Write business requirements documents, contribute to leadership updates, and represent LLMAJ governance in cross-functional forums

A day in the life

You start the morning checking agreement rate dashboards for drift across international locales and triaging alerts. A new prompt release is shipping, so you pull evaluation results, spot two judges regressing in the Japanese marketplace, and open a debugging session with a Language Engineer to trace the root cause. After lunch, you present international judge coverage in a cross-functional review. In the afternoon, you ship an update to a governance agent you built that auto-generates weekly judge health reports. You close the day pushing back on an under-scoped evaluation request.

About The Team

We are the team responsible for measuring whether Amazon’s AI shopping assistant is actually good. We build LLM judges, define quality standards, and run evaluations that directly inform what ships to hundreds of millions of customers. Our team includes Language Engineers, Data Scientists, and Product Managers who work closely with Science, Engineering, and Product teams across the organization. We move fast, care deeply about measurement rigor, and believe that if you cannot measure quality automatically, you cannot improve it at scale.

Basic Qualifications

- Bachelor’s degree
- Experience in technical product management, program management or engineering
- Experience owning/driving roadmap strategy and definition
- Experience with end to end product delivery
- Experience with feature delivery and tradeoffs of a product
- Experience contributing to engineering discussions around technology decisions and strategy related to a product
- Experience in representing and advocating for a variety of critical customers and stakeholders during executive-level prioritization and planning

Preferred Qualifications

- Experience in using analytical tools, such as Tableau, Qlikview, QuickSight
- Experience in building and driving adoption of new tools

Amazon is an equal opportunities employer. We believe passionately that employing a diverse workforce is central to our success. We make recruiting decisions based on your experience and skills. We value your passion to discover, invent, simplify and build. Protecting your privacy and the security of your data is a longstanding top priority for Amazon. Please consult our Privacy Notice (https://www.amazon.jobs/en/privacy_page) to know more about how we collect, use and transfer the personal data of our candidates.

Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.

Company – Amazon UK Services Ltd.
Job ID: A10414420

Senior Product Manager – Tech, GenAI, Amazon Rufus (Full Time)

Amazon
