Anne-bot
Scaling Expert Mentorship Through AI—
Without Replacing Human Judgment
A 2-week proof of concept exploring whether AI could reduce PRD review cycles by codifying director-level coaching patterns—and discovering that the bottleneck wasn't tooling but foundational alignment on what a strong problem statement actually is.
Overview
Anne-bot is a conversational AI prototype designed to scale director-level PRD coaching, helping Product Managers strengthen problem statements and arrive at reviews better prepared—without replacing human judgment, strategic oversight, or organizational context.
Instead of auto-generating answers, Anne-bot prompts PMs with Socratic questions that clarify assumptions, reveal gaps, and improve reasoning before formal review sessions—mirroring the coaching style used by Digital Cabin leadership.
The Stakes
PRD reviews were thoughtful but slow. Every document required director approval, creating a quality-vs-velocity tension: leadership wanted to move faster, but not at the expense of product rigor or strategic clarity.
The opportunity wasn't automation—it was scaling mentorship to prepare PMs before reviews, raising the floor so director time could focus on strategic alignment rather than foundational fixes.
QUICK FACTS
Role: Product Manager & AI Practitioner (Solo)
Duration: 2 weeks (Proof of Concept)
Validation: Tested with 6 PMs across 10+ PRDs
Platform: Internal Ford LLM + Gemini
Key Finding: Teaching > Automating for sustainable impact
Status: POC validated feasibility; surfaced need for foundational scaffolding before production investment
My Role
Product Manager & AI Practitioner | Solo Project
Led discovery, strategy, prototyping, evaluation, and iteration across a 2-week exploratory proof of concept.
What I Did
Discovery & Strategy
Analyzed transcripts, meeting notes, Slack discussions, and PRDs to extract the coaching patterns of Anne, the director who led PRD reviews.
Conducted 6 PM interviews and observed live review sessions.
Framed opportunity as scaling mentorship (not automating reviews) and defined riskiest assumptions to test.
AI Prototyping & Evaluation
Built iterative GPT prototypes mirroring Anne's reasoning, tone, and Socratic questioning style.
Tested against real PRDs, measuring usefulness, clarity, tone, and trust.
Developed evaluation methodology comparing AI feedback quality to human coaching.
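To make the comparison between AI feedback and human coaching concrete, below is a minimal sketch of the rubric-style scoring involved. The dimensions (usefulness, clarity, tone, trust) come from the testing above; the data structures, helper, and numbers are illustrative rather than the actual evaluation tooling.

```python
from dataclasses import dataclass
from statistics import mean

# Dimensions taken from the evaluation above: usefulness, clarity, tone, trust.
DIMENSIONS = ("usefulness", "clarity", "tone", "trust")

@dataclass
class FeedbackRating:
    """One PM's 1-5 ratings of a single piece of feedback on one PRD."""
    source: str      # "anne_bot" or "human_coach"
    prd_id: str
    scores: dict     # dimension name -> rating (1-5)

def summarize(ratings, source):
    """Average each rubric dimension across all ratings from one source."""
    rows = [r for r in ratings if r.source == source]
    return {d: round(mean(r.scores[d] for r in rows), 2) for d in DIMENSIONS}

# Illustrative usage with made-up scores:
ratings = [
    FeedbackRating("anne_bot", "prd-01", {"usefulness": 4, "clarity": 5, "tone": 4, "trust": 3}),
    FeedbackRating("human_coach", "prd-01", {"usefulness": 5, "clarity": 4, "tone": 5, "trust": 5}),
]
print(summarize(ratings, "anne_bot"))     # compare side by side with...
print(summarize(ratings, "human_coach"))  # ...the human-coaching baseline
```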
Strategic Pivot
Identified structural bottleneck: PRD quality issues stemmed from weak problem statement skills, not review velocity.
Recommended investing in teaching a problem statement framework before scaling AI—proving that knowing when not to build is as strategic as shipping.
Why This Project Mattered to Me
I wanted to explore whether AI could scale expert thinking without replacing human judgment. This was my chance to practice applied prompt engineering grounded in real organizational dynamics, develop rigorous evaluation methodology, and learn when foundational work (teaching frameworks) creates more value than technological solutions.
The Challenge
PRD reviews were slow and centralized. Every document required director-level approval before teams could move forward, creating downstream delivery delays. Anne's reviews were thoughtful, Socratic, and rigorous—but because she led all PRD review sessions, PMs often waited days for feedback, even if they just needed a quick gut-check.
Leadership recognized the slowdown—but didn't want to compromise quality, standards, or product thinking.
The Tension
Business Need: Move faster without sacrificing quality
PM Need: Get feedback without long wait times
Director Need: Maintain rigor and strategic alignment
Traditional solutions had clear limitations:
Hire more directors → Expensive, slow to scale, dilutes expertise
Lower review standards → Unacceptable quality risk
Skip reviews → Defeats the purpose of having them
The Opportunity
What if AI could codify Anne's coaching patterns to prepare PMs before formal reviews—raising the floor so director time focused on strategic alignment rather than foundational fixes?
Product Hypothesis
Riskiest Assumption
AI can play a productive role in the PRD review process without replacing human judgment, nuance, or organizational context.
If True, Then:
PMs could access early coaching before formal reviews
At least one review cycle could be eliminated
Directors could spend less time on foundational fixes and more time on strategic alignment
Feedback quality would remain high—or improve
PM confidence and preparedness would increase
This hypothesis shaped how Anne-bot was designed, evaluated, and iterated—and ultimately, what it revealed about the real bottleneck.
Discovery
To understand why reviews were slow, I used mixed methods:
RESEARCH METHODS
Observation: Attended live PRD review sessions
6 PM Interviews: Surfaced perceptions, frustration points, and expectations around reviews
Transcript Analysis: Captured Anne's real language, questioning techniques, reasoning patterns
Artifact Comparison: Reviewed PRDs before and after review sessions to identify what changed
Prototype Feedback: PM evaluation of Anne-bot responses for relevance and usefulness
Mixed methods revealed both behavioral patterns and structural bottlenecks—essential for designing the right intervention.
What I Found
A clear pattern emerged across all data sources:
Reviews weren't slow because PMs were unprepared—they were slow because aligning on the problem statement took multiple rounds.
Key Patterns:
Most review time centered on clarifying the true user problem
PMs and leadership defined "problem statement" differently
Once the problem section was strong, the rest of the PRD moved quickly
Common issues: jumping to solutions, describing features instead of user needs, conflating business goals with user problems
The Core Insight
Everything downstream—scope, metrics, prioritization—depends on a strong problem statement.
If the problem foundation wasn't clear, reviewing anything else was performative.
Prototyping Strategy
I used Anne-bot itself as a research tool—building to learn, not to ship.
Iterative Learning Framework
Each iteration tested a different hypothesis:
Iteration 1: Can AI replicate coaching voice and logic?
Iteration 2: Can human-in-the-loop reduce irrelevant feedback?
Iteration 3: Should the AI focus on the entire PRD or on specific sections?
Testing drove continuous refinement based on PM feedback.
Building the Solution
Iteration 1 — Capturing the Coaching Voice
Goal: Replicate Anne's tone, logic, and conversational structure
Approach:
Mapped Anne's Socratic questioning patterns from transcripts
Had the prototype review entire PRDs at once
Focused on rhythm, clarity, and thoughtful guidance
Result:
✅ Tone alignment worked—PMs recognized Anne's voice
❌ Context gaps surfaced immediately: "Anne would never say this—she already knows this."
Learning: Accurate tone isn't enough if feedback feels uninformed.
Iteration 2 — Human-in-the-Loop Clarification
Problem: Some feedback felt irrelevant because the AI lacked organizational context PMs assumed Anne would have.
Solution: Redesigned interaction flow to make context gaps explicit:
Review one PRD section at a time
Pause to confirm relevance and ask clarifying questions
PMs add missing context dynamically
Feedback updates based on new information
Result:
✅ PMs felt heard and understood
✅ Feedback quality improved dramatically
✅ Shifted prototype from evaluator → collaborator
Learning: Transparency about limitations builds trust more than false confidence.
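As a rough illustration of the redesigned Iteration 2 flow, here is a minimal sketch of the section-by-section, human-in-the-loop loop. The ask_model helper is a placeholder for whichever chat API backs the prototype (internal LLM or Gemini), and the prompt wording, section names, and function signatures are assumptions for illustration, not the production implementation.

```python
# Sketch of the Iteration 2 interaction loop: review one section at a time,
# surface context gaps, and let the PM fill them in before feedback is final.

def ask_model(messages: list) -> str:
    """Placeholder for the chat-completion call to the underlying LLM."""
    raise NotImplementedError("Wire this to your LLM provider of choice.")

SECTIONS = ["Problem Statement", "Users & Evidence", "Goals & Metrics", "Scope"]

COACH_SYSTEM_PROMPT = (
    "You are a director-level PRD coach. Review one section at a time. "
    "If you might be missing organizational context, say so and ask a "
    "clarifying question instead of guessing."
)

def review_prd(prd_sections: dict) -> None:
    history = [{"role": "system", "content": COACH_SYSTEM_PROMPT}]
    for name in SECTIONS:
        history.append({"role": "user", "content": f"Section: {name}\n\n{prd_sections.get(name, '')}"})
        reply = ask_model(history)
        history.append({"role": "assistant", "content": reply})
        print(f"\n--- {name} ---\n{reply}")

        # Human in the loop: the PM adds missing context, and the feedback
        # is regenerated with that context before moving to the next section.
        added = input("Add missing context (or press Enter to continue): ").strip()
        if added:
            history.append({"role": "user", "content": f"Additional context: {added}"})
            revised = ask_model(history)
            history.append({"role": "assistant", "content": revised})
            print(revised)
```

The key design choice is that feedback is regenerated after the PM supplies missing context, which is what shifted the prototype from evaluator to collaborator.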
Iteration 3 — Prioritizing the Problem Statement
Observation: Testing made one truth unavoidable:
If the problem statement was unclear, everything else stalled.
Decision: Anne-bot began focusing primarily on diagnosing and coaching the problem section before addressing the rest of the document.
Approach:
Start with problem statement review
Use Socratic questions to strengthen reasoning
Only move to other sections once problem foundation is solid
Result:
✅ Reviews became more focused and productive
✅ PMs described it as "thinking partner, not grader"
✅ Turned feedback into structured thinking practice
Learning: The highest-leverage intervention point is the foundation, not the finish line.
Strategic Design Principles
Three validated principles guided the prototype design:
1. Conversational Guidance Over Prescription
Socratic questioning strengthened PM reasoning more than directive edits.
Example:
❌ "Your problem statement is too vague."
✅ "Who specifically experiences this problem? What happens when they encounter it? What evidence do we have that this matters to them?"
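One way to encode this principle is directly in the system prompt. The fragment below is a hedged sketch; the wording is illustrative and not the prototype's actual prompt.

```python
# Illustrative system-prompt fragment encoding "guidance over prescription".
SOCRATIC_COACH_PROMPT = """
You are coaching a product manager on their PRD.
Do not rewrite their document or issue verdicts like "too vague."
Instead, ask two to four Socratic questions that strengthen their reasoning, e.g.:
- Who specifically experiences this problem, and in what situation?
- What happens today when they encounter it?
- What evidence shows this matters to them?
- How would we know the problem is solved?
Start with the problem statement, and only move to other sections once it is
specific, user-centered, and supported by evidence.
"""
```

Combined with the clarification loop from Iteration 2, this framing keeps the prototype in the thinking-partner role rather than the grader role.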
2. Human-in-the-Loop Transparency
Admitting uncertainty ("I may be missing context—can you clarify?") built trust and improved relevance.
Why It Worked:
Acknowledged AI limitations openly
Gave PMs agency to correct misunderstandings
Prevented irrelevant feedback from undermining credibility
3. Flexible Interaction Model
PMs needed different modes depending on whether they were exploring, refining, or validating—not a one-size-fits-all review.
Result: Anne-bot became a thinking partner, not a reviewer.
The Critical Insight
Problem Definition Drives Everything
Review cycles weren't slow because of workflow inefficiency—they were slow because PMs and leadership weren't aligned on what a problem statement is.
If the problem foundation wasn't strong, reviewing anything else was performative.
Product Implications
This insight reframed the entire opportunity:
Wrong Solution: Build AI to review PRDs faster
Right Solution: Teach what a strong problem statement is, then use AI to help PMs practice applying that framework
Before productionizing Anne-bot, the org needed to:
Define and document problem statement criteria (What makes a problem statement strong?)
Create teaching resources (templates, examples, guidelines)
Build shared understanding across PMs and leadership
Core Principle: AI cannot compensate for foundational ambiguity—it can only scale what's already clear.
Recommendation: Invest in scaffolding before scaling automation.
Key Outcomes
Prototype Validation
Tested with: 6 PMs across 10+ PRDs
Results:
Designed to eliminate at least one review cycle through better preparation
PMs arrived better prepared and more confident at formal reviews
Shifted review discussions toward strategic alignment rather than foundational fixes
Revealed problem-statement clarity as the root bottleneck—not workflow efficiency
Demonstrated that AI can scale mentorship, not just efficiency
The prototype didn't replace review sessions—it raised the floor, ensuring time together was more strategic, focused, and high-value.
If I Were to Productionize This
Phased Approach
Phase 1: Foundation (Months 1-2)
Formalize problem statement framework with leadership
Create reference templates and examples
Pilot teaching workshops with 2-3 teams
Document shared criteria for "strong problem statement"
Phase 2: Pilot (Months 3-4)
Deploy Anne-bot to pilot teams who've completed training
Measure: review cycle reduction, PM confidence, feedback quality
Iterate based on usage patterns and PM feedback
Refine prompts based on real-world performance
Phase 3: Scale (Months 5-6)
Roll out org-wide once scaffolding is proven
Build feedback loop: Anne-bot surfaces common gaps → informs training updates (see the sketch after this list)
Establish continuous improvement cycle
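A rough sketch of that Phase 3 feedback loop, assuming each coaching session logs which gaps Anne-bot flagged (the tag names and data shapes are hypothetical):

```python
from collections import Counter

# Hypothetical session logs: each entry lists the gaps Anne-bot flagged
# during one coaching session.
session_gap_tags = [
    ["solution-first", "no-user-evidence"],
    ["business-goal-conflation"],
    ["solution-first", "vague-user"],
]

# The most common gaps feed back into training content and workshop examples.
gap_counts = Counter(tag for session in session_gap_tags for tag in session)
print(gap_counts.most_common(3))
```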
Success Metrics
Primary Metrics:
Reduce review cycles from 3 → 1-2 on average
Maintain or improve feedback quality (measured via PM satisfaction surveys)
Secondary Metrics:
Increase PM confidence entering reviews (pre/post surveys)
Reduce director time spent on foundational fixes (time tracking)
Improve problem statement quality (evaluated against documented criteria)
Long-term Goal:
Shift director time toward strategic alignment, not foundational coaching.
Why This Phased Approach
This sequencing puts the foundation in place before scaling—avoiding the trap of automating ambiguity and instead codifying clarity that AI can amplify.
Key Learnings
1. Problem Definition Is the Leverage Point
Alignment upstream prevents chaos downstream. The highest-value intervention isn't speeding up reviews—it's strengthening the foundation those reviews depend on.
2. Teaching Scales Better Than Automating
The best AI products cultivate capability rather than dependency. Sustainable impact comes from strengthening thinking, not delivering answers.
3. Transparency Builds Trust
Human-in-the-loop acknowledgment of uncertainty—not false certainty—increased adoption and credibility. Admitting "I may be missing context" worked better than pretending to know everything.
4. Evaluation Is as Critical as Prompting
Testing output quality systematically is essential for scaling AI experiences. Without rigorous evaluation, you risk scaling noise instead of value.
5. Strategic Restraint Matters
Sometimes the right product decision is to not build—or to build foundations first. Knowing when to pause and strengthen the system is as important as knowing when to ship.
What This Project Taught Me About AI Product Work
1. Start with the System, Not the Tool
AI interventions fail if the underlying process has structural ambiguity. Before building AI to scale something, ensure there's clarity worth scaling.
2. Context Matters More Than Capability
AI performance in isolation ≠ AI usefulness in real workflows. The best prompts fail if they don't account for organizational context, user mental models, and existing processes.
3. Human-in-the-Loop Design Enables Trust
Giving users agency to correct AI misunderstandings prevents frustration and builds confidence in the system. Transparency > perfection.
4. Focus on Leverage Points, Not Surface Problems
The real bottleneck often isn't what users complain about. Deep discovery reveals where intervention creates the most value.
5. Know When Not to Build
The strategic move isn't always feature expansion—sometimes it's tightening foundations. Anne-bot's greatest value was revealing what needed to happen before scaling AI.
Final Reflection
Project Status
Proof-of-concept validated feasibility and surfaced the need for strong problem-statement scaffolding before production investment.
The Bigger Picture
Anne-bot reinforced a belief that now shapes my approach to AI product development:
The most valuable AI products don't deliver answers—they strengthen thinking.
The goal isn't to replace expert judgment with AI—it's to scale the conditions under which people develop better judgment themselves.
This project taught me that the best AI products are often the ones that help users become better versions of themselves, not the ones that do the work for them.
What I'd Do Differently
If I were starting this project again with the insights I have now:
1. Start with Foundational Alignment First
Before building the prototype, I'd facilitate a working session with leadership and PMs to:
Document what "strong problem statement" means
Create shared examples of good vs. weak problem statements
Align on evaluation criteria
Why: This would have revealed the root bottleneck faster and clarified whether AI was even the right intervention.
2. Build Decision Gates Into Testing
Define clear thresholds upfront:
What results would make me double down?
What findings would make me pivot?
What evidence would suggest stopping entirely?
Why: Explicit decision criteria prevent attachment to solutions and enable faster, more objective pivots.
3. Test with More Diverse PRD Types
The prototype focused on one PRD format. Testing across different project types (0→1 products, feature enhancements, technical improvements) would reveal whether coaching patterns generalize or need customization.
4. Measure Baseline Review Quality
Track problem statement quality before Anne-bot to quantify improvement afterward. Without baseline data, "better" is subjective.
5. Involve Directors in Prototype Testing Earlier
Getting Anne's feedback on prototype responses would have:
Validated coaching pattern accuracy faster
Surfaced organizational context gaps sooner
Built stakeholder buy-in earlier in the process
Why: The people whose expertise you're codifying should validate the codification.