Capture Caddy
When AI Meets the Rhythm of Human Creativity
A 2-week exploratory proof of concept testing whether voice-guided AI coaching could help part-time photographers rebuild posing confidence between shoots—and discovering that preparation, not performance, is where AI creates value.
Overview
Capture Caddy is a Custom GPT prototype designed to help part-time photographers regain posing confidence after weeks or months between shoots. When that much time passes, muscle memory fades—and searching Pinterest boards or reference notes mid-session disrupts presence, connection, and momentum.
The prototype explored whether tailored posing ideas, composition guidance, and camera settings—delivered conversationally and hands-free—could support confidence without interrupting the creative rhythm of a shoot.
The Stakes
This was my first applied prompt-engineering project using Custom GPTs. The goal wasn't to ship a product—it was to understand how AI fits into creative work, when humans are cognitively available to learn, and where technical solutions collide with emotional and social dynamics.
The project answered a critical product question: When does AI enhance creative work vs. interrupt it?
QUICK FACTS
Role: Product Manager & AI Practitioner (Solo)
Duration: 2 weeks (Exploratory POC)
Platform: OpenAI Custom GPT
Validation Method: Real-world shoot testing (desk simulation, live outdoor session, pre-shoot rehearsal)
Key Finding: Timing is a product constraint, not a technical one
Pivot: Real-time coaching → Pre-shoot preparation tool
Status: Proof of concept validated core insight about AI timing in creative workflows
My Role
Product Manager & AI Practitioner | Solo Project
Designed, built, tested, and iterated on a 2-week exploratory proof of concept.
What I Did
Strategy & Prototyping
Defined riskiest assumption: voice-guided AI can fit into live creative sessions without disrupting flow. Designed Custom GPT behavior for posing guidance, settings recommendations, and tone calibration—prioritizing trust and reducing cognitive overwhelm over technical novelty.
Real-World Testing & Pivot
Conducted live shoot trials in authentic creative context. Observed latency-driven rapport breaks and attention switching costs when subjects needed to focus on AI instead of photographer. Reframed product from live coaching to pre-shoot confidence builder (pose packs, reminders, setup walkthroughs) based on testing evidence.
Human-Centered Design
Prioritized hands-free constraints, subject comfort, and psychological safety. Discovered that preparation support amplifies creative flow while live intervention interrupts it—proving where AI adds value vs. creates friction.
Why This Project Mattered to Me
I wanted to test whether AI could fit into live creative workflows or if timing and context fundamentally shape usefulness. This was my chance to practice prompt engineering in real human-centered scenarios, develop evaluation methodology for AI interruption vs. amplification, and learn when preparation beats real-time assistance—even when real-time seems more innovative.
The Problem
This wasn't an abstract design challenge—it was something I experienced firsthand as a part-time photographer.
The Confidence Gap
As a part-time photographer, weeks or months can pass between shoots, and each new session feels like rebuilding muscle memory from scratch.
That gap sets off a predictable chain reaction:
Long time between shoots → Loss of posing fluency
Loss of fluency → Hesitation and awkward pauses during session
Hesitation → Subjects feel unsure or disconnected
Disconnection → Lower-quality photos and fewer usable images
It wasn't a lack of technical skill—it was timing, confidence, and momentum.
Why Existing Solutions Weren't Enough
I already kept posing boards, Pinterest screenshots, and reference notes—but none of them worked in the moment.
Stopping to reference them broke:
Eye contact with subjects
Emotional presence and connection
Natural rhythm between photographer and subject
Portrait photography depends on that connection. Breaking it to look at references undermines the very thing that makes portraits work.
What I Actually Needed
Not just more posing ideas—but a way to quickly get back up to speed, stay confident, and stay connected throughout the session.
Product Hypothesis
Riskiest Assumption
Voice-guided, hands-free posing support can increase photographer confidence during live sessions without disrupting presence or connection.
If True, Then:
It would reduce the likelihood of my mind going blank mid-shoot (coaching fallback available)
I could stay fully present and emotionally connected with subjects (hands-free = no visual distraction)
Sessions would feel smoother, with fewer pauses or momentum breaks
Subjects would remain relaxed, engaged, and comfortable
Final photos would include more diverse poses and authentic expressions
I'd have more strong images to deliver—giving clients better options
Ramp-up time between sporadic shoots would shrink, easing pre-shoot anxiety
This confidence → connection → better photos hypothesis became the foundation for building Capture Caddy.
Product Strategy: Build to Learn
Because this was my first time building a GPT assistant, I chose to build and test directly rather than validate through user research upfront.
Why This Approach
First time building with Custom GPTs → Hands-on learning was fastest path to understanding
Single user (me) → Could test immediately in authentic context without recruitment overhead
Exploratory goal → Iteration speed mattered more than validation rigor
Domain expertise → 10 years of photography experience meant I could evaluate quality without external validation
Tradeoff: Risk of solving my problem vs. a widespread one, but speed-to-insight was the priority for a learning project.
How I Tested
Rather than simulated scenarios, I evaluated through real-world use:
Test 1: Desk Simulation
Confirmed pose quality, clarity, and voice responsiveness
Validated technical viability in controlled environment
Test 2: Live Outdoor Shoot
Observed latency impact on rapport and connection
Measured attention switching costs in real creative context
Identified where AI interrupted vs. supported flow
Test 3: Pre-Shoot Rehearsal
Tested whether practice (not performance) was the right intervention point
Evaluated shared rhythm with subject in low-stakes environment
What I Evaluated
Rather than metrics, I focused on experiential quality:
Did it increase posing confidence? (Could I direct more fluidly?)
Did it preserve presence and connection? (Did subjects stay engaged?)
Did it support creative flow—or interrupt it? (Was momentum maintained?)
These lived tests surfaced the insight that ultimately reshaped the product direction.
Initial Design
Capture Caddy was intentionally simple—a Custom GPT designed to offer posing support without pulling attention away from the subject.
Core Features
1. Personalized Pose Suggestions
The assistant generates poses tailored to real people, environments, lighting, and mood—not generic templates.
Each pose includes:
Visual reference or detailed description
Step-by-step positioning guidance
Expression and interaction prompts
Suggested camera settings (lens, aperture, shutter speed)
Design Philosophy: Context-aware suggestions that feel relevant, not algorithmic.
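For concreteness, here is a minimal sketch of the shape each suggestion followed. The GPT delivered this as conversational text, not structured data; the field names below are hypothetical, shown only to make the anatomy of a suggestion explicit.

```python
from dataclasses import dataclass

@dataclass
class PoseSuggestion:
    """Hypothetical shape of a single suggestion (illustrative names)."""
    description: str               # visual reference or detailed description
    positioning_steps: list[str]   # step-by-step positioning guidance
    expression_prompts: list[str]  # expression and interaction prompts
    lens: str                      # suggested lens, e.g. "85mm prime"
    aperture: str                  # e.g. "f/2.0"
    shutter_speed: str             # e.g. "1/250s"

# The kind of context-aware output the assistant aimed for:
window_seat = PoseSuggestion(
    description="Subject seated by a window, body angled 45 degrees to camera",
    positioning_steps=[
        "Turn the shoulders toward the window light",
        "Rest hands loosely in the lap, elbows relaxed",
        "Tilt the chin slightly down, toward the lead shoulder",
    ],
    expression_prompts=["Ask about a favorite memory to draw a natural smile"],
    lens="85mm prime",
    aperture="f/2.0",
    shutter_speed="1/250s",
)
```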
2. Reverse-Engineer Mode
Upload a reference image and Capture Caddy will:
Break down composition, framing, and body angles
Recommend lens and exposure choices
Guide toward recreating the feeling of the image with your own subjects
Use Case: When you see a pose you love but don't know how to direct it.
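Inside a Custom GPT, image analysis is native behavior: you upload the reference and the model responds. For anyone rebuilding this mode outside the GPT builder, a sketch against the OpenAI API might look like the following; the prompt wording is illustrative, not the prototype's actual instructions.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def reverse_engineer(image_url: str) -> str:
    """Break a reference image down into directable posing guidance."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": (
                        "Analyze this portrait reference. Describe the composition, "
                        "framing, and body angles; recommend a lens and exposure; "
                        "then give step-by-step direction to recreate its feeling."
                    )},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    )
    return response.choices[0].message.content
```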
3. Voice-Guided Interaction
Using ChatGPT's voice mode, photographers can request ideas hands-free:
Commands:
"Give me three poses."
"Next."
"Repeat."
"Slow down."
Intent: Keep hands on the camera and eyes on the subject—not on a phone screen.
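In the prototype, these phrases went straight to ChatGPT's voice mode, which interpreted them conversationally; no routing code existed. If the commands were productionized in a dedicated app, a minimal router might look like this sketch, where session and its methods are hypothetical.

```python
def route_command(utterance: str, session) -> str:
    """Map short, hands-free utterances to session actions (hypothetical API)."""
    text = utterance.lower().strip().rstrip(".")
    if text.startswith("give me"):
        # e.g. "give me three poses" -> request a fresh batch
        count = 3 if "three" in text else 1
        return session.suggest_poses(count)
    if text == "next":
        return session.next_pose()
    if text == "repeat":
        return session.repeat_current()
    if text == "slow down":
        session.pace = "slow"
        return session.repeat_current()
    return session.clarify(utterance)  # fall back to open conversation
```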
4. Prompt Architecture
Built through iterative prompt design—no fine-tuning or external data.
Structured for:
Clarity (simple, actionable direction)
Emotional sensitivity (reading subject comfort)
Low cognitive load (minimal decision-making required)
This was a UX-driven exploration, not a technical showcase.
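The original GPT instructions aren't reproduced here, but the three priorities above translate naturally into system-prompt language. A hedged reconstruction of the structure, with all wording illustrative:

```python
# Illustrative reconstruction of the instruction structure, not the original text.
SYSTEM_INSTRUCTIONS = """
You are Capture Caddy, a calm posing coach for portrait photographers.

Clarity:
- Give one pose at a time, in 2-3 short, actionable sentences.
- Name the body part and the adjustment ("drop the front shoulder an inch").

Emotional sensitivity:
- Offer prompts the photographer can relay to keep subjects at ease.
- Skip poses likely to feel awkward or uncomfortable for the subject.

Low cognitive load:
- No menus, no numbered option lists unless asked.
- Wait for "next", "repeat", or "slow down" before continuing.
"""
```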
Testing in the Real World
Test 1 — The Desk "Vibe Check"
Before heading into the field, I tested the prototype indoors at my desk.
Results:
✅ Pose suggestions were accurate and creative
✅ Lens recommendations made sense for described scenarios
✅ Voice commands responded smoothly
On paper, everything worked.
But confidence in a controlled environment ≠ connection in a real one.
I needed to test where it actually mattered: during a live shoot.
Test 2 — Live Outdoor Shoot (The Failure Point)
I tested Capture Caddy during a portrait session with my daughter.
What Happened:
Latency that felt acceptable indoors became painfully slow outdoors.
Me: "Hi Capture Caddy."
[pause]
[nothing happens]
[I wait, focused on the AI]
Meanwhile: My daughter waits—unsure, disengaged, no longer in the moment.
The Moment of Truth:
While I focused on the AI, she waited. I had broken the unspoken rule of portrait photography:
Never disconnect from your subject.
The tool meant to support confidence was now creating:
Hesitation (technical failure undermining trust)
Awkwardness (unclear why we were pausing)
Emotional distance (broken connection with subject)
The Critical Observation
I wasn't solving the wrong problem—I was solving the right problem at the wrong time.
This insight reshaped the entire product direction.
Understanding the Real Constraint
The problem wasn't technical—it was cognitive and social.
During a portrait session, attention is fully occupied:
Reading body language and emotional state
Building and maintaining trust
Adjusting composition and light in real-time
Keeping subjects comfortable and confident
Trying to learn new poses while managing all that is cognitively unrealistic—and emotionally costly.
Real-time AI competes with presence. Learning can't happen during performance.
Test 3 — Practice, Not Performance (The Breakthrough)
To test the reframed idea, I treated Capture Caddy as a pre-shoot rehearsal tool instead of an on-set coach.
What Changed:
Set the assistant to speaker mode (no earbuds, no isolation)
Practiced poses with my daughter in our living room
Let her hear the prompts too—direction felt shared, not mediated
Moved slowly, without time pressure or performance stakes
Treated it as rehearsal, so perfection didn't matter
The shift was immediate.
What This Revealed
What mattered most wasn't just the shared rhythm—it was the chance to practice in a low-stress environment.
When I entered the real shoot after practicing:
I already knew the pose flow
Confidence was rebuilt before performance pressure
My daughter knew what to expect
We were already connected
Core Insight: AI isn't most valuable during creative performance—it's most valuable before it.
Key Insights
This project wasn't about choosing between creative support and human connection—it was about understanding when learning can actually happen.
1. Latency Exposed the Social Fragility of Creative Flow
Even brief delays disrupted flow and rapport. Each pause created uncertainty—for me and for my subject. What looked like a tech limitation revealed a human one: Creative work depends on unbroken connection, and even small interruptions compound into emotional distance.
2. Learning Can’t Happen During Shoots
During a portrait session, cognitive capacity is maxed out managing:
Technical execution (camera settings, composition, light)
Emotional presence (reading subjects, building trust)
Creative direction (posing, expression, interaction)
There's no bandwidth left for learning new information.
Trying to absorb new poses during a shoot is like trying to learn piano chords during a concert performance.
3. Timing Is a Product Constraint, Not a Technical One
The constraint wasn't AI capability—it was when humans can actually absorb information.
Real-time AI isn't wrong—it's just the wrong moment.
The right moment is preparation, not performance.
4. Core Product Insight
AI should prepare people for flow, not interrupt it.
Confidence begins before the shoot:
Skill refresh → Reduced anxiety → Smoother direction → Better images
Once Capture Caddy shifted into a pre-shoot practice tool, the experience aligned with how photographers—and humans—actually learn.
The Product Decision
The early prototype proved something unexpected: real-time AI wasn't the wrong solution—it was being used at the wrong moment.
Evidence from Live Testing
✅ Latency broke rapport even when prompts were relevant
✅ Divided attention created uncertainty for both photographer and subject
✅ Learning couldn't happen during performance—cognitive load was already maxed
Decision: Pivot Product Direction
From: Real-time on-set coaching
To: Pre-shoot preparation tool
Rationale: Put AI where learning can actually be absorbed, not where it competes with presence.
New Value Proposition
Build confidence before the shoot, not during it.
Redesigned Product Vision
Before the Shoot (New Intended Use)
Explore posing ideas without time pressure
Practice directing transitions and expressions
Build shared rhythm with subjects ahead of time
Reduce ramp-up anxiety after long breaks between sessions
Create mental muscle memory that activates during the real shoot
During the Shoot
No earbuds
No voice commands
No divided attention
Full presence with the subject
After the Shoot
Reflect on what worked
Refine approach for next session
Continue building confidence through iteration
Instead of competing with the moment, Capture Caddy now prepares photographers to enter it confidently.
Key Outcomes
Built and tested my first Custom GPT assistant end-to-end (hands-on AI prototyping experience)
Identified that timing, not functionality, was the core constraint (product insight applicable beyond photography)
Pivoted from live AI coaching → pre-shoot preparation tool (evidence-based product decision)
Developed repeatable approach for evaluating AI in human-centered workflows (methodology for future AI projects)
Strengthened posing fluency and reduced pre-shoot anxiety through rehearsal (validated that preparation approach works)
Clarified foundational principle: AI should prepare people for flow, not interrupt it (design philosophy for AI products)
What This Taught Me About Building AI Products
1. Context Matters More Than Capability
AI performance in isolation ≠ AI usefulness in real workflows.
Learning: The best AI doesn't just work technically—it fits where humans can actually use it. Evaluate in authentic contexts, not controlled demos.
Application: Test AI products in messy real-world conditions early. Desk tests lie.
2. Timing Is a Product Constraint
The right solution at the wrong moment is the wrong solution.
Learning: Understand the user's cognitive and emotional state at the intervention point. Are they receptive to learning, or fully occupied with performance?
Application: Map user workflows to identify when they're cognitively available vs. maxed out. Design AI interventions for available moments.
3. Build-to-Learn Accelerates Understanding
Prototyping revealed insights research alone wouldn't have surfaced.
Learning: For exploratory AI projects, building and testing in real contexts generates faster, deeper insights than hypothetical user interviews.
Application: When learning goals outweigh shipping goals, bias toward rapid prototyping with real-world testing over extensive upfront validation.
4. Honest Iteration Creates Better Products
The "failure" wasn't a dead end—it was a redirect toward the real solution.
Learning: Being willing to pivot based on evidence (rather than defending initial assumptions) unlocks breakthrough insights.
Application: Design decision gates into testing. Define what evidence would make you pivot vs. double down. Let data rewrite assumptions.
5. Measure Against Human Constraints, Not Technical Ones
Success isn't "does AI work?"—it's "does AI fit where humans can actually use it?"
Learning: Evaluate AI products against human cognitive capacity, emotional state, and social context—not just technical performance.
Application: Frame success metrics around human experience (confidence, flow, connection) not just AI performance (latency, accuracy).
What I'd Do Differently
If I were to build Capture Caddy as a real product:
1. Validate Problem Scope with 10-15 Photographers First
Why: I solved my problem, but is this widespread?
What I'd Ask:
How often do you shoot (frequency matters for confidence gaps)?
What happens when weeks/months pass between shoots?
How do you currently prep for shoots?
Where does confidence break down during sessions?
Impact: Understand if this is "Gloria's problem" or "part-time photographer's problem" before building.
2. Test Multiple Intervention Points Early
Why: I assumed live coaching was the right moment without testing alternatives.
What I'd Test:
Pre-shoot preparation (what I eventually discovered)
During shoot (what I initially built)
Post-shoot review (reflection and improvement)
How: Run parallel tests of all three timing approaches in week 1.
Impact: Discover optimal intervention point faster, avoid wasted development on wrong timing.
3. Build Decision Gates into Testing
Why: I iterated organically without clear pivot criteria.
What I'd Define Upfront:
What results would make me double down on live coaching?
What findings would trigger exploration of alternative timings?
What evidence would suggest stopping entirely?
Example Gates:
If latency < 2 seconds AND subjects stay engaged → keep live coaching
If subjects disengage during waits → test pre-shoot preparation
If neither timing works → stop and reconsider whether AI is right approach
Impact: Faster, more confident decisions with less attachment to initial assumptions.
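Gates like these are simple to encode so the pivot decision becomes mechanical rather than mood-driven. A sketch, with all inputs hypothetical:

```python
from typing import Optional

def evaluate_gates(latency_s: float, subjects_engaged: bool,
                   prep_timing_worked: Optional[bool] = None) -> str:
    """Pre-committed pivot gates from the list above (hypothetical inputs)."""
    if latency_s < 2.0 and subjects_engaged:
        return "double down: keep live coaching"
    if not subjects_engaged and prep_timing_worked is None:
        return "pivot test: try pre-shoot preparation"
    if not subjects_engaged and prep_timing_worked is False:
        return "stop: reconsider whether AI is the right approach"
    return "keep testing: evidence inconclusive"
```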
4. Define Clearer Success Metrics Upfront
Why: I evaluated experientially, which was appropriate for exploration—but lacked quantifiable targets.
What I'd Measure:
Primary Metrics:
Photographer confidence (1-10 scale, before/after)
Posing fluency (# poses directed smoothly per 30 min)
Subject comfort (measured via post-shoot survey)
Secondary Metrics:
Ramp-up time (minutes until first strong shot)
Image quality (# strong images delivered per session)
Usable variety (# different pose types per session)
Impact: Clearer evidence for product decisions; easier to communicate value to stakeholders.
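A simple per-session record would be enough to capture all of these; the schema below is a hypothetical sketch, not something the POC collected.

```python
from dataclasses import dataclass

@dataclass
class ShootMetrics:
    """Hypothetical per-session record for the metrics above."""
    confidence_before: int          # 1-10 self-rating before the shoot
    confidence_after: int           # 1-10 self-rating after the shoot
    poses_per_30min: int            # poses directed smoothly per 30 minutes
    subject_comfort: int            # from post-shoot survey
    minutes_to_first_keeper: float  # ramp-up time
    strong_images: int              # strong images delivered
    pose_variety: int               # distinct pose types used

    @property
    def confidence_delta(self) -> int:
        return self.confidence_after - self.confidence_before
```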
5. Consider Business Model Implications
Why: Built as learning project without considering monetization or distribution.
What I'd Explore:
Pricing Models:
Subscription ($10-20/month for part-time photographers)
Freemium (basic poses free, advanced features paid)
Integration (partner with photography education platforms)
Distribution:
Standalone GPT (current approach)
Mobile app (better for on-location use)
Integration with existing photography tools (e.g., Adobe Lightroom)
Market Sizing:
How many part-time photographers exist?
What's their willingness to pay for confidence tools?
What alternatives do they currently use (and at what cost)?
Impact: Understand if there's viable business opportunity, or if this should remain a free community tool.
If I Were to Productionize This
Phase 1: Validate Problem Broadly (Month 1)
Goal: Confirm this problem exists beyond me
Activities:
Interview 15-20 part-time photographers
Survey broader photography communities (Reddit, forums)
Identify common confidence gaps and current solutions
Test willingness to pay
Success Criteria: 60%+ of part-time photographers report confidence gaps after breaks; no existing solution addresses it well
Phase 2: Build Pre-Shoot Preparation MVP (Month 2-3)
Goal: Validate that preparation approach solves the confidence problem
Features:
Pose library by scenario (family, couples, portraits, etc.)
Practice mode with voice guidance
Camera settings recommendations
Simple session prep checklist
What's NOT Included:
Live coaching (already know it doesn't work)
Advanced editing features
Social sharing
Complex customization
Success Metrics:
70%+ report increased confidence after using prep tool
50%+ use it before majority of shoots (stickiness)
4+ rating on usefulness (1-5 scale)
Phase 3: Iterate Based on Usage (Month 4-6)
Goal: Refine based on real usage patterns
Instrumentation:
Which pose categories used most?
How long do practice sessions last?
What features are ignored?
Where do users drop off?
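Each of those questions maps to one or two usage events. A minimal instrumentation sketch, with hypothetical event names and a local file standing in for a real analytics pipeline:

```python
import json
import time

def log_event(name: str, **props) -> None:
    """Append a usage event; in production this would feed an analytics pipeline."""
    event = {"event": name, "ts": time.time(), **props}
    with open("usage_events.jsonl", "a") as f:
        f.write(json.dumps(event) + "\n")

# Which pose categories are used most?
log_event("pose_requested", category="couples", source="practice_mode")
# How long do practice sessions last?
log_event("practice_session_ended", duration_min=14)
# What features are ignored? / Where do users drop off?
log_event("feature_opened", feature="reverse_engineer")
log_event("session_abandoned", step="camera_settings_walkthrough")
```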
Potential Additions (data-driven):
Pre-built pose sequences for specific scenarios
Integration with calendar (reminder to practice before scheduled shoot)
Subject-side experience (practice together before real shoot)
Success Metrics
Primary:
Photographer confidence increase (before/after scale)
Adoption rate among part-time photographers (% who use regularly)
Session quality improvement (more strong images delivered)
Secondary:
Time to first strong shot (ramp-up speed)
Pose variety per session (creative range)
Subject satisfaction (comfort and enjoyment)
Long-term:
Retention (do photographers keep using it?)
Word-of-mouth growth (do they recommend it?)
Business viability (if monetized, does revenue sustain development?)
Final Reflection
What This Project Taught Me
This POC wasn't about shipping a product—it was about understanding where AI belongs in creative work.
The insight that emerged—AI should prepare people for flow, not interrupt it—applies far beyond photography:
Coaching and mentorship: Prep materials before sessions, not during
Presentations: Practice support before stage, not real-time prompts
Creative work: Reference and inspiration before execution, not mid-flow
Complex tasks: Learning and skill-building separate from performance
The Bigger Principle
Technology shouldn't replace human connection—it should protect and prepare it.
The best AI products don't make experts unnecessary—they help people become more expert themselves.
What I Learned About My Own Product Approach
This project reinforced beliefs that now guide how I build AI products:
1. Start with Real Problems, Not Technology
I had AI capability (Custom GPT) and looked for problems it could solve. Better: start with user problems and evaluate if AI is the right solution.
2. Test in Authentic Context Early
Desk tests validated nothing meaningful. Real-world testing in 20 minutes revealed what hours of simulation couldn't.
3. Be Willing to Kill Your Darlings
The voice-guided live coaching idea was clever—but wrong. Pivoting required letting go of what I thought was cool in favor of what actually worked.
4. Cognitive and Social Constraints Matter More Than Technical Ones
The limitation wasn't AI capability—it was human capacity. Great products respect human constraints, not just technical possibilities.
5. Learning Projects Have Different Success Criteria
This wasn't about user adoption or revenue—it was about understanding AI's role in creative workflows. That clarity is the foundation for better future product decisions.