Mezzo
Private, real-time volume awareness for shared workspaces
Designed and shipped a standalone web app that closes the awareness gap before coworkers have to intervene. Feedback escalates through temporal pressure — not alerts — keeping the signal non-judgmental and trust-safe.
- Role: Product Designer — sole decision-maker across research, interaction design, and build. Scoped the problem, defined the intervention, and shipped a working MVP.
- Team: Solo designer; 5 research participants (interviews + prototype feedback)
- Context: Personal project; standalone web app for shared office environments
- Duration: 6 days (research → synthesis → prototype → user feedback → mockups → iterate → QA → launch MVP)
- Status: Working MVP launched — try it live. Core monitoring, feedback escalation, and PiP support included.
Headphones Remove Self-Awareness — With Nothing to Replace It
Return-to-office mandates have pushed people back into open floor plans, but there still aren't enough meeting rooms to support the number of video calls happening throughout the day. So people take meetings at their desks — usually wearing noise-canceling headphones.
Headphones solve one problem — but they quietly create another. They filter out ambient noise so you can focus on the meeting. But the same isolation that blocks incoming sound also removes the ambient feedback you normally use to regulate your speaking volume. With that feedback gone, you can't tell how loud you are anymore.
Some people try to compensate by covering one ear and leaving the other open. It's an unsatisfying workaround — environmental noise competes with meeting audio, and the split attention undermines the focus headphones were meant to provide.
For considerate people, this creates a constant, unresolved question: Am I too loud right now? Can people around me hear this conversation?
And headphones aren't the only trigger. When meeting rooms are full, people huddle at desks or talk in common areas. In the flow of conversation, especially when it becomes animated, they lose track of how far their voices carry.
By the time someone reacts, the disruption has already happened — a look, a comment, a pointed gesture — and the person speaking is the last to know.
Social Feedback Is the Only Fallback — and It Makes Things Worse
This isn't a problem of inconsiderate people. Most people who are too loud in shared spaces don't know it — and would adjust immediately if they did. The real problem is what happens once self-awareness disappears. The only feedback left is social: a look, a comment, someone pointedly putting on their own headphones.
Social feedback fails in predictable, compounding ways. It arrives late — after the disruption has already occurred. And it carries emotional weight — being told you're too loud feels like correction, not information.
Instead of helping people recalibrate, it causes withdrawal. People don't adjust slightly — they go quiet, disengage, or mentally check out for the rest of the meeting. Over time, they avoid the situation altogether — relocating to cars, skipping desk meetings, or disengaging when private space isn't available.
The more considerate someone is, the more costly social correction becomes — and the more likely they are to withdraw rather than adjust. The feedback channel itself produces the wrong behavior.
Replace Social Correction With a Private, Continuous Signal
- It arrives late → the signal must be continuous and real-time
- It carries emotional weight → the signal must be private and non-judgmental
- It causes withdrawal → the signal must feel ambient, not corrective
The signal itself has three states:
- Within range — calm and contained
- Outside range briefly — pressure begins to build
- Outside range, sustained — containment is lost
Escalation is temporal, not intensity-based. The feedback doesn't get louder or brighter — it loses containment over time, mirroring how sound spreads in a shared space. An intensity-based alert would reproduce the same punitive dynamic as social correction. Temporal escalation treats brief spikes as normal and responds only to sustained patterns — the same patterns that genuinely disrupt shared environments.
The system is equally defined by what it doesn't do. No recording or storage. No transcription. No alerts or sounds that break focus. No language that labels behavior as good or bad. Trust depends on users believing the system isn't watching them — just reflecting what's happening.
By replacing social correction with a private, continuous signal, Mezzo shifts volume management from a social problem to a personal one — giving people awareness early enough that coworkers never have to intervene.
Why This Mattered
Trust Was the Adoption Gate — Not Usefulness
Mezzo only works if people believe it isn't recording them. That belief isn't earned by a privacy policy or a tooltip — it's earned by what the system is structurally incapable of doing. No audio storage, no transcription, no data leaving the browser. If any of those constraints were relaxed, adoption would collapse. Not gradually — immediately.
The same logic applies to feedback accuracy. A system that flags normal behavior — getting excited, laughing, leaning back — trains users to ignore it. Temporal escalation exists to protect repeated use: brief spikes are forgiven, only sustained patterns trigger a response. If the system cries wolf once, it loses the user permanently.
Screen sharing is the sharpest survivability risk. The moment feedback becomes visible to the entire meeting, the privacy promise breaks. The design doesn't fully solve this yet — but acknowledging it as a known constraint, not ignoring it, is what makes the rest of the system credible.
Mezzo doesn't need universal adoption to work. It serves people who want to adjust but lack feedback — a narrow audience, but the right one. Motivation is the one prerequisite the design can't manufacture, and trying to serve everyone would break the experience for those who actually care.
Key Strategic Decisions
Every decision was filtered through the same question: would this make someone feel nudged — or watched?
Ephemeral by Design
- Observed: Technical UI language ("audio captured," "level detected") triggered privacy concerns and surveillance associations in participant testing.
- Decision: Make the system structurally ephemeral: no accounts, no saved data, no audio storage. Favor organic visuals (ink brush, hand-drawn forms) over technical precision.
- Tradeoff: No usage history, no personalization over time, no ability to show users their own improvement patterns.
- Trust Implication: Ephemeral architecture and organic aesthetics make surveillance associations structurally impossible — not just discouraged by messaging.
Expanded the "You're Fine" Range
- Observed: Early thresholds calibrated for acoustic accuracy. Normal behavior — laughing, getting excited, leaning back — triggered warnings.
- Decision: Widened the acceptable range significantly. Required sustained loudness before escalation. Brief spikes never trigger feedback.
- Tradeoff: Some genuinely loud moments go unaddressed. The system deliberately under-reports to preserve trust.
- Adoption Implication: Flagging normal behavior trains users to ignore the system permanently. Threshold forgiveness protects the feedback channel for moments that actually matter.
Separated System Liveness From Loudness Feedback
- Observed: Audio bars served double duty — showing volume and implying system activity. In quiet environments, flat bars looked like a broken tool.
- Decision: Split into two signals: pulsing glow for liveness, audio bars for volume. Escalation responds to duration, not momentary spikes.
- Tradeoff: More visual elements to maintain. Brief loud moments receive no feedback, even when genuinely disruptive.
- System Reliability Impact: Users can verify system activity regardless of volume. Feedback maps to sustained disruption patterns, not momentary acoustic events.
Pivoted From Browser Extension to Web App
- Observed: Browser extension required multi-day platform approval and produced inconsistent microphone permission flows across browsers.
- Decision: Rebuilt as a standalone web app with Picture-in-Picture capability for peripheral use during meetings.
- Tradeoff: Lost always-on background capability. Users must keep a tab open or launch PiP manually. No persistent taskbar presence.
- Usability Impact: Consistent mic permissions across browsers. Eliminated distribution friction that would have blocked iteration and testing within the 6-day timeline.
Impact At a Glance
Mezzo was validated through a video prototype walkthrough with five participants and launched as a working MVP within six days. Validation focused on legibility, trust, and emotional tone — the adoption gates identified in the design.
Quantitative Impact
These signals come from prototype validation with the target audience — checks on comprehension and emotional safety — not from launch-week curiosity.
- 5/5 participants understood the tool without explanation or onboarding
- 0/5 felt corrected; no participant read the feedback as punitive
- 4/5 requested native integration: Zoom, Teams, or taskbar presence
Metric Interpretation: These signals reflect comprehension and emotional safety, not usage or adoption — the tool has not been tested in sustained real-world use. What they confirm is that the core design bets (non-judgmental tone, legible feedback model, privacy-safe architecture) survived first contact with the target audience. A measurement framework targeting adoption, engagement, and self-reported usefulness is in place for post-launch validation.
- No participant raised privacy concerns about the tool itself — despite high sensitivity to being monitored at work
- No participant felt the feedback was punitive or judgmental
- No participant misunderstood what the tool was doing or how it worked
Qualitative Impact
"Might be nice to have a more obvious indicator of when you're over your intended volume for a sustained period of time."
This was the only substantive design critique — and it validated the temporal escalation model already built into the system rather than challenging it.
Deep Dive (optional): Evidence & Rigor
The sections below provide supporting evidence and rigor for readers who want to understand how key decisions were informed.
1. Discovery Patterns
Five semi-structured interviews with office workers across product, design, and engineering roles. All participants worked in open floor plans with return-to-office mandates. Interviews focused on meeting behavior, headphone use, volume awareness, and social dynamics around noise.
The room shortage is systemic, not incidental. One participant described submitting conference room shortage tickets daily for three weeks before giving up: "I think a lot of us have just gotten beaten down. We're like, okay, we'll just take meetings at our desks. It's fine." Desk meetings aren't a preference — they're the residue of infrastructure failure.
Social feedback operates on a gradient, not a binary. Participants described a range of signals: smiles from people who overhear and relate, unsolicited interjections into conversations, headphones going on silently, side-eye, audible sighs. Even the "positive" responses (smiling, commenting) signal that privacy has already been breached.
Volume awareness breaks down outside meetings too. Multiple participants described getting loud during informal desk conversations — not just headphone calls. One noted: "the more excited I get, the louder I get." The awareness gap extends to any animated exchange in shared space, confirming headphones as the most acute trigger, not the only one.
Content sensitivity compounds volume sensitivity. One participant described preferring to take a therapy call from her car rather than risk being overheard — a privacy concern layered on top of volume. The audience for this tool is already hyper-aware of being perceived. Design decisions that feel like surveillance would be disqualifying.
Self-selection confirmed from both sides. One participant said the tool sounded useful for others but not himself. Another asked: "I wonder if the biggest offenders would even be aware of their volume to try using this?" Both responses validate scoping to the considerate middle — people who want to adjust but lack feedback.
2. From Metering Loudness to Modeling Containment
The earliest prototype functioned like a live volume meter — visualizing loudness continuously relative to a normalized range. Testing with five participants proved the concept was legible but revealed three structural problems.
The concept was immediately understood. Participants consistently described the system accurately: "It's measuring how loud you are." "It's keeping track of whether you're within your target range." Mental model clarity was not the issue. How the feedback behaved over time was.
Liveness and loudness were conflated. The prototype used the same visual channel for two different signals: is the system listening, and is the volume outside range? When feedback was subtle, participants couldn't tell whether nothing was happening or the system wasn't active. One noted: "It might be good to have another volume bar just so you know it's picking stuff up." → Led to the decision to separate liveness from escalation. The final system maintains a persistent listening state through gentle pulse animation and variant cycling, and only escalates visually when sustained deviation occurs. Supports Strategic Decision: Separated System Liveness From Loudness Feedback.
"Target volume" penalized natural voice variation. One participant had a naturally booming voice. Framing volume as a "target" implied a correct level — making louder voices inherently wrong. The framing shifted from preferred/ideal volume to: the speaking level you consider appropriate for the people around you. Calibration became contextual and user-defined, not normalized. Supports Strategic Decision: Expanded the "You're Fine" Range.
ARTIFACT: Calibration screen (calibration-landing.png) — Shows the reframed calibration copy: "Talk for a few seconds so Mezzo can learn the speaking level you consider appropriate for the people around you." Evidence that the "target volume" finding changed the actual product language.
Threshold tuning required a fundamental approach change. The original thresholds used absolute values (0.0009 range) — too narrow for real human variation. The system shifted to percentage-based multipliers (1.5x, 2x, 3x baseline) with a noise gate to filter silence during calibration. Early calibration also averaged ambient silence into the baseline, making thresholds too low. Fixing this required capturing speaking volume specifically during setup. Supports Strategic Decision: Expanded the "You're Fine" Range.
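The percentage-based approach can be sketched roughly as follows. This is an illustrative reconstruction, not the shipped code: the function names and the specific noise-gate value are assumptions; only the 1.5x/2x/3x multipliers and the gate-during-calibration idea come from the text above.

```javascript
// Noise gate: RMS samples below this are treated as silence and excluded
// from the baseline, so ambient quiet doesn't drag thresholds down.
// (Gate value is an assumed placeholder, not the product's tuning.)
const NOISE_GATE = 0.002;

// Compute a speaking baseline from RMS samples captured during setup,
// ignoring anything under the noise gate.
function calibrateBaseline(rmsSamples) {
  const speech = rmsSamples.filter((s) => s >= NOISE_GATE);
  if (speech.length === 0) return null; // user never spoke above the gate
  return speech.reduce((sum, s) => sum + s, 0) / speech.length;
}

// Percentage-based multipliers instead of absolute values, so ranges
// scale with each user's natural speaking level.
function thresholdsFor(baseline) {
  return {
    withinRange: baseline * 1.5,  // below this: calm and contained
    elevated: baseline * 2,       // above this: pressure begins to build
    sustainedAlert: baseline * 3  // above this: containment is lost
  };
}
```

The key property is that a naturally loud voice calibrates a higher baseline, so the same multipliers produce a range that fits that speaker rather than penalizing them.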
The architecture shifted from amplitude to containment. The early prototype directly mapped microphone input to continuous UI rendering — every fluctuation visualized, no distinction between liveness and escalation. The final system introduced a time-based state machine: the audio engine emits a single logical state (in range, out of range short, out of range sustained), and the UI renders only that state. The visual layer cannot read raw audio data or alter logic.
- Logic/UI separation — the audio engine emits only state; the UI cannot alter monitoring behavior
- Temporal escalation — brief spikes are ignored; escalation based on uninterrupted duration, not peak amplitude
- Distinct liveness channel — listening state always visible; escalation layered on top
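The state machine described above can be sketched as a small pure function. This is a minimal illustration under stated assumptions — the state names and the 3s/10s durations are placeholders, not the shipped values; what it demonstrates is the architectural point that the UI receives only a single logical state, never raw amplitude.

```javascript
const SHORT_MS = 3000;      // assumed: deviation must persist this long before escalating
const SUSTAINED_MS = 10000; // assumed: uninterrupted deviation before "sustained"

function createVolumeStateMachine() {
  let outOfRangeSince = null; // timestamp when the current deviation began

  // Called on every audio tick with the current RMS level, the user's
  // threshold, and a clock reading. Returns the single logical state.
  return function evaluate(rms, threshold, now) {
    if (rms <= threshold) {
      outOfRangeSince = null; // any return to range resets escalation
      return "in-range";
    }
    if (outOfRangeSince === null) outOfRangeSince = now;
    const elapsed = now - outOfRangeSince;
    if (elapsed >= SUSTAINED_MS) return "out-of-range-sustained";
    if (elapsed >= SHORT_MS) return "out-of-range-short";
    return "in-range"; // brief spikes produce no feedback at all
  };
}
```

Because escalation keys off uninterrupted duration, a single in-range tick resets the timer — which is exactly what makes laughing or a momentary burst of excitement invisible to the feedback layer.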
ARTIFACT (HIGH PRIORITY): v1 → v2 architecture diagram — Two columns. Left: "Continuous Meter" (Mic Input → RMS → Normalized Scale → Continuous UI). Right: "Temporal State Machine" (Mic Input → RMS → Threshold Evaluation → State Machine → Single Logical State → UI Renderer). Arrow between them: "Shift from amplitude to containment." This is the strongest systems thinking evidence in the case study.
The visual metaphor evolved from deformation to ring stacking. The earliest visual concept explored an ink deformation model — a perfect circle that loses containment as volume increases, eventually spilling across the surface. While conceptually strong, deformation was illegible at small PiP sizes and felt jittery with real-time audio reactivity. The pivot to ring stacking — a stable base circle that gains concentric rings (ring-0 through ring-3) — preserved the containment metaphor while remaining legible across all window sizes. Light-to-dark surface inversion during sustained alerts provides the final escalation signal.
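One plausible way to express the ring-stacking model is a pure mapping from logical state to visual parameters. The specific ring counts per state are illustrative guesses — the source only establishes ring-0 through ring-3 and the light-to-dark inversion on sustained alerts.

```javascript
// Map a logical state to the ring-stacking visual. Because this mapping is
// pure, the same function can drive the main tab, expanded PiP, and compact
// PiP without touching monitoring logic.
function visualFor(state) {
  switch (state) {
    case "in-range":
      return { rings: 0, surface: "light" }; // stable base circle only
    case "out-of-range-short":
      return { rings: 2, surface: "light" }; // containment starting to slip
    case "out-of-range-sustained":
      return { rings: 3, surface: "dark" };  // inversion: containment lost
    default:
      return { rings: 0, surface: "light" };
  }
}
```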
VIDEO: Earlier low-fi prototype videos — Place here to show the v1 meter prototype and/or the ink deformation experiments. Before/after visual evidence of iteration. Even rough video is strong evidence because it shows real testing, not theoretical design.
Distribution constraint forced a platform pivot. The browser extension required multi-day platform approval and produced inconsistent microphone permission flows across browsers. Rebuilding as a standalone web app with PiP capability eliminated both problems and enabled faster iteration within the 6-day timeline. Supports Strategic Decision: Pivoted From Browser Extension to Web App.
ARTIFACT: Mic permission screen (mic-permissions-needed.png or mic-permissions-needed-desktop.png) — Shows the trust copy at the first permission gate: "Mezzo listens only to volume, never to what you say. Nothing is recorded or saved." Evidence of trust design at the most sensitive interaction point.
3. Interaction Architecture
PiP as container choice, not mode switch. The Document Picture-in-Picture API was chosen specifically because Mezzo needs to run during meetings without taking focus. Two PiP modes exist: compact (smallest possible window, circle fills the space, rings clip at edges) and expanded (mirrors the full card layout with heading, circle, button, microcopy). A resize observer detects which mode to render. The main tab shows a "Running in a mini window" screen with a status indicator while PiP is active. Monitoring continues uninterrupted regardless of container — PiP is a display decision, not a system state change.
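The container logic above might look roughly like this. The `requestWindow` call is the real Document Picture-in-Picture API (Chromium-only at the time of writing); the 200px breakpoint and function names are assumptions for illustration.

```javascript
// Pure mode selection: which layout a given window width should render.
// (Breakpoint is an assumed value, not the product's actual one.)
function pipModeFor(width) {
  return width < 200 ? "compact" : "expanded";
}

// Open the PiP window and re-render whenever it is resized. Note that
// monitoring itself is never touched here: PiP is a display decision,
// not a system state change.
async function openPiP(renderMode) {
  const pipWindow = await documentPictureInPicture.requestWindow({
    width: 320,
    height: 320
  });
  const observer = new ResizeObserver(() => {
    renderMode(pipModeFor(pipWindow.innerWidth));
  });
  observer.observe(pipWindow.document.body);
  renderMode(pipModeFor(pipWindow.innerWidth)); // initial render
  return pipWindow;
}
```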
ARTIFACT (HIGH PRIORITY): PiP states comparison — Three-up showing the same feedback state across containers: main tab (monitoring-within-range.png), compact PiP (pip-within-range.png), and the "Running in mini window" handoff state (picture-in-picture-running.png). Second row could show the outside-range states across PiP: pip-outside-range-2.png and pip-outside-range-3-sustained-blinking.png. Demonstrates that the same interaction model works at any size.
Logic–UI separation as an architectural constraint. The audio processing pipeline was treated as immutable throughout the build. The visualization component is a pure consumer: it accepts a volume level and a baseline, then handles ring selection, variant cycling, theme switching, and pulsation independently. This separation means the same component renders identically in the main tab, expanded PiP, and compact PiP. Changes to presentation never affect monitoring behavior.
Trust enforced through state management. Concrete implementation patterns that enact the ephemeral design posture:
- Monitoring runs in PiP or the main tab — never both simultaneously
- Closing PiP stops monitoring entirely; no hidden listening
- Calibration data is cleared on tab close; no persistent audio data
- Privacy microcopy appears on every screen: "Nothing is recorded or saved. Monitoring stops when you close this tab."
- The landing page is completely static — no microphone detection until the user explicitly opts in
Each pattern traces to the same principle: trust is earned by what the system is structurally incapable of doing, not by what it promises. Supports Strategic Decision: Ephemeral by Design.
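The ephemeral posture reduces to a few concrete patterns, sketched below under assumed names. Nothing here touches persistent storage; calibration lives only in a closure and the microphone is released on teardown.

```javascript
function createSession() {
  // Calibration lives only in this closure: no localStorage, no cookies,
  // no audio buffers retained after processing.
  let baseline = null;

  return {
    setBaseline(value) { baseline = value; },
    getBaseline() { return baseline; },
    // Called on tab close and on PiP window close: monitoring stops
    // entirely and calibration is discarded. No hidden listening.
    teardown(mediaStream) {
      baseline = null;
      if (mediaStream) {
        mediaStream.getTracks().forEach((track) => track.stop()); // release the mic
      }
    }
  };
}

// In the browser, teardown would be wired to lifecycle events, e.g.:
//   window.addEventListener("pagehide", () => session.teardown(stream));
//   pipWindow.addEventListener("pagehide", () => session.teardown(stream));
```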
4. Prototype Feedback
Five participants reviewed a video walkthrough of the final prototype. All five understood what the tool does from the walkthrough alone. Tone was read as neutral and supportive — no one felt judged.
Screen sharing breaks the privacy promise. Two participants independently flagged the same issue: during screen share, the volume indicator becomes visible to the entire meeting.
"I do wonder where I'd put this while I'm sharing my screen so that I can still see it but it doesn't distract everyone." — P05
This is the sharpest unresolved survivability risk. Design response: PiP provides partial mitigation; wearable haptics remain an open exploration for presenters.
Sustained loudness needs a stronger signal. One participant wanted clearer indication of extended loudness beyond moment-to-moment feedback.
"Might be nice to have a more obvious indicator of when you're over your intended volume for a sustained period of time." — P04
This validates the temporal escalation model. The ring stacking and surface inversion in the final build directly address this feedback.
The tool works for the willing, not the unaware. Two participants raised the same concern: the loudest people may never seek out a tool like this.
"I wonder if the biggest offenders would even be aware of their volume to try using this?" — P01
Acknowledged as a known constraint, not a design failure.
Additional signals: 4/5 requested native integration (Zoom, Teams, taskbar). Multiple participants suggested always-on background operation rather than deliberate launch — validating the PiP approach as a step in that direction.
5. Measurement Strategy
Effectiveness cannot be measured through loudness reduction. Volume depends on meeting type, topic intensity, social dynamics, and personality. Metrics like "average loudness reduction" or "time spent out of range" would conflate passion with failure and undermine the product's own framing. Mezzo isn't trying to make people quiet — it's trying to make them aware at the right moment.
- Adoption & return (behavioral signal) — Do people come back? Repeat sessions, PiP usage rate, monitoring duration per session.
- Engagement with feedback (interaction signal) — Do people notice and respond? Time between entering "outside range" and session end, PiP usage after sustained feedback.
- Self-reported usefulness (outcome signal) — A lightweight post-session survey: "Was Mezzo helpful in this conversation?" Triggered intermittently, not every session.
Anonymous tracking preserves the trust architecture. A random, anonymous ID stored locally in the browser. Not tied to identity, not shared across sites or devices. No audio, names, or IP-based identification collected. Privacy language is explicit and plain: "Mezzo uses anonymous usage data to improve the product. No audio or personal information is ever collected."
How This Sharpened My Judgment
These lessons generalized beyond Mezzo. They sharpened how I think about motivation, trust, and perceived intent when designing systems that interpret human behavior.
Lessons from Real-World Use
- Self-selection is a scope decision, not a limitation. Mezzo serves people who want to adjust but lack feedback. Trying to reach the unaware would require persuasion mechanics that break the experience for the willing. Narrowing the audience made the design more effective, not less.
- Trust is earned by structural incapability, not messaging. The hardest question wasn't "does detection work?" but "what would make someone believe this isn't recording them?" No amount of reassuring copy compensates for architecture that could surveil. Trust had to be designed into what the system cannot do — no accounts, no stored data, no audio leaving the browser.
- Perceived intent matters as much as accuracy. The early prototype was acoustically correct but emotionally wrong. Precision felt judgmental. Clarity felt clinical. A system that interprets human behavior is judged by how it feels to be interpreted — not whether the interpretation is technically right.
- Forgiveness in threshold design protects the feedback channel. Thresholds calibrated for acoustic accuracy flagged normal behavior — laughing, excitement, shifting posture. One false positive trains users to ignore the system permanently. Under-reporting is a better failure mode than over-reporting when adoption depends on repeated trust.
- Sequencing trust before features eliminated the most expensive risk first. Privacy microcopy, ephemeral architecture, and organic visuals were built before feedback accuracy was tuned. If trust fails, nothing else matters — optimizing feedback for a system nobody opens is wasted effort.
In systems that monitor, evaluate, or respond to human behavior, correctness is table stakes. Adoption depends on whether users believe the system understands its role — ambient awareness, not authority. When perceived intent feels punitive, users don't recalibrate; they leave. This applies to any system where the feedback channel must survive repeated voluntary use.
Future Directions
Individual Tools
- Wearables (Apple Watch, smart ring): haptic feedback with nothing visible on screen
- Phone companion: check your volume on a separate device during screen share
- Native meeting integration: Zoom/Teams/WebEx add-ons (feasibility worth researching)
- Open-area use: not just headphones; socializing, collaborating, anywhere you might not realize how far your voice carries
Workspace Infrastructure
- Ambient devices: Alexa/Siri-type devices that detect volume levels and give environmental feedback ("sound is carrying to the hallway") — not individual monitoring, but spatial awareness
- Room calibration: phone booths and meeting rooms tested for sound leakage; users get warnings when they're in a space that isn't as private as it looks
- Anonymous nudge system: a way to alert someone they're loud without face-to-face confrontation, reducing the social cost the research identified
- Aggregate awareness: office managers get a sense of overall noise patterns without tracking individuals