Different Lens

Production

Your voice tells truths your words won't.

Voice emotion analysis for self-reflection. Speak into the app and get back what your voice is actually saying — Whisper transcription, dual emotion models, Claude interpretation, and a memory system that tracks your emotional patterns across weeks and months.

The Problem

Text journaling captures what you choose to say. Voice captures what you can't hide — the tremor in your words when you talk about your kid, the flatness when you say you're “fine.” Traditional self-help apps ignore the richest signal available: how you sound.

And even the ones that listen forget everything between sessions. No pattern recognition, no growth tracking, no memory. Just isolated snapshots that never connect into a story.

The Solution

Voice Emotion Pipeline

Record a voice journal entry. Whisper transcribes, SpeechBrain extracts categorical emotions (anger, sadness, happiness, neutral), Audeering measures dimensional affect (arousal, valence, dominance). All from 1.5 seconds of audio.
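The pipeline's three analyses all consume the same audio, so they can run concurrently. A minimal sketch of that orchestration — the type and function names are illustrative, not the app's actual API:

```typescript
// Hypothetical result shapes for one journal entry.
interface CategoricalEmotions {
  anger: number;
  sadness: number;
  happiness: number;
  neutral: number;
}

interface DimensionalAffect {
  arousal: number;   // 0..1
  valence: number;   // 0..1
  dominance: number; // 0..1
}

interface JournalAnalysis {
  transcript: string;      // Whisper
  categorical: CategoricalEmotions; // SpeechBrain
  dimensional: DimensionalAffect;   // Audeering
}

// The three model calls are independent, so run them in parallel
// and merge once all resolve.
async function analyzeEntry(
  audio: ArrayBuffer,
  transcribe: (a: ArrayBuffer) => Promise<string>,
  classify: (a: ArrayBuffer) => Promise<CategoricalEmotions>,
  profile: (a: ArrayBuffer) => Promise<DimensionalAffect>,
): Promise<JournalAnalysis> {
  const [transcript, categorical, dimensional] = await Promise.all([
    transcribe(audio),
    classify(audio),
    profile(audio),
  ]);
  return { transcript, categorical, dimensional };
}
```

Injecting the model calls as parameters keeps the orchestration testable without downloading any weights.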

Claude Interpretation Layer

Raw emotion scores are meaningless alone. Claude synthesizes transcript + emotion data + user history into genuine insight — not generic advice, but pattern recognition specific to your story.
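One way to picture that synthesis is as prompt assembly: transcript, emotion scores, and remembered patterns woven into a single context for Claude. The field names and wording below are assumptions, not the production prompt:

```typescript
// Illustrative context object; the real app's schema may differ.
interface EntryContext {
  transcript: string;
  topEmotion: string;     // strongest categorical emotion
  valence: number;        // dimensional score, 0..1
  recentPatterns: string[]; // retrieved from the memory system
}

function buildInterpretationPrompt(ctx: EntryContext): string {
  return [
    `Transcript: "${ctx.transcript}"`,
    `Dominant vocal emotion: ${ctx.topEmotion} (valence ${ctx.valence.toFixed(2)})`,
    ctx.recentPatterns.length
      ? `Recurring patterns from past sessions:\n- ${ctx.recentPatterns.join("\n- ")}`
      : "No prior history yet.",
    "Interpret what the voice signals add beyond the words." +
      " Be specific to this user's history; avoid generic advice.",
  ].join("\n\n");
}
```

Grounding the prompt in retrieved history is what turns raw scores into the "pattern recognition specific to your story" described above.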

Dual Emotion Models

SpeechBrain wav2vec2-IEMOCAP for categorical emotions (routes to therapeutic Islands). Audeering wav2vec2-large-robust for dimensional analysis (arousal/valence/dominance profiles over time).

Memory-First Architecture

6-layer hybrid memory persists across sessions. Your growth story builds over time. The AI remembers your patterns, your breakthroughs, your setbacks — so every conversation deepens, never resets.
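The source doesn't enumerate the six layers, but the shape of a layered recall — every layer queried per session, hits tagged by origin — can be sketched like this (layer names here are placeholders):

```typescript
// A memory layer knows how to recall items relevant to a topic.
interface MemoryLayer {
  name: string;
  recall(topic: string): string[];
}

// Query every layer and tag each hit with its source layer, so
// downstream interpretation can cite both recent sessions and
// long-running patterns.
function recallAcrossLayers(layers: MemoryLayer[], topic: string): string[] {
  return layers.flatMap((layer) =>
    layer.recall(topic).map((item) => `[${layer.name}] ${item}`),
  );
}
```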

Tech Stack

Next.js (Framework)
TypeScript (Language)
Whisper (Transcription)
SpeechBrain (Categorical Emotion)
Audeering (Dimensional Emotion)
Claude API (Interpretation)
Supabase (Backend)
PostgreSQL (Database)
HuggingFace (Models)
Vercel (Deployment)

Key Decisions

Voice-First, Not Text-First

Speaking is harder to filter than typing. The voice pipeline extracts emotion from audio before the transcript is even processed — catching signals the user might consciously suppress in text.

Two Models, Two Use Cases

SpeechBrain (Apache 2.0) handles categorical routing — which therapeutic “Island” a user needs. Audeering (CC BY-NC-SA) handles dimensional profiling for deeper analysis. Same audio, two complementary lenses.
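The categorical-routing step can be sketched as a simple argmax over SpeechBrain's four emotion scores. The Island names and the mapping below are assumptions for illustration, not the app's real taxonomy:

```typescript
// Hypothetical Island names — the source only says routing exists.
type Island = "grounding" | "grief" | "celebration" | "maintenance";

type EmotionScores = Record<"anger" | "sadness" | "happiness" | "neutral", number>;

function routeToIsland(scores: EmotionScores): Island {
  // Pick the highest-scoring categorical emotion.
  const top = (Object.entries(scores) as [keyof EmotionScores, number][])
    .sort((a, b) => b[1] - a[1])[0][0];
  switch (top) {
    case "anger": return "grounding";
    case "sadness": return "grief";
    case "happiness": return "celebration";
    default: return "maintenance";
  }
}
```

Keeping routing this simple is deliberate: the categorical model only has to be right about *which* lens to apply, while the dimensional profile carries the nuance.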

AI Boundaries by Design

Full AI at intake (emotion detection, pattern extraction). Restricted AI during therapeutic content delivery (pre-written, clinically grounded). Rule-based adaptations, not hallucinated advice.
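A minimal sketch of the restricted phase, assuming content variants gated by a valence threshold (the names and rule are illustrative): delivery selects from pre-written material by rule, never by free generation.

```typescript
// Pre-written, clinically grounded content variant; only eligibility
// is computed at runtime.
interface ExerciseVariant {
  id: string;
  minValence: number; // offer only when the user's valence is at least this
  text: string;
}

function selectExercise(
  valence: number,
  variants: ExerciseVariant[],
): ExerciseVariant | undefined {
  // Rule-based: the most demanding variant the user qualifies for.
  return variants
    .filter((v) => valence >= v.minValence)
    .sort((a, b) => b.minValence - a.minValence)[0];
}
```

Because the rule only ever indexes into vetted content, there is nothing for the model to hallucinate during delivery.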