mikael-lovqvists-claude-agent/claude-voice-experiment


mikael-lovqvists-claude-agent 20873be786 Add README, faster-whisper backend, and session fixes

- README explaining experimental/transparency purpose
- faster-whisper STT backend (fw-stt.mjs, faster-whisper-server.py, install-faster-whisper.sh)
- Bug fixes: Buffer alignment in on_audio, --debug-waveform URL parsing, silent fetch errors, instant dispatch timer leak
- Global uncaughtException/unhandledRejection handlers in query-demo.mjs
- Design docs: CHANGELOG, COMMAND-DISPATCH, INTERFACE-THEORY, VOICE-POLICY
- Systemd service unit templates

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-06-07 06:39:14 +00:00

4.0 KiB


Presenting the Voice Assistant

Notes on how to demonstrate this system effectively.

Core insight

The most compelling demo is an unscripted real working session — not a rehearsed script. The system's value is visible when it's actually being used to get things done. A scripted demo flattens the rough edges that make it feel real.

The query journal (when implemented) will make it possible to replay or reconstruct a real session for a recorded demo.

Most interesting things to show

Multi-character routing — the strongest demo angle

Address different named characters for different tasks: "Trance, take a note — calendar appointment Thursday at three" then "Archangel, add a new keyword to the vocabulary." The framing is immediately legible: different characters have different capabilities, different voices, different latency.

The latency difference tells the story on its own. Trance answers instantly from a local handler; Archangel pauses while Claude thinks. Audiences grasp "local vs. remote model" poorly in the abstract, but they see and feel the pause, and it reads as meaningful rather than broken because it maps to a character.

Some characters could be entirely local (notes, time, calendar lookups), some mostly Claude, some a mix. The architecture is visible in the demo without needing explanation.
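To make the routing idea concrete, here is a minimal sketch of name-based dispatch. It is not the project's implementation: the `Character` record, the `route` function, both handlers, and the "local"/"remote" labels are all hypothetical, and only the character names Trance and Archangel come from these notes. The remote handler is a stand-in that returns immediately; in a real session that call is where the audible pause would come from.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Character:
    name: str
    backend: str                    # "local" or "remote" (labels are assumptions)
    handler: Callable[[str], str]


def take_note(text: str) -> str:
    # Hypothetical local handler: answers immediately, no model call.
    return f"noted: {text}"


def ask_remote_model(text: str) -> str:
    # Placeholder for a remote model call. A real handler would block on
    # the network here, which is the pause the audience hears.
    return f"(remote reply to: {text})"


# Character names are from the notes; the handler assignments are assumed.
CHARACTERS = {
    "trance": Character("Trance", "local", take_note),
    "archangel": Character("Archangel", "remote", ask_remote_model),
}

# Leading name, optional comma or colon, then the rest of the utterance.
ADDRESS = re.compile(r"^\s*(\w+)\s*[,:]?\s+(.*)$", re.DOTALL)


def route(utterance: str) -> Optional[Tuple[Character, str]]:
    """Return (character, request) if the utterance addresses a known character."""
    match = ADDRESS.match(utterance)
    if match is None:
        return None
    character = CHARACTERS.get(match.group(1).lower())
    if character is None:
        return None
    return character, match.group(2).strip()


if __name__ == "__main__":
    routed = route("Archangel, add a new keyword to the vocabulary")
    if routed is not None:
        character, request = routed
        print(character.backend, "->", character.handler(request))
```

Keeping the backend as data on each character is one way to support the mix described above: a character that is mostly local but sometimes defers to Claude would only need a different handler, not a different router.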

Voice switching mid-session

Switch between characters and have each one say something in character. The contrast between voices makes the capability concrete. The multi-voice demo scripts in demos/ are a starting point, but an improvised exchange lands better.

Doing actual work

A real task (adding something to a list, asking a factual question, modifying a file) demonstrates the pipeline end to end. The fact that it works while the speaker is doing something else (cooking, tidying) is part of the story.

The classifier working (and not working)

Showing that short fragments don't trigger a dispatch prematurely, and that send-words like "go" force dispatch, gives a sense of how the pipeline is designed. Showing a case where it misfires, and explaining why, is honest and interesting.
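The two behaviours described here can be sketched as a small decision function. This is an illustration under stated assumptions, not the project's classifier: the word-count threshold `MIN_WORDS`, the `classify` function, and the "hold"/"dispatch" labels are hypothetical, and "go" is the only send-word taken from these notes. The real classifier may well use other signals, such as the silence timeout.

```python
import string
from typing import Tuple

SEND_WORDS = {"go"}   # "go" is from the notes; treating this as the full set is an assumption
MIN_WORDS = 4         # hypothetical cutoff below which a transcript counts as a fragment


def classify(transcript: str) -> Tuple[str, str]:
    """Return (action, text), where action is "dispatch" or "hold"."""
    tokens = transcript.split()
    if not tokens:
        return ("hold", "")

    # Compare the final token without punctuation or case, so "Go." also counts.
    last = tokens[-1].strip(string.punctuation).lower()
    if last in SEND_WORDS:
        # A send-word forces dispatch and is dropped from the query text.
        return ("dispatch", " ".join(tokens[:-1]))

    if len(tokens) < MIN_WORDS:
        # Short fragment: keep listening instead of dispatching early.
        return ("hold", " ".join(tokens))

    return ("dispatch", " ".join(tokens))


if __name__ == "__main__":
    for sample in ("add milk", "add milk go", "add a new keyword to the vocabulary"):
        print(repr(sample), "->", classify(sample))
```

A sketch like this also shows where misfires come from: any sentence that happens to end in "go" would dispatch early, which is the kind of case worth showing and explaining in a demo.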

The "hands-free while doing something else" workflow

If possible, demonstrate speaking while visibly occupied with something physical. This is the use case most people haven't thought about and it's immediately legible.

The planning capability

This session itself is a good example: several hours of brainstorming, planning, and note-taking without touching a keyboard. The output (TODO.md, VOICES.md, CLEANUP-PLAN.md, WORKFLOWS.md, this file) is the artifact.

What makes a good moment to record

A natural silence after a query, where the system is clearly thinking before it responds
A voice switch that lands well and sounds good
A query the classifier gets right: a fragment doesn't dispatch, a full sentence does
An unexpected result from a voice clone that prompts a reaction
A task that was actually useful, not just a demo task

Things to prepare before a demo

Voice samples loaded and tested — all voices in voices.yaml checked for quality
TTS server running and warmed up (first response is slow)
At least one chime configured for acknowledgement
Silence timeout tuned so it doesn't misfire during normal speech pauses
A few interesting voice quotes ready to demonstrate character contrast

Purpose

This isn't a product pitch. It's sharing something genuinely useful that changes how work gets done, along with the excitement that comes with that. The best demos come from someone showing something they actually use and care about. That energy is contagious, and people respond to it differently than they do to a polished technical showcase.

Framing

The system is not finished. That's fine to say. The interesting thing is that it's useful now in its current form, and the direction is clear. Showing the TODO and CLEANUP-PLAN alongside the working demo makes it credible — this is a real project, not a polished proof of concept that's going nowhere.
