claude-voice-experiment/VOICES.md
mikael-lovqvists-claude-agent db8889aeed Initial commit — voice pipeline experiment
STT (Silero VAD + Whisper via sherpa-onnx), Chatterbox TTS HTTP server,
query completeness classifier (Ollama), multi-voice demo scripts, and
planning docs. Kept as reference; clean rewrite planned in separate repos.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-30 04:48:54 +00:00


Voice Clone Wishlist

Voices to prepare reference clips for.

Ready

| Voice | Character / Show | Notes |
|---|---|---|
| Rommie | Andromeda, ship AI / android | /home/devilholk/Documents/rommie-sample.wav; working well |
| Wilford Brimley (Harold W. Smith) | Remo Williams: The Adventure Begins (1985), head of CURE | Turned out well: gravelly authoritative delivery, dry clean speech |

To Do

| Voice | Character / Show | Notes |
|---|---|---|
| Fred Ward | Remo Williams: The Adventure Begins (1985), Remo Williams himself | Distinctive gravelly voice, same film as Brimley so similar source quality |
| Alan Scarfe | TNG (Romulan), Andromeda S5 (Flavin) | Deep, authoritative voice; hunt for quiet scenes without ship hum |
| John Fleck | Enterprise (Silik the Suliban) | Distinctive raspy voice; Suliban scenes may have atmosphere noise |
| Steve Bacic | Andromeda (Telemachus Rhade) | |
| Alex Diakun | Andromeda (Perseid character, name ~"Atune"), Stargate SG-1 (science role) | Prolific Vancouver sci-fi character actor, often plays scientists/scholars; distinctive voice |
| Unknown actress | Andromeda S4/S5 (Dylan's love interest), Babylon 5 (Mars rebellion leader) | Possibly Marjorie Monaghan (Number One in B5); unconfirmed |
| Claudia Christian | Babylon 5, Commander Ivanova | User already has a voice clip ready |

Notes

  • Animated/cartoon voices (e.g. Darkwing Duck) don't clone well; they fall too far outside the distribution of natural human speech
  • Compressed or heavily post-processed audio, and clips with background sound (spaceship hum, background score), degrade results even after noise reduction
  • Quality differences between OGG and WAV clips are likely due to source quality, not the encoding; soundfile handles both formats
  • Voice cloning quality scales with both clip length and emotional range: varied prosody (questions, statements, different tones) gives the model more to anchor on than a flat monotone
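
Since clip length matters, it can help to check a candidate reference clip before using it. The following is a minimal sketch, not part of the original pipeline: it uses only the standard-library `wave` module, so it covers WAV files only (OGG clips would need soundfile or a similar library). The names `describe_clip`, `is_long_enough`, and the `MIN_SECONDS` threshold are hypothetical; the notes above do not state how long a clip should be.

```python
import wave

# Hypothetical threshold: the notes say longer clips help,
# but do not give a specific minimum length.
MIN_SECONDS = 5.0


def describe_clip(path):
    """Return basic facts about a WAV reference clip."""
    with wave.open(str(path), "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "path": str(path),
            "channels": wf.getnchannels(),
            "sample_rate": rate,
            "seconds": frames / rate,  # duration from frame count
        }


def is_long_enough(info, min_seconds=MIN_SECONDS):
    """Check the clip duration against an assumed minimum."""
    return info["seconds"] >= min_seconds
```

Calling `describe_clip` on a clip such as the Rommie sample and passing the result to `is_long_enough` gives a quick length check. It says nothing about prosody or background noise, which still need a listen.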