STT (Silero VAD + Whisper via sherpa-onnx), Chatterbox TTS HTTP server, query completeness classifier (Ollama), multi-voice demo scripts, and planning docs. Kept as reference; clean rewrite planned in separate repos. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Voice Project — Cleanup Plan
Note
Revised approach — keep this project as-is (it works and serves as a reference). The path forward is new clean repos for each component, then a top-level project that composes them. See New architecture section below.
The project has accumulated experiment scripts, dead TTS backends, and outdated setup files. This document is the plan before doing the actual cleanup.
Current state
Active pipeline (keep)
| File | Role |
|---|---|
| `query-demo.mjs` | Main entry point: STT → classifier → dispatch |
| `tts-server.mjs` | TTS HTTP server (Chatterbox, voice switching) |
| `voices.yaml` | Named voice config |
| `lib/stt.mjs` | Silero VAD + Whisper, pre-roll buffer |
| `lib/chatterbox-tts.mjs` | Node wrapper for chatterbox-server.py |
| `lib/tts-client.mjs` | HTTP client for tts-server.mjs |
| `lib/local-query-complete.mjs` | Ollama classifier |
| `chatterbox-server.py` | Long-running Python TTS process |
| `listen.mjs` | Standalone mic → stdout tool |
| `speak-as.mjs` | Standalone stdin → TTS tool |
| `demos/` | Multi-voice demo scripts |
| `download-models.sh` | Whisper + VAD model download |
| `setup-venv.sh` | Python venv setup (needs rewrite, see below) |
| `models/` | Downloaded STT models |
| `venv/` | Python venv (not committed) |
Dead / experimental (remove or archive)
| File | Reason |
|---|---|
| `acting-demo-bark.mjs` | Bark TTS, abandoned |
| `acting-demo-chatterbox.mjs` | One-off demo, superseded |
| `acting-demo.mjs` | Old demo |
| `bark-server.py` | Bark backend, abandoned |
| `demo-bark.mjs` | Bark demo |
| `demo-kokoro.mjs` | Kokoro TTS experiment, abandoned |
| `lib/bark-tts.mjs` | Bark wrapper |
| `lib/tts.mjs` | Old TTS abstraction (superseded by tts-client.mjs) |
| `lib/markdown.mjs` | Unused? Verify before deleting |
| `lib/llm.mjs` | Unused? Verify before deleting |
| `requirements.txt` | Lists kokoro + faster-whisper deps that aren't used |
| `voice-buddy.mjs` | Merged into query-demo.mjs; delete |
What needs to be documented / fixed
1. setup-venv.sh — rewrite
Current script installs Bark deps. The active backend is Chatterbox. New script should:
- Install `chatterbox-tts` and its deps (torch, transformers, accelerate, numpy)
- HuggingFace token instruction: write token to `~/.secrets/hugging-face.token`
- Note: `find_hf_cache()` in chatterbox-server.py avoids HF network calls if the model is cached
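
The cache lookup and token handling mentioned above might look roughly like this. It is a sketch, not the code in chatterbox-server.py: the function names are hypothetical, and it assumes the standard Hugging Face hub cache layout (`models--<org>--<name>/snapshots/<revision>/`), which by default lives under `~/.cache/huggingface/hub`.

```python
from pathlib import Path
from typing import Optional


def find_cached_snapshot(repo_id: str, cache_root: Path) -> Optional[Path]:
    """Return the newest cached snapshot directory for repo_id, or None.

    Assumes the hub layout <cache_root>/models--<org>--<name>/snapshots/<revision>/.
    """
    repo_dir = cache_root / ("models--" + repo_id.replace("/", "--"))
    snapshots = repo_dir / "snapshots"
    if not snapshots.is_dir():
        return None
    candidates = [p for p in snapshots.iterdir() if p.is_dir()]
    if not candidates:
        return None
    # Several revisions may be cached; pick the most recently modified one.
    return max(candidates, key=lambda p: p.stat().st_mtime)


def read_hf_token(token_file: Path) -> Optional[str]:
    """Read the token written during setup; None if the file is missing or empty."""
    if not token_file.is_file():
        return None
    token = token_file.read_text(encoding="utf-8").strip()
    return token or None
```

A server could call `find_cached_snapshot(...)` first and only fall back to a network download (using `read_hf_token(Path.home() / ".secrets" / "hugging-face.token")`) when it returns `None`.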
2. requirements.txt — replace or delete
Currently lists kokoro and faster-whisper (unused). Either:
- Replace with `chatterbox-requirements.txt` listing only what `setup-venv.sh` installs
- Or just delete it: `setup-venv.sh` is the source of truth
3. sherpa-onnx / CUDA build — document clearly
The npm package ships a CPU-only .so. Using CUDA requires a manual rebuild. This is documented in NOTES.md but should also appear in a top-level README.md or SETUP.md so it's hard to miss.
4. System dependencies — list them
Nothing currently lists what needs to be installed at the OS level:
- `parec`/`pacat` (PulseAudio): audio I/O
- `cmake`: for optional CUDA sherpa-onnx rebuild
- `onnxruntime-opt-cuda` (or equivalent): for GPU STT
- `ollama` running locally with `qwen2.5:3b` pulled: for classifier
- `jq`: used in shell functions and demo scripts
- Node.js (version? check what's required)
- Python 3 with venv support
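
Until SETUP.md exists, a small preflight check can at least report which of these tools are missing from `PATH`. The sketch below is an illustration only: the split into required and optional commands is an assumption, the `node` and `python3` binary names are guesses at how Node.js and Python are installed, and `onnxruntime-opt-cuda` is a package rather than an executable, so it is not covered.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Executables taken from the list above (grouping is an assumption).
REQUIRED_COMMANDS = ["parec", "pacat", "jq", "node", "python3"]
# cmake is only needed for the CUDA rebuild; ollama may live on another host.
OPTIONAL_COMMANDS = ["cmake", "ollama"]


def missing_commands(
    names: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the commands from names that cannot be found on PATH."""
    return [name for name in names if which(name) is None]
```

Calling `missing_commands(REQUIRED_COMMANDS)` on a fully set up machine should return an empty list; the `which` parameter exists only so the lookup can be replaced in tests.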
5. External services — document
- TTS server: runs on the host (not in container), port 11500
- Ollama: runs on `192.168.2.99:11434`, used for the classifier and future LLM routing
- TTS_URL env var: set in `~/.bashrc` to point at TTS server
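
One way to make the Ollama dependency checkable is to ask the server which models it has pulled. The sketch below uses Ollama's `GET /api/tags` endpoint and the host and model named in this plan; the function names and the 3-second timeout are illustrative choices. No equivalent check is shown for the TTS server because its HTTP API is not described here.

```python
import json
import urllib.request
from typing import Any, Dict

OLLAMA_URL = "http://192.168.2.99:11434"  # host and port from the list above
CLASSIFIER_MODEL = "qwen2.5:3b"


def has_model(tags_response: Dict[str, Any], model: str) -> bool:
    """True if model appears in a parsed /api/tags response."""
    return any(entry.get("name") == model for entry in tags_response.get("models", []))


def classifier_model_available(
    base_url: str = OLLAMA_URL,
    model: str = CLASSIFIER_MODEL,
    timeout: float = 3.0,
) -> bool:
    """Query Ollama for its local models; False if unreachable or the model is absent."""
    try:
        with urllib.request.urlopen(base_url + "/api/tags", timeout=timeout) as response:
            return has_model(json.load(response), model)
    except (OSError, ValueError):
        # Connection errors and timeouts are OSError; malformed JSON is ValueError.
        return False
```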
6. voices.yaml — keep and document
Already the right format. Add a comment block at the top explaining how to add a voice (grab a clean WAV/OGG clip, at least 5 seconds, no background music).
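
The 5-second minimum can be checked mechanically before a clip is added. This is a minimal sketch using Python's standard `wave` module, so it only handles PCM WAV files; OGG clips would need another library. The function names are hypothetical, and the check says nothing about background music or noise.

```python
import wave

MIN_CLIP_SECONDS = 5.0  # minimum length from the guidance above


def wav_duration_seconds(path: str) -> float:
    """Length of a PCM WAV file in seconds (frame count divided by sample rate)."""
    with wave.open(path, "rb") as clip:
        return clip.getnframes() / clip.getframerate()


def is_long_enough(path: str, minimum: float = MIN_CLIP_SECONDS) -> bool:
    """True if the clip meets the minimum duration for a voice reference."""
    return wav_duration_seconds(path) >= minimum
```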
Proposed new top-level structure
    voice/
      README.md              ← start here: what this is, quick start
      SETUP.md               ← full setup: OS deps, npm install, venv, CUDA, models
      NOTES.md               ← architecture deep-dives (keep as-is)
      TODO.md                ← keep as-is
      VOICES.md              ← wishlist (keep as-is)
      voices.yaml            ← runtime voice config
      query-demo.mjs         ← main entry point
      tts-server.mjs         ← TTS HTTP server
      listen.mjs             ← standalone STT tool
      speak-as.mjs           ← standalone TTS tool
      chatterbox-server.py
      setup-venv.sh          ← rewritten for Chatterbox only
      download-models.sh
      package.json
      lib/
        stt.mjs
        chatterbox-tts.mjs
        tts-client.mjs
        local-query-complete.mjs
      demos/
        *.sh
      models/                ← gitignored
      venv/                  ← gitignored
Cleanup steps (in order)
1. Verify `lib/markdown.mjs` and `lib/llm.mjs` are unused: grep for imports
2. Delete dead files: bark/kokoro scripts, old demos, voice-buddy.mjs, lib/bark-tts.mjs, lib/tts.mjs, and unused lib files
3. Rewrite `setup-venv.sh` for Chatterbox only
4. Replace `requirements.txt` with accurate one or delete
5. Write `SETUP.md` covering all install steps end-to-end
6. Write `README.md` with quick start
7. Add voice cloning tips to top of `voices.yaml`
8. Commit the whole cleanup as a single commit
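
The "grep for imports" check in the first step can also be scripted so the result is easy to re-run before deleting anything. The sketch below is one possible approach, not an existing project tool: `find_importers` and the skipped directory names are assumptions, it only scans `.mjs` files, and the regular expression is a heuristic that does not parse JavaScript, so a plain `grep -rn` over the shell scripts in `demos/` is still worth doing.

```python
import re
from pathlib import Path
from typing import Dict, Iterable, List

# Directories that hold dependencies or data rather than project code (assumed names).
SKIP_DIRS = {"node_modules", "venv", "models", ".git"}

# Captures the quoted specifier in `import ... from "x"`, `import "x"`, and `import("x")`.
IMPORT_SPECIFIER = re.compile(r"""(?:from|import)\s*\(?\s*["']([^"']+)["']""")


def find_importers(root: Path, module_names: Iterable[str]) -> Dict[str, List[str]]:
    """Map each module file name to the .mjs files whose imports mention it."""
    names = list(module_names)
    importers: Dict[str, List[str]] = {name: [] for name in names}
    for path in sorted(root.rglob("*.mjs")):
        relative = path.relative_to(root)
        if SKIP_DIRS.intersection(relative.parts):
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        specifiers = IMPORT_SPECIFIER.findall(text)
        for name in names:
            if any(s == name or s.endswith("/" + name) for s in specifiers):
                importers[name].append(relative.as_posix())
    return importers
```

Running `find_importers(Path("."), ["markdown.mjs", "llm.mjs"])` from the project root returns an empty list for any module that no scanned file imports, which is the condition for deleting it.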
Open questions
- Keep `acting-demo-chatterbox.mjs` as a reference for TTS capability demos, or delete?
- Should `demos/` scripts be committed as-is or made more portable (no hardcoded paths)?
New architecture (revised plan)
Keep this project as a reference. Build clean standalone repos instead:
| Repo | Contents | Depends on |
|---|---|---|
| `tts-server` | chatterbox-server.py, tts-server.mjs, lib/chatterbox-tts.mjs, voices.yaml format, HTTP API | Python venv, Chatterbox |
| `stt` | lib/stt.mjs, sherpa-onnx, VAD + Whisper, pre-roll buffer, listen.mjs | sherpa-onnx-node, models |
| `voice-assistant` | query-demo.mjs, lib/tts-client.mjs, lib/local-query-complete.mjs | tts-server + stt as submodules |
Each standalone repo gets its own README and SETUP and can be used independently. The voice-assistant repo composes them via git submodules (or npm workspace / symlinks, TBD).
Open questions for new architecture
- Composition: npm packages. Each repo is published to the private npm.efforting.tech registry; voice-assistant depends on `@efforting/tts-server` and `@efforting/stt` as npm deps. Cleaner than submodules for ESM projects.
- Model storage: models are large, have different backup strategies, and should NOT live inside project directories. There are canonical model locations on the system; repos should reference them via an env var (e.g. `MODELS_DIR` or per-model env vars) rather than downloading into the project. Migrate existing `models/` directories out as part of the new repo setup. Not urgent; note for cleanup time.
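
The env-var lookup for model storage could follow a simple precedence rule: a per-model variable wins, then `MODELS_DIR`, and anything else is an error rather than a silent download into the project. The sketch below is in Python; the same order would apply in the Node code. The `<NAME>_MODEL_DIR` naming scheme and the function name are hypothetical; only `MODELS_DIR` comes from this plan.

```python
import os
from pathlib import Path
from typing import Mapping


def resolve_model_path(model_name: str, env: Mapping[str, str] = os.environ) -> Path:
    """Locate a model directory outside the project tree.

    Lookup order: <NAME>_MODEL_DIR (hypothetical per-model override),
    then MODELS_DIR/<model_name>.
    """
    per_model_var = model_name.upper().replace("-", "_") + "_MODEL_DIR"
    if env.get(per_model_var):
        return Path(env[per_model_var])
    if env.get("MODELS_DIR"):
        return Path(env["MODELS_DIR"]) / model_name
    raise RuntimeError(
        f"Set {per_model_var} or MODELS_DIR; models are not stored in the project directory."
    )
```

With `MODELS_DIR=/srv/models`, `resolve_model_path("whisper")` yields `/srv/models/whisper`; failing loudly when nothing is set keeps large model files from quietly reappearing inside a repo.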