Add README, faster-whisper backend, and session fixes

- README explaining experimental/transparency purpose
- faster-whisper STT backend (fw-stt.mjs, faster-whisper-server.py, install-faster-whisper.sh)
- Bug fixes: Buffer alignment in on_audio, --debug-waveform URL parsing, silent fetch errors, instant dispatch timer leak
- Global uncaughtException/unhandledRejection handlers in query-demo.mjs
- Design docs: CHANGELOG, COMMAND-DISPATCH, INTERFACE-THEORY, VOICE-POLICY
- Systemd service unit templates

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
40
README.md
Normal file
@@ -0,0 +1,40 @@
# Voice Pipeline Experiment
This repository is an active experiment. It is published for **transparency and reference** — not as a finished or production-ready project. Expect rough edges, dead ends, work-in-progress notes, and design docs that describe things not yet built.
## What this is
A local voice interface for [Claude Code](https://github.com/anthropics/claude-code): speak a query, and it is transcribed, classified by intent, and dispatched to Claude. The stack runs entirely on local hardware, with no cloud STT/TTS.

Core components:

| File | Role |
|------|------|
| `query-demo.mjs` | Main entry point: mic → STT → query state machine → dispatch |
| `lib/pending-query.mjs` | Query state machine: wake word, silence timer, send/cancel/pause |
| `lib/stt.mjs` | Silero VAD + Whisper STT (sherpa-onnx backend) |
| `lib/fw-stt.mjs` | faster-whisper STT backend with word-level timestamps |
| `tts-server.mjs` | TTS HTTP server (Chatterbox model, voice switching) |
| `lib/tts-client.mjs` | HTTP client for the TTS server |
| `voices.yaml` | Named voice configuration |
| `faster-whisper-server.py` | Python subprocess for faster-whisper transcription |
| `install-faster-whisper.sh` | Builds ctranslate2 from source for CUDA 13 compatibility |
| `download-models.sh` | Downloads Whisper and VAD models |
| `query-demo.service` / `tts-server.service` | Systemd unit templates |
## Status
Working but experimental. The design docs (`CLEANUP-PLAN.md`, `COMMAND-DISPATCH.md`, etc.) describe architectural directions not yet implemented. The project will eventually split into separate focused repositories.
## Offshoots
Links to derived, cleaner projects will be added here as they become ready.
## Requirements
- Node.js (ESM)
- Python 3 with a venv (`setup-venv.sh` or `install-faster-whisper.sh`)
- CUDA GPU (for faster-whisper backend)
- PulseAudio or ALSA for mic capture
- `sherpa-onnx-node` npm package
- Chatterbox TTS model