mikael-lovqvists-claude-agent 20873be786 Add README, faster-whisper backend, and session fixes
- README explaining experimental/transparency purpose
- faster-whisper STT backend (fw-stt.mjs, faster-whisper-server.py, install-faster-whisper.sh)
- Bug fixes: Buffer alignment in on_audio, --debug-waveform URL parsing, silent fetch errors, instant dispatch timer leak
- Global uncaughtException/unhandledRejection handlers in query-demo.mjs
- Design docs: CHANGELOG, COMMAND-DISPATCH, INTERFACE-THEORY, VOICE-POLICY
- Systemd service unit templates

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 06:39:14 +00:00

Voice Pipeline Experiment

This repository is an active experiment. It is published for transparency and reference — not as a finished or production-ready project. Expect rough edges, dead ends, work-in-progress notes, and design docs that describe things not yet built.

What this is

A local voice interface for Claude Code: you speak a query, and the pipeline transcribes it, classifies the intent, and dispatches the result to Claude. Speech recognition and synthesis run entirely on local hardware, with no cloud STT/TTS.
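
The transcribe, classify, dispatch flow can be sketched roughly as follows. This is an illustration only, not code from the repository: the wake word `computer`, the intent shapes, and the names `classifyIntent` and `handleUtterance` are all assumptions, and the STT and dispatch steps are injected as plain functions so the sketch stays backend-neutral.

```javascript
// Hypothetical sketch: none of these names are taken from the repository.
const WAKE_WORD = 'computer'; // assumed wake word

// Classify a transcript. Only text that starts with the wake word is acted on.
// Punctuation is stripped and text lowercased so "Computer, ..." still matches.
function classifyIntent(transcript) {
  const normalized = transcript.toLowerCase().replace(/[^\p{L}\p{N}\s]/gu, '');
  const words = normalized.split(/\s+/).filter(Boolean);
  if (words[0] !== WAKE_WORD) return { kind: 'ignore' };
  const rest = words.slice(1).join(' ');
  if (rest === 'cancel') return { kind: 'cancel' };
  return { kind: 'query', text: rest };
}

// Run one utterance through the stages. `transcribe` and `dispatch` are
// supplied by the caller (for example an STT backend and a Claude client).
async function handleUtterance(audio, { transcribe, dispatch }) {
  const transcript = await transcribe(audio);
  const intent = classifyIntent(transcript);
  if (intent.kind === 'query') await dispatch(intent.text);
  return intent;
}
```

Keeping classification as a pure function of the transcript makes it testable without a microphone or a model loaded.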

Core components:

| File | Role |
|---|---|
| query-demo.mjs | Main entry point: mic → STT → query state machine → dispatch |
| lib/pending-query.mjs | Query state machine: wake word, silence timer, send/cancel/pause |
| lib/stt.mjs | Silero VAD + Whisper STT (sherpa-onnx backend) |
| lib/fw-stt.mjs | faster-whisper STT backend with word-level timestamps |
| tts-server.mjs | TTS HTTP server (Chatterbox model, voice switching) |
| lib/tts-client.mjs | HTTP client for the TTS server |
| voices.yaml | Named voice configuration |
| faster-whisper-server.py | Python subprocess for faster-whisper transcription |
| install-faster-whisper.sh | Builds ctranslate2 from source for CUDA 13 compatibility |
| download-models.sh | Downloads Whisper and VAD models |
| query-demo.service / tts-server.service | Systemd unit templates |
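
To make the query state machine entry more concrete, here is a minimal sketch of a wake-word-gated query with a silence timer and send/cancel/pause operations. It is not the contents of lib/pending-query.mjs: the class shape, the state names, the 1500 ms default, and the injectable `timers` object are all assumptions made for illustration.

```javascript
// Hypothetical sketch; the real lib/pending-query.mjs may differ in every detail.
class PendingQuery {
  constructor({ onSend, wakeWord = 'computer', silenceMs = 1500, timers } = {}) {
    this.onSend = onSend;
    this.wakeWord = wakeWord;
    this.silenceMs = silenceMs;
    // Timers are injectable so the machine can be driven without real delays.
    this.timers = timers ?? {
      set: (fn, ms) => setTimeout(fn, ms),
      clear: (id) => clearTimeout(id),
    };
    this.state = 'idle'; // 'idle' | 'collecting' | 'paused'
    this.parts = [];
    this.timer = null;
  }

  // Feed one transcribed phrase into the machine.
  hear(phrase) {
    const text = phrase.trim();
    if (this.state === 'idle') {
      // Simple prefix match: phrases without the wake word are dropped.
      if (!text.toLowerCase().startsWith(this.wakeWord)) return;
      this.state = 'collecting';
      const rest = text.slice(this.wakeWord.length).replace(/^[\s,.:!]+/, '');
      this.parts = [rest];
    } else {
      this.parts.push(text);
    }
    // Each new phrase restarts the silence countdown (not while paused).
    if (this.state === 'collecting') this.armTimer();
  }

  pause() {
    if (this.state !== 'collecting') return;
    this.clearTimer();
    this.state = 'paused';
  }

  resume() {
    if (this.state !== 'paused') return;
    this.state = 'collecting';
    this.armTimer();
  }

  cancel() {
    this.reset();
  }

  // Send immediately, or when the silence timer fires.
  send() {
    if (this.state === 'idle') return;
    const query = this.parts.filter(Boolean).join(' ');
    this.reset(); // always clears the timer, so an early send leaves none behind
    if (query) this.onSend(query);
  }

  armTimer() {
    this.clearTimer();
    this.timer = this.timers.set(() => this.send(), this.silenceMs);
  }

  clearTimer() {
    if (this.timer !== null) this.timers.clear(this.timer);
    this.timer = null;
  }

  reset() {
    this.clearTimer();
    this.state = 'idle';
    this.parts = [];
  }
}
```

Routing every exit path through `reset()` is one way to avoid leaving a stale silence timer behind when a query is sent or cancelled before the timer fires.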

Status

Working but experimental. The design docs (CLEANUP-PLAN.md, COMMAND-DISPATCH.md, etc.) describe architectural directions not yet implemented. The project will eventually split into separate focused repositories.

Offshoots

Links to derived, cleaner projects will be added here as they become ready.

Requirements

  • Node.js (ESM)
  • Python 3 with a venv (setup-venv.sh or install-faster-whisper.sh)
  • CUDA GPU (for faster-whisper backend)
  • PulseAudio or ALSA for mic capture
  • sherpa-onnx-node npm package
  • Chatterbox TTS model