Commit Graph

24 Commits

Author SHA1 Message Date
4f983ca276 Merge pull request 'WebSocket server, language/task args, verbose flag, misc improvements' (#2) from mikael-lovqvists-claude-agent/stt-server:websocket-server into main
Reviewed-on: #2
2026-06-07 09:27:01 +00:00
81e9ea82cf Add NOTES.md with TranscriptionInfo unused fields
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 09:24:53 +00:00
0afe761625 Include detected language and confidence in transcript events
Unpacks transcription info instead of discarding it. Adds language and
language_probability fields to transcript events, and includes them in
verbose log output.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 09:21:44 +00:00
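
A minimal sketch of what the change in 0afe761625 might look like, assuming the transcription call returns an info object with `language` and `language_probability` attributes (as faster-whisper's transcription info does). The `InfoStub` class, the `make_transcript_event` helper, and the `type`/`text` key names are hypothetical, chosen for illustration; the actual event layout in stt-server.py may differ.

```python
from dataclasses import dataclass


@dataclass
class InfoStub:
    """Stand-in for the transcription info object (hypothetical, for illustration)."""
    language: str
    language_probability: float


def make_transcript_event(text, info):
    """Build a transcript event that carries the detected language and its confidence."""
    return {
        "type": "transcript",  # assumed key name for the event kind
        "text": text,
        "language": info.language,
        "language_probability": round(info.language_probability, 3),
    }


event = make_transcript_event("hej världen", InfoStub("sv", 0.98765))
print(event)
```

Rounding the probability is a choice made here to keep the JSON compact; the commit message does not say whether the server rounds it.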
bdb1aac885 Add --language and --task CLI arguments, document in README
--language: force a specific language (e.g. en, sv) or leave unset for auto-detection
--task: transcribe (default) or translate to English
Previously language was hardcoded to 'en' which caused multilingual models
to hallucinate translations instead of transcribing the source language.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 09:16:19 +00:00
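
The two arguments described in bdb1aac885 could be declared roughly as follows. This is a sketch using `argparse`; the commit message does not show the real parser, so `build_parser` and the help strings are assumptions.

```python
import argparse


def build_parser():
    """Declare --language and --task as described in the commit message."""
    parser = argparse.ArgumentParser(description="stt-server argument sketch")
    parser.add_argument(
        "--language",
        default=None,  # None means: let the model auto-detect the language
        help="language code such as en or sv; omit for auto-detection",
    )
    parser.add_argument(
        "--task",
        choices=["transcribe", "translate"],
        default="transcribe",
        help="transcribe in the source language, or translate to English",
    )
    return parser


args = build_parser().parse_args(["--language", "sv"])
print(args.language, args.task)
```

Leaving `--language` at `None` rather than defaulting to `en` reflects the fix the commit describes: a hardcoded language made multilingual models produce translations instead of transcripts.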
f2ba15185e Update VRAM estimates to show float16/float32 for all models
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 09:14:35 +00:00
dd6e74a7a8 Fix large-v3 VRAM estimate — ~5GB with float16, not ~10GB
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 09:14:00 +00:00
be1efd9edb Add model selection and compute type sections to README
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 09:11:39 +00:00
9030b1315d Load HF_TOKEN from token file at startup (consistent with tts-server)
Reads ~/.secrets/hugging-face.token by default, overridable via HF_TOKEN_FILE.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 09:10:54 +00:00
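
One way the token loading in 9030b1315d might work, sketched under assumptions: the helper name `load_hf_token` is hypothetical, and whether the real script overwrites an already-set HF_TOKEN or skips silently when the file is missing is not stated in the commit message.

```python
from pathlib import Path

DEFAULT_TOKEN_FILE = "~/.secrets/hugging-face.token"  # default path from the commit message


def load_hf_token(environ):
    """Set HF_TOKEN in `environ` from the token file; return True if a token was loaded."""
    token_path = Path(environ.get("HF_TOKEN_FILE", DEFAULT_TOKEN_FILE)).expanduser()
    try:
        token = token_path.read_text().strip()
    except OSError:
        return False  # missing or unreadable file: leave the environment untouched
    if not token:
        return False
    environ["HF_TOKEN"] = token  # assumption: the file takes precedence over any existing value
    return True
```

At startup this would be called as `load_hf_token(os.environ)` before any model is loaded, so that Hugging Face downloads pick up the token.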
7b03deddb5 Gate download log message behind --verbose like everything else
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 09:09:03 +00:00
218687b039 Log to stderr when model needs to be downloaded
Checks cache first with local_files_only=True; if the model isn't present
logs "downloading model ..." to stderr before WhisperModel triggers the
actual download.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 09:07:13 +00:00
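
The cache-first check in 218687b039 can be sketched as a probe followed by a notice. Here `probe_cache` is a hypothetical callable standing in for whatever cache-only lookup the server performs with local_files_only=True; it is assumed to raise when the model is not cached. The exact log wording beyond "downloading model" is also an assumption.

```python
import sys


def announce_download_if_needed(model_name, probe_cache, log=sys.stderr):
    """Log a notice when `probe_cache` cannot find the model locally.

    Returns True if a download is expected, False if the model is cached.
    """
    try:
        probe_cache(model_name)  # assumed to raise when the model is not cached
    except Exception:
        print(f"downloading model {model_name}", file=log)
        return True
    return False
```

The notice is printed before the real model load runs, so a long first-run download is not mistaken for a hang.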
6bbc04dde7 Add Node.js WebSocket example scripts
listen.mjs: prints all events as JSON objects.
transcripts.mjs: prints transcript text only.
Both use Node 21+ built-in WebSocket — no libraries required.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 08:59:38 +00:00
aad1bda3bf Remove stdout event output — WebSocket is the sole event channel
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 08:58:29 +00:00
f18330608d Add --verbose flag; suppress info logging by default
Errors always go to stderr. Info logs (startup, VAD events, transcripts)
only appear with --verbose / -v, keeping stderr clean when running as a
system service.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 08:56:58 +00:00
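
The logging policy in f18330608d (errors always, info only with --verbose) could be expressed with a small helper like the one below. The `Log` class and the `error:` prefix are hypothetical; the server may use plain functions or the `logging` module instead.

```python
import sys


class Log:
    """Write errors unconditionally and info messages only in verbose mode."""

    def __init__(self, verbose=False, stream=None):
        self.verbose = verbose
        self.stream = stream if stream is not None else sys.stderr

    def error(self, message):
        print(f"error: {message}", file=self.stream)

    def info(self, message):
        if self.verbose:
            print(message, file=self.stream)


log = Log(verbose=False)
log.info("vad_start")  # suppressed: not verbose
```

Constructing it as `Log(verbose=args.verbose)` would tie it to the `--verbose` / `-v` flag.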
18404708e3 Add WebSocket broadcast to stt-server.py
Every connection receives the full event stream (vad_start, vad_end,
transcript, error) from the moment it connects — no subscription
handshake required. The asyncio WebSocket server runs in a daemon thread
alongside the VAD loop and transcription thread. Events still go to
stdout unchanged.

Port is configurable via STT_PORT env var (default: 11501).
Add websockets to both setup scripts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 08:53:54 +00:00
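
The threading arrangement described in 18404708e3 (an asyncio loop in a daemon thread, fed by worker threads) can be illustrated without a network layer. The sketch below uses only the standard library: each `asyncio.Queue` stands in for one WebSocket connection, and the `Broadcaster` class and its method names are hypothetical. In the real server a WebSocket handler would own each queue and send its messages to the socket.

```python
import asyncio
import json
import threading


class Broadcaster:
    """Fan out events from worker threads to every connected client queue."""

    def __init__(self):
        self.loop = asyncio.new_event_loop()
        self.clients = set()  # one asyncio.Queue per connection; touched only on the loop thread
        threading.Thread(target=self.loop.run_forever, daemon=True).start()

    async def _subscribe(self):
        queue = asyncio.Queue()
        self.clients.add(queue)
        return queue

    def subscribe(self):
        """Register a client and return its queue (a real handler would do this per connection)."""
        return asyncio.run_coroutine_threadsafe(self._subscribe(), self.loop).result()

    def _deliver(self, message):
        for queue in self.clients:
            queue.put_nowait(message)

    def publish(self, event):
        """Thread-safe: callable from the VAD loop or the transcription thread."""
        self.loop.call_soon_threadsafe(self._deliver, json.dumps(event))

    def receive(self, queue, timeout=2.0):
        """Blocking read helper for this demo; a handler would await queue.get() instead."""
        future = asyncio.run_coroutine_threadsafe(queue.get(), self.loop)
        return json.loads(future.result(timeout))


hub = Broadcaster()
first, second = hub.subscribe(), hub.subscribe()
hub.publish({"type": "vad_start"})
print(hub.receive(first), hub.receive(second))
```

Routing every publish through `call_soon_threadsafe` keeps all access to the client set on the loop thread, which is why no lock is needed. Clients that connect later simply start receiving from that point on, matching the "no subscription handshake" behavior in the commit message.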
a4fe95b24a Merge pull request 'Add stt-server.py and fix setup scripts' (#1) from mikael-lovqvists-claude-agent/stt-server:stt-process into main
Reviewed-on: #1
2026-06-07 08:51:22 +00:00
01210e878f Add silero-vad to both venv setup scripts
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 08:48:59 +00:00
c0a72679f8 Add torch to both venv setup scripts
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 08:47:03 +00:00
2af47373c4 Add stt-server.py: self-contained recording + VAD + transcription process
Replaces the old stdin/stdout transcription-only server. Now handles the
full pipeline in Python:
- Launches parec or arecord for mic capture
- Runs Silero VAD (via silero-vad, already a faster-whisper dep — no sherpa-onnx needed)
- Pre-roll ring buffer (0.2s) prepended to each segment for context
- Transcribes with faster-whisper in a separate thread (GPU not blocking VAD)
- Emits JSON line events to stdout: ready, vad_start, vad_end, transcript, error

Event protocol is designed to map directly to WebSocket subscriptions later.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 08:41:45 +00:00
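
The pre-roll ring buffer mentioned in 2af47373c4 can be sketched with a bounded `deque`. The 0.2 s figure comes from the commit message; the sample rate, frame size, and the `Segmenter` class are assumptions for illustration. The VAD decision is passed in as a boolean, and real VAD logic (thresholds, hangover) is left out.

```python
from collections import deque

SAMPLE_RATE = 16000      # assumed capture rate
FRAME_SAMPLES = 512      # assumed frame size fed to the VAD
PRE_ROLL_SECONDS = 0.2   # from the commit message

PRE_ROLL_FRAMES = max(1, round(PRE_ROLL_SECONDS * SAMPLE_RATE / FRAME_SAMPLES))


class Segmenter:
    """Collect speech frames into segments, prepending recent non-speech audio."""

    def __init__(self, pre_roll_frames=PRE_ROLL_FRAMES):
        self.pre_roll = deque(maxlen=pre_roll_frames)  # oldest frames fall off automatically
        self.segment = None  # None while idle, list of frames while speech is active

    def feed(self, frame, is_speech):
        """Consume one frame; return the finished segment when speech ends, else None."""
        if is_speech:
            if self.segment is None:
                # Speech just started: seed the segment with the buffered pre-roll.
                self.segment = list(self.pre_roll)
                self.pre_roll.clear()
            self.segment.append(frame)
            return None
        if self.segment is not None:
            # Speech just ended: hand the segment over and resume buffering.
            finished, self.segment = self.segment, None
            self.pre_roll.append(frame)
            return finished
        self.pre_roll.append(frame)
        return None
```

Prepending the pre-roll gives the transcriber the audio just before the VAD fired, which helps avoid clipping the first syllable of an utterance.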
bbde89a2cc Fix missing faster-whisper deps when using local ctranslate2 build
--no-deps skipped av and other required packages. Fix by installing
faster-whisper normally first (satisfies all deps, pulls PyPI ctranslate2),
then immediately overriding ctranslate2 with the source-built version.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 08:28:42 +00:00
346c7c6585 Remove Arch Linux specific package suggestions
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 08:24:43 +00:00
3db7058646 Add setup-venv.sh, clean up setup-venv-local-build.sh
setup-venv.sh: simple PyPI install path — just pip install faster-whisper.
Use this when the PyPI ctranslate2 wheel matches the system CUDA version.

setup-venv-local-build.sh:
- PYTHON_ENV env var for venv path override (consistent with tts-server)
- HF_TOKEN_FILE env var instead of hardcoded path
- HF_HUB_CACHE env var surfaced in output when set
- Remove stray chmod on faster-whisper-server.py (not part of this repo)
- Remove voice-experiment-specific "run with" message
- Add python3 to tool prerequisite check
- Arch Linux package suggestions extended to cover CUDA and python
- Document why each script exists and when to use which

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 08:22:40 +00:00
c96bf0ecf5 Updated readme
2026-06-07 10:17:43 +02:00
0b9f9121bd Added gitignore
2026-06-07 10:15:22 +02:00
4fa4baee17 Initial commit
2026-06-07 10:14:56 +02:00