Unpacks transcription info instead of discarding it. Adds language and
language_probability fields to transcript events, and includes them in
verbose log output.
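
A minimal sketch of the change, assuming faster-whisper's transcribe() call,
which returns a (segments, info) pair where info carries language and
language_probability. The event layout (the "type" and "text" keys) and the
helper name build_transcript_event are illustrative assumptions, not the
project's actual schema. SimpleNamespace objects stand in for the library's
return values so the snippet runs without a model.

```python
import json
from types import SimpleNamespace


def build_transcript_event(segments, info):
    """Build a transcript event that keeps the detected language."""
    text = " ".join(seg.text.strip() for seg in segments)
    return {
        "type": "transcript",  # assumed key; event name taken from this log
        "text": text,          # assumed key
        "language": info.language,
        "language_probability": round(info.language_probability, 3),
    }


# Stand-ins for the (segments, info) pair returned by model.transcribe(...).
segments = [SimpleNamespace(text=" Hej världen")]
info = SimpleNamespace(language="sv", language_probability=0.9731)

event = build_transcript_event(segments, info)
print(json.dumps(event, ensure_ascii=False))
```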
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
--language: force a specific language (e.g. en, sv) or leave unset for auto-detection
--task: transcribe (default) or translate to English
Previously, language was hardcoded to 'en', which caused multilingual models
to hallucinate translations instead of transcribing the source language.
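
One way the two flags could be wired up with argparse, as a sketch. The flag
names and the transcribe/translate values come from this commit; the help
strings, the build_parser name, and passing the result as keyword arguments
to transcribe() are assumptions. Leaving --language unset yields None, which
faster-whisper treats as "detect the language".

```python
import argparse


def build_parser():
    """Parser for the two options described above (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="speech-to-text options (sketch)")
    parser.add_argument(
        "--language",
        default=None,  # None means: let the model auto-detect
        help="language code such as en or sv; omit for auto-detection",
    )
    parser.add_argument(
        "--task",
        choices=["transcribe", "translate"],
        default="transcribe",
        help="transcribe in the source language, or translate to English",
    )
    return parser


args = build_parser().parse_args(["--language", "sv"])

# Keyword arguments that would be forwarded to model.transcribe(audio, **kwargs).
transcribe_kwargs = {"language": args.language, "task": args.task}
print(transcribe_kwargs)
```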
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Checks the cache first with local_files_only=True; if the model isn't
present, logs "downloading model ..." to stderr before WhisperModel triggers
the actual download.
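
A sketch of the cache-first pattern. The loader is injected so the snippet
runs without faster-whisper installed. With the real library it could be
something like lambda name, local_files_only: WhisperModel(name,
local_files_only=local_files_only). The function name, the fake loader, and
the assumption that a cache miss surfaces as an exception are illustrative,
not taken from the actual implementation.

```python
import io
import sys


def load_with_download_notice(model_name, loader, log=sys.stderr):
    """Try the local cache first; announce a download before falling back."""
    try:
        return loader(model_name, local_files_only=True)
    except Exception:  # assumption: a cache miss raises some exception
        print("downloading model ...", file=log)
        return loader(model_name, local_files_only=False)


def fake_loader(name, local_files_only):
    """Stand-in loader that behaves as if the model is not cached yet."""
    if local_files_only:
        raise FileNotFoundError(name)
    return f"<model {name}>"


buf = io.StringIO()
model = load_with_download_notice("small", fake_loader, log=buf)
print(model, "|", buf.getvalue().strip())
```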
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
listen.mjs: prints all events as JSON objects.
transcripts.mjs: prints transcript text only.
Both use the WebSocket built into Node 21+ (behind --experimental-websocket
on Node 21, enabled by default from Node 22); no libraries required.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Errors always go to stderr. Info logs (startup, VAD events, transcripts)
only appear with --verbose / -v, keeping stderr clean when running as a
system service.
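
The logging rule can be sketched as two small functions: one that always
writes, one gated on the verbose flag. The names make_loggers, error, and
info and the "error: " prefix are assumptions for illustration; the stream
parameter exists only so the behavior can be checked without touching the
real stderr.

```python
import io
import sys


def make_loggers(verbose, stream=None):
    """Return (error, info): errors always print, info only when verbose."""
    out = stream if stream is not None else sys.stderr

    def error(message):
        print(f"error: {message}", file=out)

    def info(message):
        if verbose:
            print(message, file=out)

    return error, info


quiet_buf = io.StringIO()
error, info = make_loggers(verbose=False, stream=quiet_buf)
info("vad_start")          # suppressed without --verbose
error("microphone lost")   # always written

verbose_buf = io.StringIO()
error_v, info_v = make_loggers(verbose=True, stream=verbose_buf)
info_v("vad_start")        # written because verbose is on
```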
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Every connection receives the full event stream (vad_start, vad_end,
transcript, error) from the moment it connects — no subscription
handshake required. The asyncio WebSocket server runs in a daemon thread
alongside the VAD loop and transcription thread. Events still go to
stdout unchanged.
Port is configurable via STT_PORT env var (default: 11501).
Add websockets to both setup scripts.
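
A stdlib-only sketch of the fan-out described above: an asyncio loop in a
daemon thread, one queue per connection, and a thread-safe publish() that the
VAD loop or transcription thread can call. The Broadcaster class and its
method names are hypothetical. The WebSocket layer itself is left out; in a
real handler each connection would call register() and then forward every
queued event to its socket.

```python
import asyncio
import os
import threading

# Port lookup as described in this commit: STT_PORT, defaulting to 11501.
PORT = int(os.environ.get("STT_PORT", "11501"))


class Broadcaster:
    """Fan events out from worker threads to every registered client."""

    def __init__(self):
        self.loop = asyncio.new_event_loop()
        self.queues = set()  # one asyncio.Queue per connection
        thread = threading.Thread(target=self.loop.run_forever, daemon=True)
        thread.start()

    def register(self):
        """Create a client queue. Must be called on the event loop."""
        queue = asyncio.Queue()
        self.queues.add(queue)
        return queue

    def _publish(self, event):
        for queue in self.queues:
            queue.put_nowait(event)

    def publish(self, event):
        """Thread-safe entry point for code running outside the loop."""
        self.loop.call_soon_threadsafe(self._publish, event)


broadcaster = Broadcaster()


async def connect():
    # Stands in for the start of a WebSocket connection handler.
    return broadcaster.register()


async def drain(queue, count):
    # Stands in for the handler's send loop.
    return [await queue.get() for _ in range(count)]


client = asyncio.run_coroutine_threadsafe(connect(), broadcaster.loop).result(timeout=2)
broadcaster.publish({"type": "vad_start"})
broadcaster.publish({"type": "transcript", "text": "hello"})
received = asyncio.run_coroutine_threadsafe(
    drain(client, 2), broadcaster.loop
).result(timeout=2)
print(received)
```

Because every registered queue receives every event, a client needs no
subscription step; it simply starts reading once connected.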
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>