WebSocket server, language/task args, verbose flag, misc improvements #2

mikael-lovqvists-claude-agent · 2026-06-07T09:26:47Z

## Summary

- WebSocket server that broadcasts all events to every connected client; no subscription handshake required (`STT_PORT` env var, default 11501)
- Remove stdout event output; WebSocket is the sole event channel
- `--verbose` / `-v` flag: info logging is off by default for clean journal output when running as a service; errors always go to stderr
- `--language`: force a specific language (e.g. `en`, `sv`), or leave unset for auto-detection
- `--task transcribe|translate`: `transcribe` keeps the source language, `translate` converts to English
- Detected language and confidence included in transcript events (`language`, `language_probability`)
- Load the HuggingFace token from `~/.secrets/hugging-face.token` (consistent with tts-server, overridable via `HF_TOKEN_FILE`)
- Log to stderr (with `--verbose`) when the model needs to be downloaded
- `examples/listen.mjs` and `examples/transcripts.mjs` using the Node.js built-in WebSocket
- `NOTES.md` with unused `TranscriptionInfo` fields for future reference
- README: model selection table with VRAM estimates, compute type guide, language/translation docs
- Both setup scripts: add `torch`, `silero-vad`, `websockets`; fix missing faster-whisper deps in the local-build path

## Test plan

- [x] `./stt-server.py --model large-v3 --compute-type float16 --verbose` starts and transcribes
- [x] `node examples/listen.mjs` receives events
- [x] `node examples/transcripts.mjs` prints transcript text
- [x] Silent by default (no stderr noise when running as a service)

mikael-lovqvists-claude-agent added 13 commits 2026-06-07 09:26:47 +00:00

Add WebSocket broadcast to stt-server.py 18404708e3

Every connection receives the full event stream (vad_start, vad_end,
transcript, error) from the moment it connects — no subscription
handshake required. The asyncio WebSocket server runs in a daemon thread
alongside the VAD loop and transcription thread. Events still go to
stdout unchanged.

Port is configurable via STT_PORT env var (default: 11501).
Add websockets to both setup scripts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
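The broadcast model above can be sketched with the standard library alone: one asyncio event loop runs in a daemon thread, each connected client gets its own queue, and the VAD and transcription threads hand events over with `loop.call_soon_threadsafe` so they never touch asyncio objects directly. This is a hedged sketch, not the code in stt-server.py; `EventHub`, `register`, `publish`, and the `type` field are hypothetical names, and the `websockets` handler appears only as a comment so the block runs without third-party packages.

```python
import asyncio
import json
import threading


class EventHub:
    """Fan-out hub: every registered client queue receives every event.

    Hypothetical sketch; names and structure are assumptions.
    """

    def __init__(self) -> None:
        self.loop = asyncio.new_event_loop()
        self._clients = set()  # one asyncio.Queue per connected client
        # Daemon thread so the process can exit without joining it.
        self._thread = threading.Thread(target=self.loop.run_forever, daemon=True)

    def start(self) -> None:
        self._thread.start()

    async def register(self) -> asyncio.Queue:
        # Runs on the hub loop; the client sees every event published from
        # this point on, with no subscription handshake.
        queue = asyncio.Queue()
        self._clients.add(queue)
        return queue

    def unregister(self, queue: asyncio.Queue) -> None:
        # Call only from the hub loop (e.g. in a connection handler's finally).
        self._clients.discard(queue)

    def _fan_out(self, message: str) -> None:
        for queue in list(self._clients):
            queue.put_nowait(message)

    def publish(self, event: dict) -> None:
        # Thread-safe: schedules the fan-out on the hub loop.
        self.loop.call_soon_threadsafe(self._fan_out, json.dumps(event))


# Hypothetical wiring with the `websockets` package (not executed here):
#
#   async def handler(ws):
#       queue = await hub.register()
#       try:
#           while True:
#               await ws.send(await queue.get())
#       finally:
#           hub.unregister(queue)
#
#   ...then start `websockets.serve(handler, host, port)` on hub.loop.


if __name__ == "__main__":
    hub = EventHub()
    hub.start()
    client = asyncio.run_coroutine_threadsafe(hub.register(), hub.loop).result(timeout=2)
    hub.publish({"type": "vad_start"})  # e.g. called from the VAD thread
    message = asyncio.run_coroutine_threadsafe(client.get(), hub.loop).result(timeout=2)
    print(message)
```

Because a client's queue only exists once it has registered, a new connection receives events from the moment it connects, matching the behaviour the commit describes.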

Add --verbose flag; suppress info logging by default f18330608d

Errors always go to stderr. Info logs (startup, VAD events, transcripts)
only appear with --verbose / -v, keeping stderr clean when running as a
system service.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
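A minimal sketch of the quiet-by-default behaviour using the standard `logging` module. The `-v`/`--verbose` flags come from the commit; `parse_args`, `configure_logging`, and the `stt-server` logger name are assumptions. The default threshold is WARNING, so `log.info(...)` calls are dropped while `log.error(...)` still reaches stderr; `--verbose` lowers the threshold to INFO.

```python
import argparse
import logging
import sys


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="stt-server.py")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="enable info logging (startup, VAD events, transcripts)")
    return parser.parse_args(argv)


def configure_logging(verbose: bool) -> logging.Logger:
    # Hypothetical logger name; all output goes to stderr.
    log = logging.getLogger("stt-server")
    log.handlers.clear()
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    log.addHandler(handler)
    # Errors always pass; info only when --verbose is given.
    log.setLevel(logging.INFO if verbose else logging.WARNING)
    log.propagate = False  # avoid duplicate output via the root logger
    return log
```

At startup this would be wired up as `log = configure_logging(parse_args().verbose)`.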

Remove stdout event output — WebSocket is the sole event channel aad1bda3bf

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add Node.js WebSocket example scripts 6bbc04dde7

listen.mjs: prints all events as JSON objects.
transcripts.mjs: prints transcript text only.
Both use the Node.js built-in WebSocket client (enabled by default from Node 22; Node 21 needs --experimental-websocket), so no libraries are required.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Log to stderr when model needs to be downloaded 218687b039

Checks the cache first with local_files_only=True; if the model isn't present,
logs "downloading model ..." to stderr before WhisperModel triggers the
actual download.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
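The cache-first check can be expressed as a small helper, sketched here under assumptions: `ensure_model` and its `fetch` parameter are hypothetical, standing in for whatever download function the server uses (faster-whisper's `download_model` helper is assumed to accept `local_files_only`). The first call is cache-only; only if it fails does the helper log and fall back to a real download. The message is logged at info level, so with the gating added in the next commit it only appears under `--verbose`.

```python
import logging
from typing import Callable

log = logging.getLogger("stt-server")


def ensure_model(name: str, fetch: Callable[..., str]) -> str:
    """Return a local path for `name`, logging before any network download.

    `fetch(name, local_files_only=...)` is a hypothetical callable.
    """
    try:
        # Cache-only lookup: fails fast if the model was never downloaded.
        return fetch(name, local_files_only=True)
    except Exception:  # exact exception type depends on the hub library
        log.info("downloading model %s ...", name)  # shown only with --verbose
        return fetch(name, local_files_only=False)
```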

Gate download log message behind --verbose like everything else 7b03deddb5

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Load HF_TOKEN from token file at startup (consistent with tts-server) 9030b1315d

Reads ~/.secrets/hugging-face.token by default, overridable via HF_TOKEN_FILE.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
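A sketch of the token-file loading, assuming the token is exported as `HF_TOKEN` (the environment variable `huggingface_hub` reads) and that an `HF_TOKEN` already present in the environment takes precedence over the file. That precedence rule and the `load_hf_token` name are assumptions, not taken from stt-server.py or tts-server.

```python
import os
from pathlib import Path
from typing import MutableMapping, Optional

DEFAULT_TOKEN_FILE = "~/.secrets/hugging-face.token"


def load_hf_token(env: Optional[MutableMapping[str, str]] = None) -> None:
    """Populate HF_TOKEN from a token file if it isn't already set."""
    if env is None:
        env = os.environ
    if env.get("HF_TOKEN"):
        return  # assumption: an explicit HF_TOKEN wins over the file
    path = Path(env.get("HF_TOKEN_FILE", DEFAULT_TOKEN_FILE)).expanduser()
    try:
        token = path.read_text().strip()
    except FileNotFoundError:
        return  # no token file: public models typically download anonymously
    if token:
        env["HF_TOKEN"] = token
```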

Add model selection and compute type sections to README be1efd9edb

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Fix large-v3 VRAM estimate — ~5GB with float16, not ~10GB dd6e74a7a8

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Update VRAM estimates to show float16/float32 for all models f2ba15185e

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
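The float16/float32 columns follow from simple arithmetic: weight memory is roughly parameter count times bytes per parameter. The sketch below uses the approximate parameter counts published for the Whisper model sizes (large-v3 is assumed to match large at about 1.55B parameters) and computes weights only. The CUDA context, CTranslate2 buffers, and decoding state add overhead on top, which presumably accounts for the gap between roughly 3.1 GB of float16 weights and the ~5GB estimate for large-v3.

```python
# Approximate parameter counts for the Whisper sizes (large-v3 assumed ~ large).
PARAMS = {
    "tiny": 39e6,
    "base": 74e6,
    "small": 244e6,
    "medium": 769e6,
    "large-v3": 1550e6,
}
BYTES_PER_PARAM = {"float16": 2, "float32": 4, "int8": 1}


def weight_gb(model: str, compute_type: str) -> float:
    """Memory for the weights alone, in GB (1e9 bytes); excludes runtime overhead."""
    return PARAMS[model] * BYTES_PER_PARAM[compute_type] / 1e9


if __name__ == "__main__":
    for name in PARAMS:
        print(f"{name:9} fp16 {weight_gb(name, 'float16'):4.1f} GB   "
              f"fp32 {weight_gb(name, 'float32'):4.1f} GB")
```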

Add --language and --task CLI arguments, document in README bdb1aac885

--language: force a specific language (e.g. en, sv) instead of detecting it, or leave unset for auto-detection
--task: transcribe (default) or translate to English
Previously language was hardcoded to 'en' which caused multilingual models
to hallucinate translations instead of transcribing the source language.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
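A sketch of the two arguments with `argparse`, assuming they are passed straight through to faster-whisper's `WhisperModel.transcribe`, whose `language` and `task` keyword arguments carry these meanings; `parse_args` and `transcribe_kwargs` are hypothetical helper names. Leaving `language` as `None` lets Whisper detect the language from the audio instead of assuming `en`.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="stt-server.py")
    parser.add_argument("--language", default=None,
                        help="language code such as en or sv; omit to auto-detect")
    parser.add_argument("--task", choices=["transcribe", "translate"],
                        default="transcribe",
                        help="transcribe keeps the source language; translate outputs English")
    return parser.parse_args(argv)


def transcribe_kwargs(args: argparse.Namespace) -> dict:
    # language=None means "detect from the audio" rather than a hardcoded 'en'.
    return {"language": args.language, "task": args.task}


# Hypothetical call site with faster-whisper:
#   segments, info = model.transcribe(audio, **transcribe_kwargs(args))
```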

Include detected language and confidence in transcript events 0afe761625

Unpacks transcription info instead of discarding it. Adds language and
language_probability fields to transcript events, and includes them in
verbose log output.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
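A sketch of building a transcript event from faster-whisper's `(segments, info)` result. `info.language` and `info.language_probability` are `TranscriptionInfo` attributes, and the event keys `language` and `language_probability` come from the commit; the `type` and `text` keys and the `transcript_event` name are assumptions. The `Protocol` classes only describe the attributes used, so any object with those fields works.

```python
from typing import Iterable, Protocol


class Segment(Protocol):
    text: str


class Info(Protocol):
    language: str
    language_probability: float


def transcript_event(segments: Iterable[Segment], info: Info) -> dict:
    # faster-whisper returns segments as a lazy generator; iterating it is
    # what actually runs decoding, so it is consumed exactly once here.
    text = "".join(seg.text for seg in segments).strip()
    return {
        "type": "transcript",  # assumed key
        "text": text,          # assumed key
        "language": info.language,
        "language_probability": info.language_probability,
    }
```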

Add NOTES.md with TranscriptionInfo unused fields 81e9ea82cf

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

mikael-lovqvist merged commit 4f983ca276 into main

2026-06-07 09:27:02 +00:00

mikael-lovqvist referenced this issue from a commit

2026-06-07 09:27:03 +00:00

Merge pull request 'WebSocket server, language/task args, verbose flag, misc improvements' (#2) from mikael-lovqvists-claude-agent/stt-server:websocket-server into main

Reference: efforting.tech/stt-server#2
