efforting.tech/stt-server

WebSocket server, language/task args, verbose flag, misc improvements #2

Merged

mikael-lovqvist merged 13 commits from mikael-lovqvists-claude-agent/stt-server:websocket-server into main

2026-06-07 09:27:02 +00:00

Author	SHA1	Message	Date
mikael-lovqvists-claude-agent	81e9ea82cf	Add NOTES.md with TranscriptionInfo unused fields Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-07 09:24:53 +00:00
mikael-lovqvists-claude-agent	0afe761625	Include detected language and confidence in transcript events. Unpacks transcription info instead of discarding it. Adds language and language_probability fields to transcript events, and includes them in verbose log output. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-07 09:21:44 +00:00
mikael-lovqvists-claude-agent	bdb1aac885	Add --language and --task CLI arguments, document in README. --language: force a specific language (e.g. en, sv) or leave unset for auto-detection. --task: transcribe (default) or translate to English. Previously language was hardcoded to 'en', which caused multilingual models to hallucinate translations instead of transcribing the source language. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-07 09:16:19 +00:00
mikael-lovqvists-claude-agent	f2ba15185e	Update VRAM estimates to show float16/float32 for all models Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-07 09:14:35 +00:00
mikael-lovqvists-claude-agent	dd6e74a7a8	Fix large-v3 VRAM estimate — ~5GB with float16, not ~10GB Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-07 09:14:00 +00:00
mikael-lovqvists-claude-agent	be1efd9edb	Add model selection and compute type sections to README Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-07 09:11:39 +00:00
mikael-lovqvists-claude-agent	9030b1315d	Load HF_TOKEN from token file at startup (consistent with tts-server). Reads ~/.secrets/hugging-face.token by default, overridable via HF_TOKEN_FILE. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-07 09:10:54 +00:00
mikael-lovqvists-claude-agent	7b03deddb5	Gate download log message behind --verbose like everything else Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-07 09:09:03 +00:00
mikael-lovqvists-claude-agent	218687b039	Log to stderr when model needs to be downloaded. Checks cache first with local_files_only=True; if the model isn't present, logs "downloading model ..." to stderr before WhisperModel triggers the actual download. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-07 09:07:13 +00:00
mikael-lovqvists-claude-agent	6bbc04dde7	Add Node.js WebSocket example scripts listen.mjs: prints all events as JSON objects. transcripts.mjs: prints transcript text only. Both use Node 21+ built-in WebSocket — no libraries required. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-07 08:59:38 +00:00
mikael-lovqvists-claude-agent	aad1bda3bf	Remove stdout event output — WebSocket is the sole event channel Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-07 08:58:29 +00:00
mikael-lovqvists-claude-agent	f18330608d	Add --verbose flag; suppress info logging by default. Errors always go to stderr. Info logs (startup, VAD events, transcripts) only appear with --verbose / -v, keeping stderr clean when running as a system service. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-07 08:56:58 +00:00
mikael-lovqvists-claude-agent	18404708e3	Add WebSocket broadcast to stt-server.py. Every connection receives the full event stream (vad_start, vad_end, transcript, error) from the moment it connects; no subscription handshake is required. The asyncio WebSocket server runs in a daemon thread alongside the VAD loop and transcription thread. Events still go to stdout unchanged. Port is configurable via the STT_PORT env var (default: 11501). Add websockets to both setup scripts. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-07 08:53:54 +00:00
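
The command-line and startup behaviour described in commits bdb1aac885, 9030b1315d and f18330608d can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code from stt-server.py: the function names build_parser, load_hf_token and make_logger are hypothetical, and treating a missing or empty token file as "no token" is an assumption rather than something the commit messages state.

```python
import argparse
import sys
from pathlib import Path
from typing import Callable, Mapping, Optional, TextIO, Tuple

# Default token location named in commit 9030b1315d.
DEFAULT_TOKEN_FILE = Path.home() / ".secrets" / "hugging-face.token"


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering --language, --task and --verbose."""
    parser = argparse.ArgumentParser(description="stt-server option sketch")
    # Leaving --language unset means the model auto-detects the language.
    parser.add_argument("--language", default=None,
                        help="language code such as en or sv; omit for auto-detection")
    parser.add_argument("--task", choices=["transcribe", "translate"],
                        default="transcribe",
                        help="transcribe the source language or translate to English")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="also print info logs to stderr")
    return parser


def load_hf_token(env: Mapping[str, str]) -> Optional[str]:
    """Read the token from HF_TOKEN_FILE, falling back to the default path.

    Assumption: an unreadable or empty file simply yields None.
    """
    token_file = Path(env.get("HF_TOKEN_FILE", str(DEFAULT_TOKEN_FILE)))
    try:
        token = token_file.read_text().strip()
    except OSError:
        return None
    return token or None


def make_logger(verbose: bool, stream: TextIO = sys.stderr
                ) -> Tuple[Callable[[str], None], Callable[[str], None]]:
    """Return (info, error): errors always print, info only when verbose."""
    def info(message: str) -> None:
        if verbose:
            print(message, file=stream)

    def error(message: str) -> None:
        print(message, file=stream)

    return info, error
```

A caller would typically parse the arguments once at startup, pass args.verbose to make_logger, and export the result of load_hf_token(os.environ) as HF_TOKEN before the model is loaded. Whether an already-set HF_TOKEN should take precedence over the file is left open here.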