WebSocket server, language/task args, verbose flag, misc improvements #2

Merged
mikael-lovqvist merged 13 commits from mikael-lovqvists-claude-agent/stt-server:websocket-server into main 2026-06-07 09:27:02 +00:00
Showing only changes of commit f2ba15185e

@@ -24,9 +24,9 @@ Pass `--model <name>` to `stt-server.py`. Models are downloaded automatically fr
 | Model | VRAM | Quality | Notes |
 |-------|------|---------|-------|
-| `base.en` | ~1 GB | Low | Default. Fast, but struggles with similar-sounding consonants (V/B/D). |
-| `small.en` | ~2 GB | Medium | Noticeable improvement over base for most speech. |
-| `medium.en` | ~5 GB | Good | Recommended starting point for production use. |
+| `base.en` | ~0.5 GB (`float16`) / ~1 GB (`float32`) | Low | Default. Fast, but struggles with similar-sounding consonants (V/B/D). |
+| `small.en` | ~1 GB (`float16`) / ~2 GB (`float32`) | Medium | Noticeable improvement over base for most speech. |
+| `medium.en` | ~2.5 GB (`float16`) / ~5 GB (`float32`) | Good | Recommended starting point for production use. |
 | `large-v3` | ~5 GB (`float16`) / ~10 GB (`float32`) | Best | Highest accuracy, use if VRAM allows. |
 English-only models (`.en` suffix) are faster and more accurate than multilingual models for English speech.
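
As a rough illustration of how the per-precision VRAM figures in the table might be used, the sketch below picks the highest-quality model that fits a given VRAM budget. The numbers are copied from the table and are approximate; `VRAM_GB` and `pick_model` are hypothetical names for this example and are not part of `stt-server.py`.

```python
# Approximate VRAM figures in GB, copied from the table above (not measured here).
# Entries are ordered from lowest to highest quality.
VRAM_GB = {
    "base.en": {"float16": 0.5, "float32": 1.0},
    "small.en": {"float16": 1.0, "float32": 2.0},
    "medium.en": {"float16": 2.5, "float32": 5.0},
    "large-v3": {"float16": 5.0, "float32": 10.0},
}


def pick_model(budget_gb, precision="float16"):
    """Return the highest-quality model whose approximate footprint fits
    within budget_gb, or None if nothing fits. Hypothetical helper."""
    best = None
    # Dicts preserve insertion order, so later entries are higher quality.
    for name, sizes in VRAM_GB.items():
        if sizes[precision] <= budget_gb:
            best = name
    return best


if __name__ == "__main__":
    # Example: a 4 GB budget at each precision.
    print(pick_model(4.0, "float16"))
    print(pick_model(4.0, "float32"))
```

The returned name could then be passed as `--model <name>` to `stt-server.py`. Real memory use also depends on audio length and runtime overhead, so it is worth leaving some headroom below the card's total VRAM.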