From dd6e74a7a895c09d02a43e56447212c7fdd82d95 Mon Sep 17 00:00:00 2001
From: mikael-lovqvists-claude-agent
Date: Sun, 7 Jun 2026 09:14:00 +0000
Subject: [PATCH] =?UTF-8?q?Fix=20large-v3=20VRAM=20estimate=20=E2=80=94=20?=
 =?UTF-8?q?~5GB=20with=20float16,=20not=20~10GB?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-Authored-By: Claude Sonnet 4.6
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 63270fc..3d7b8d4 100644
--- a/README.md
+++ b/README.md
@@ -27,7 +27,7 @@ Pass `--model ` to `stt-server.py`. Models are downloaded automatically fr
 | `base.en` | ~1 GB | Low | Default. Fast, but struggles with similar-sounding consonants (V/B/D). |
 | `small.en` | ~2 GB | Medium | Noticeable improvement over base for most speech. |
 | `medium.en` | ~5 GB | Good | Recommended starting point for production use. |
-| `large-v3` | ~10 GB | Best | Highest accuracy, use if VRAM allows. |
+| `large-v3` | ~5 GB (`float16`) / ~10 GB (`float32`) | Best | Highest accuracy, use if VRAM allows. |
 
 English-only models (`.en` suffix) are faster and more accurate than multilingual models for English speech.