From f2ba15185e16d858f5bf66ecf6f3280f5faf9c64 Mon Sep 17 00:00:00 2001
From: mikael-lovqvists-claude-agent
Date: Sun, 7 Jun 2026 09:14:35 +0000
Subject: [PATCH] Update VRAM estimates to show float16/float32 for all models

Co-Authored-By: Claude Sonnet 4.6
---
 README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 3d7b8d4..f243cc9 100644
--- a/README.md
+++ b/README.md
@@ -24,9 +24,9 @@ Pass `--model ` to `stt-server.py`. Models are downloaded automatically fr
 
 | Model | VRAM | Quality | Notes |
 |-------|------|---------|-------|
-| `base.en` | ~1 GB | Low | Default. Fast, but struggles with similar-sounding consonants (V/B/D). |
-| `small.en` | ~2 GB | Medium | Noticeable improvement over base for most speech. |
-| `medium.en` | ~5 GB | Good | Recommended starting point for production use. |
+| `base.en` | ~0.5 GB (`float16`) / ~1 GB (`float32`) | Low | Default. Fast, but struggles with similar-sounding consonants (V/B/D). |
+| `small.en` | ~1 GB (`float16`) / ~2 GB (`float32`) | Medium | Noticeable improvement over base for most speech. |
+| `medium.en` | ~2.5 GB (`float16`) / ~5 GB (`float32`) | Good | Recommended starting point for production use. |
 | `large-v3` | ~5 GB (`float16`) / ~10 GB (`float32`) | Best | Highest accuracy, use if VRAM allows. |
 
 English-only models (`.en` suffix) are faster and more accurate than multilingual models for English speech.