Fix large-v3 VRAM estimate — ~5GB with float16, not ~10GB

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 09:14:00 +00:00
parent be1efd9edb
commit dd6e74a7a8

@@ -27,7 +27,7 @@ Pass `--model <name>` to `stt-server.py`. Models are downloaded automatically fr
 | `base.en` | ~1 GB | Low | Default. Fast, but struggles with similar-sounding consonants (V/B/D). |
 | `small.en` | ~2 GB | Medium | Noticeable improvement over base for most speech. |
 | `medium.en` | ~5 GB | Good | Recommended starting point for production use. |
-| `large-v3` | ~10 GB | Best | Highest accuracy, use if VRAM allows. |
+| `large-v3` | ~5 GB (`float16`) / ~10 GB (`float32`) | Best | Highest accuracy, use if VRAM allows. |
 English-only models (`.en` suffix) are faster and more accurate than multilingual models for English speech.
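The new `large-v3` row rests on simple arithmetic: a `float16` weight takes 2 bytes and a `float32` weight takes 4, so halving the precision roughly halves the memory needed to hold the model. The sketch below estimates weights-only memory under one assumption: `large-v3` has about 1.55 billion parameters, the figure OpenAI lists for the "large" size. It ignores activations, the decoder's KV cache, and CUDA context overhead, which presumably account for the gap between these numbers and the table's estimates. The names `weight_memory_gb` and `LARGE_V3_PARAMS` are illustrative and do not come from `stt-server.py`.

```python
# Rough weights-only VRAM estimate for a Whisper checkpoint.
# Assumption: large-v3 has roughly 1.55 billion parameters (the figure
# OpenAI lists for the "large" size). Activations, the decoder's KV cache,
# and CUDA context overhead are NOT included, so real usage is higher.

# Storage size of one parameter for each precision.
BYTES_PER_PARAM = {"float16": 2, "float32": 4}

LARGE_V3_PARAMS = 1_550_000_000  # approximate; see assumption above


def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Return the memory needed just to hold the weights, in GB (1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in ("float16", "float32"):
        gb = weight_memory_gb(LARGE_V3_PARAMS, dtype)
        print(f"{dtype}: ~{gb:.1f} GB of weights")
```

Because every parameter shrinks by the same factor, the `float32` figure is exactly twice the `float16` one. That 2× ratio, rather than the absolute numbers, is what the corrected row reflects.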