Add model selection and compute type sections to README

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 09:11:39 +00:00
parent 9030b1315d
commit be1efd9edb
1 changed files with 28 additions and 1 deletions
--- a/README.md
+++ b/README.md
@@ -16,3 +16,30 @@ This project started as a [vibe-coded](https://en.wikipedia.org/wiki/Vibe_coding
 ### Setup [venv](https://docs.python.org/3/library/venv.html) for [python](https://www.python.org/)

 We will have two different setups here depending on if you want to build ctranslate2 locally or not. This shall be documented.
+
+
+## Model selection
+
+Pass `--model <name>` to `stt-server.py`. Models are downloaded automatically from Hugging Face on first use.
+
+| Model | VRAM | Quality | Notes |
+|-------|------|---------|-------|
+| `base.en` | ~1 GB | Low | Default. Fast, but struggles with similar-sounding consonants (V/B/D). |
+| `small.en` | ~2 GB | Medium | Noticeable improvement over base for most speech. |
+| `medium.en` | ~5 GB | Good | Recommended starting point for production use. |
+| `large-v3` | ~10 GB | Best | Highest accuracy; use if VRAM allows. |
+
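+English-only models (`.en` suffix) tend to be more accurate than multilingual models of the same size for English speech; the gain is largest for `base.en` and narrows for `small.en` and `medium.en`. `large-v3` is multilingual only.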
+
+
+## Compute type
+
+Pass `--compute-type <type>` to control the numeric precision used during inference.
+
+| Type | Notes |
+|------|-------|
+| `int8_float16` | Default. Good balance of speed and accuracy on modern GPUs. |
+| `float16` | Slightly better accuracy, higher VRAM usage. |
+| `int8` | CPU-friendly, lower quality. |
+
+If you see a CUDA error about mismatched library versions at startup, use `setup-venv-local-build.sh` to build ctranslate2 against your system CUDA version rather than using the PyPI wheel.
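
The model table added in this commit can be read as a lookup from available VRAM to model name. The following is a minimal sketch of that reading, not code from this repository: `pick_model` and `build_command` are hypothetical helpers, the VRAM figures are the approximate values from the table, and launching the script through `python` is an assumption. Only the `--model` and `--compute-type` flags and the `stt-server.py` name come from the README text.

```python
# Hypothetical helpers, not part of this repository.
import shlex

# (model name, approximate VRAM in GB), copied from the README table, smallest first.
MODELS = [
    ("base.en", 1),
    ("small.en", 2),
    ("medium.en", 5),
    ("large-v3", 10),
]


def pick_model(vram_gb: float) -> str:
    """Return the largest listed model whose approximate VRAM need fits vram_gb."""
    chosen = MODELS[0][0]  # fall back to the smallest model if nothing fits
    for name, need_gb in MODELS:
        if need_gb <= vram_gb:
            chosen = name
    return chosen


def build_command(vram_gb: float, compute_type: str = "int8_float16") -> str:
    """Build a command line using the two flags described in the README."""
    args = [
        "python", "stt-server.py",  # assumption: the script is run via `python`
        "--model", pick_model(vram_gb),
        "--compute-type", compute_type,
    ]
    return shlex.join(args)


if __name__ == "__main__":
    print(build_command(8))
```

With an 8 GB budget this selects `medium.en` and keeps the default compute type. The table values are rough, so a real deployment should leave headroom rather than treat them as hard limits.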