Add model selection and compute type sections to README

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 09:11:39 +00:00
parent 9030b1315d
commit be1efd9edb

@@ -16,3 +16,30 @@ This project started as a [vibe-coded](https://en.wikipedia.org/wiki/Vibe_coding
### Setup [venv](https://docs.python.org/3/library/venv.html) for [python](https://www.python.org/)
There are two setups here, depending on whether you want to build ctranslate2 locally or use the prebuilt wheel. Both will be documented.
## Model selection
Pass `--model <name>` to `stt-server.py`. Models are downloaded automatically from Hugging Face on first use.
| Model | VRAM | Quality | Notes |
|-------|------|---------|-------|
| `base.en` | ~1 GB | Low | Default. Fast, but struggles with similar-sounding consonants (V/B/D). |
| `small.en` | ~2 GB | Medium | Noticeable improvement over base for most speech. |
| `medium.en` | ~5 GB | Good | Recommended starting point for production use. |
| `large-v3` | ~10 GB | Best | Highest accuracy; use if VRAM allows. |
English-only models (`.en` suffix) are generally more accurate on English speech than multilingual models of the same size, with the largest gains at the smaller sizes. `large-v3` is multilingual only.
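
As a rough illustration of how the table can guide a choice, the sketch below picks the largest listed model that fits a given VRAM budget. `pick_model` and `MODEL_VRAM_GB` are hypothetical names, not part of this project, and the figures are the approximate values from the table; actual usage will also depend on the compute type.

```python
# Approximate VRAM needs in GB, copied from the table above (assumed, not measured).
MODEL_VRAM_GB = {
    "base.en": 1,
    "small.en": 2,
    "medium.en": 5,
    "large-v3": 10,
}


def pick_model(available_vram_gb: float) -> str:
    """Return the largest listed model whose approximate VRAM need fits.

    Hypothetical helper for illustration; falls back to the default
    model (base.en) when nothing in the table fits.
    """
    fitting = [
        name for name, need_gb in MODEL_VRAM_GB.items()
        if need_gb <= available_vram_gb
    ]
    # The dict is ordered smallest to largest, so the last fit is the largest.
    return fitting[-1] if fitting else "base.en"


if __name__ == "__main__":
    print(pick_model(6))
```

The result can then be passed on the command line as `--model <name>`.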
## Compute type
Pass `--compute-type <type>` to control the numeric precision used during inference.
| Type | Notes |
|------|-------|
| `int8_float16` | Default. Good balance of speed and accuracy on modern GPUs. |
| `float16` | Slightly better accuracy, higher VRAM usage. |
| `int8` | CPU-friendly, lower quality. |
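
For orientation, here is a minimal sketch of how a script could accept the two flags with the defaults documented above. It is not taken from `stt-server.py`: `build_parser` and `COMPUTE_TYPES` are hypothetical names, and restricting `--compute-type` to the three listed values is an assumption (the real script may accept more).

```python
import argparse

# Compute types listed in the table above; the real script may accept others.
COMPUTE_TYPES = ("int8_float16", "float16", "int8")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the defaults documented in this README (a sketch)."""
    parser = argparse.ArgumentParser(prog="stt-server.py")
    parser.add_argument(
        "--model",
        default="base.en",
        help="model name, e.g. base.en, small.en, medium.en, large-v3",
    )
    parser.add_argument(
        "--compute-type",
        default="int8_float16",
        choices=COMPUTE_TYPES,
        help="numeric precision used during inference",
    )
    return parser


# Equivalent to: stt-server.py --model medium.en --compute-type float16
args = build_parser().parse_args(["--model", "medium.en", "--compute-type", "float16"])
print(args.model, args.compute_type)
```

argparse exposes `--compute-type` as `args.compute_type`, so the value can be handed straight to whatever loads the model.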
If you see a CUDA error about mismatched library versions at startup, use `setup-venv-local-build.sh` to build ctranslate2 against your system CUDA version rather than using the PyPI wheel.