diff --git a/README.md b/README.md
index c28d7d6..63270fc 100644
--- a/README.md
+++ b/README.md
@@ -15,4 +15,31 @@ This project started as a [vibe-coded](https://en.wikipedia.org/wiki/Vibe_coding
 
 ### Setup [venv](https://docs.python.org/3/library/venv.html) for [python](https://www.python.org/)
 
-We will have two different setups here depending on if you want to build ctranslate2 locally or not. This shall be documented.
\ No newline at end of file
+We will have two different setups here depending on if you want to build ctranslate2 locally or not. This shall be documented.
+
+
+## Model selection
+
+Pass `--model ` to `stt-server.py`. Models are downloaded automatically from HuggingFace on first use.
+
+| Model | VRAM | Quality | Notes |
+|-------|------|---------|-------|
+| `base.en` | ~1 GB | Low | Default. Fast, but struggles with similar-sounding consonants (V/B/D). |
+| `small.en` | ~2 GB | Medium | Noticeable improvement over base for most speech. |
+| `medium.en` | ~5 GB | Good | Recommended starting point for production use. |
+| `large-v3` | ~10 GB | Best | Highest accuracy, use if VRAM allows. |
+
+English-only models (`.en` suffix) are faster and more accurate than multilingual models for English speech.
+
+
+## Compute type
+
+Pass `--compute-type ` to control the numeric precision used during inference.
+
+| Type | Notes |
+|------|-------|
+| `int8_float16` | Default. Good balance of speed and accuracy on modern GPUs. |
+| `float16` | Slightly better accuracy, higher VRAM usage. |
+| `int8` | CPU-friendly, lower quality. |
+
+If you see a CUDA error about mismatched library versions at startup, use `setup-venv-local-build.sh` to build ctranslate2 against your system CUDA version rather than using the PyPI wheel.
\ No newline at end of file
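
Review note: the two tables added in this diff can be read as a simple selection rule. The sketch below is one possible reading, not code from the repository. The names `MODEL_VRAM_GB`, `pick_model`, and `suggest_args` are hypothetical, the VRAM thresholds are the approximate figures from the README table, and treating "no GPU" as `base.en` with `int8` is an assumption based only on the "CPU-friendly" note.

```python
from typing import List, Optional

# Approximate VRAM needs copied from the README table, largest model first.
# The list name and its layout are hypothetical; only the values come from the diff.
MODEL_VRAM_GB = [
    ("large-v3", 10.0),
    ("medium.en", 5.0),
    ("small.en", 2.0),
    ("base.en", 1.0),
]


def pick_model(available_vram_gb: float) -> str:
    """Return the largest listed model whose approximate VRAM need fits."""
    for name, needed_gb in MODEL_VRAM_GB:
        if available_vram_gb >= needed_gb:
            return name
    # Nothing fits: fall back to the documented default.
    return "base.en"


def suggest_args(available_vram_gb: Optional[float]) -> List[str]:
    """Build a flag list for stt-server.py from the available VRAM.

    Passing None means "no GPU". Mapping that case to base.en with int8
    is an assumption of this sketch, not something the README states.
    """
    if available_vram_gb is None:
        return ["--model", "base.en", "--compute-type", "int8"]
    # int8_float16 is the documented default compute type for GPUs.
    return ["--model", pick_model(available_vram_gb), "--compute-type", "int8_float16"]
```

For example, `suggest_args(6)` returns `['--model', 'medium.en', '--compute-type', 'int8_float16']`, which could be appended to a `python stt-server.py` invocation. The thresholds are rough, so leaving some VRAM headroom before moving up a model size is probably wise.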