From be1efd9edb0bfb4015a2b32be12c700a3990e731 Mon Sep 17 00:00:00 2001
From: mikael-lovqvists-claude-agent
Date: Sun, 7 Jun 2026 09:11:39 +0000
Subject: [PATCH] Add model selection and compute type sections to README

Co-Authored-By: Claude Sonnet 4.6
---
 README.md | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index c28d7d6..63270fc 100644
--- a/README.md
+++ b/README.md
@@ -15,4 +15,31 @@ This project started as a [vibe-coded](https://en.wikipedia.org/wiki/Vibe_coding
 
 ### Setup [venv](https://docs.python.org/3/library/venv.html) for [python](https://www.python.org/)
 
-We will have two different setups here depending on if you want to build ctranslate2 locally or not. This shall be documented.
\ No newline at end of file
+We will have two different setups here depending on if you want to build ctranslate2 locally or not. This shall be documented.
+
+
+## Model selection
+
+Pass `--model ` to `stt-server.py`. Models are downloaded automatically from HuggingFace on first use.
+
+| Model | VRAM | Quality | Notes |
+|-------|------|---------|-------|
+| `base.en` | ~1 GB | Low | Default. Fast, but struggles with similar-sounding consonants (V/B/D). |
+| `small.en` | ~2 GB | Medium | Noticeable improvement over base for most speech. |
+| `medium.en` | ~5 GB | Good | Recommended starting point for production use. |
+| `large-v3` | ~10 GB | Best | Highest accuracy, use if VRAM allows. |
+
+English-only models (`.en` suffix) are faster and more accurate than multilingual models for English speech.
+
+
+## Compute type
+
+Pass `--compute-type ` to control the numeric precision used during inference.
+
+| Type | Notes |
+|------|-------|
+| `int8_float16` | Default. Good balance of speed and accuracy on modern GPUs. |
+| `float16` | Slightly better accuracy, higher VRAM usage. |
+| `int8` | CPU-friendly, lower quality. |
+
+If you see a CUDA error about mismatched library versions at startup, use `setup-venv-local-build.sh` to build ctranslate2 against your system CUDA version rather than using the PyPI wheel.
\ No newline at end of file
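
As a rough illustration of how the model table in this patch might be used, here is a minimal sketch that picks the largest listed model fitting a given VRAM budget. The `pick_model` helper and the `MODEL_VRAM_GB` mapping are hypothetical and not part of `stt-server.py`; the sketch also assumes the approximate VRAM figures from the table can be treated as hard limits, which real usage may not bear out.

```python
# Approximate VRAM needs in GB, copied from the "Model selection" table.
# Treating them as exact thresholds is an assumption made for illustration.
MODEL_VRAM_GB = {
    "base.en": 1,
    "small.en": 2,
    "medium.en": 5,
    "large-v3": 10,
}


def pick_model(available_vram_gb: float) -> str:
    """Return the largest listed model whose approximate VRAM need fits.

    Hypothetical helper; falls back to the documented default (base.en)
    when no listed model fits the budget.
    """
    fitting = [
        name for name, need in MODEL_VRAM_GB.items() if need <= available_vram_gb
    ]
    # The dict is ordered from smallest to largest, so the last fit is the largest.
    return fitting[-1] if fitting else "base.en"


if __name__ == "__main__":
    for vram in (1.5, 6, 12):
        print(f"{vram} GB -> {pick_model(vram)}")
```

The returned name would then be supplied as the value for `--model`. The compute type is left out of the sketch because the patch does not state how much VRAM each compute type saves.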