Add --language and --task CLI arguments, document in README

--language: force a specific language (e.g. en, sv), or leave unset to auto-detect
--task: transcribe (default) or translate to English
Previously language was hardcoded to 'en', which caused multilingual models
to hallucinate translations instead of transcribing the source language.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 09:16:19 +00:00
parent f2ba15185e
commit bdb1aac885
2 changed files with 22 additions and 2 deletions


@@ -42,4 +42,21 @@ Pass `--compute-type <type>` to control the numeric precision used during infere
 | `float16` | Slightly better accuracy, higher VRAM usage. |
 | `int8` | CPU-friendly, lower quality. |
 If you see a CUDA error about mismatched library versions at startup, use `setup-venv-local-build.sh` to build ctranslate2 against your system CUDA version rather than using the PyPI wheel.
+## Language and translation
+By default the server auto-detects the spoken language and transcribes it as-is.
+| Argument | Default | Notes |
+|----------|---------|-------|
+| `--language <code>` | none (auto-detect) | Force a specific language, e.g. `--language en` or `--language sv`. Skips auto-detection and avoids misidentification. |
+| `--task transcribe` | default | Output text in the spoken language. |
+| `--task translate` | | Translate speech to English regardless of source language. |
+> [!NOTE]
+> The `.en` model variants (`base.en`, `small.en`, etc.) are English-only and do not support `--task translate` or a non-English `--language`. Use a multilingual model (such as `large-v3` or `medium`) for multilingual or translation use cases.
+> [!WARNING]
+> With a multilingual model, omitting `--language` can occasionally cause English speech to be misdetected as another language. Pass `--language en` to avoid this if you only speak English.


@@ -120,6 +120,8 @@ parser = argparse.ArgumentParser()
 parser.add_argument('--model', default='base.en')
 parser.add_argument('--device', default='cuda')
 parser.add_argument('--compute-type', default='int8_float16')
+parser.add_argument('--language', default=None, help='language code (e.g. en, sv); omit to auto-detect')
+parser.add_argument('--task', default='transcribe', choices=['transcribe', 'translate'], help='transcribe keeps the source language; translate converts to English')
 parser.add_argument('--verbose', '-v', action='store_true')
 args = parser.parse_args()
 verbose = args.verbose
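
Review note: the parser behaviour in the hunk above can be checked in isolation. This is a minimal sketch, not the server's actual code: `build_parser` is a hypothetical wrapper introduced only for illustration, and it reproduces just the `--model`, `--language`, and `--task` arguments.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical helper: mirrors three of the arguments from the hunk above.
    parser = argparse.ArgumentParser()
    parser.add_argument('--model', default='base.en')
    # default=None means "no language forced", i.e. auto-detect downstream.
    parser.add_argument('--language', default=None)
    # choices makes argparse reject anything other than these two tasks.
    parser.add_argument('--task', default='transcribe',
                        choices=['transcribe', 'translate'])
    return parser


defaults = build_parser().parse_args([])
forced = build_parser().parse_args(['--language', 'sv', '--task', 'translate'])

print(defaults.language, defaults.task)  # None transcribe
print(forced.language, forced.task)      # sv translate
```

Because `--task` uses `choices`, an unsupported value such as `--task summarize` makes argparse exit with a usage error before any audio is processed.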
@@ -195,7 +197,8 @@ def transcription_worker():
 try:
     segments, _ = model.transcribe(
         samples,
-        language='en',
+        language=args.language,
+        task=args.task,
         word_timestamps=True,
         vad_filter=False,
     )
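
Review note: the call above passes `args.language` and `args.task` through unchanged, while the default model remains `base.en`, which the README note in this commit describes as English-only. One possible follow-up is a startup check that rejects unsupported combinations early. The sketch below is an assumption, not part of this commit: `check_model_args` is a hypothetical helper, and it assumes English-only models can be recognised by a `.en` suffix on the model name.

```python
from typing import Optional


def check_model_args(model: str, language: Optional[str], task: str) -> None:
    """Raise ValueError for combinations the README note rules out.

    Hypothetical helper; assumes English-only models end in '.en'.
    """
    if not model.endswith('.en'):
        return  # multilingual model: any language/task combination is allowed
    if task == 'translate':
        raise ValueError(
            f"model '{model}' is English-only and does not support --task translate"
        )
    if language not in (None, 'en'):
        raise ValueError(
            f"model '{model}' is English-only and does not support --language {language}"
        )


# Accepted: the defaults, and a multilingual model translating Swedish.
check_model_args('base.en', None, 'transcribe')
check_model_args('large-v3', 'sv', 'translate')
```

It could be called once after `parser.parse_args()`, for example as `check_model_args(args.model, args.language, args.task)`, so that a bad combination fails at startup rather than producing poor transcripts.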