Text to voice interface

Overview

This project provides text-to-voice synthesis with voice cloning. It uses chatterbox as its backend.

Origin

This project started as a vibe-coded experiment, but this version is somewhat more hands-on.

Running

The quickest way to test this is to set up the project according to the instructions below and then run the example scripts under examples/.

Setup

Set up a venv for Python

Run setup-venv.sh.

Note

By default the environment is created in a directory called venv next to the script. To use a different location, set the environment variable PYTHON_ENV to the desired path.

PYTHON_ENV='/some/path' ./setup-venv.sh

Environment

| Variable | Purpose |
| --- | --- |
| HF_TOKEN_FILE | Path to a file containing the HF_TOKEN secret used to download models from Hugging Face. Defaults to ~/.secrets/hugging-face.token if not set. |
| HF_HUB_CACHE | Location of the Hugging Face model cache. Defaults to ~/.cache/huggingface/hub. |
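
The defaults described above can be sketched in Python roughly as follows. This is only an illustration of the lookup rules, not the project's actual code: the function names (resolve_token_file, resolve_hub_cache, read_token) are hypothetical, and treating an empty variable the same as an unset one is an assumption.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Defaults as documented in the table above.
DEFAULT_TOKEN_FILE = "~/.secrets/hugging-face.token"
DEFAULT_HUB_CACHE = "~/.cache/huggingface/hub"


def resolve_token_file(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the token file path, falling back to the documented default."""
    env = os.environ if env is None else env
    # Assumption: an empty value is treated like an unset variable.
    return Path(env.get("HF_TOKEN_FILE") or DEFAULT_TOKEN_FILE).expanduser()


def resolve_hub_cache(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the model cache directory, falling back to the documented default."""
    env = os.environ if env is None else env
    return Path(env.get("HF_HUB_CACHE") or DEFAULT_HUB_CACHE).expanduser()


def read_token(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Read the token from the resolved file, or return None if it is missing or empty."""
    path = resolve_token_file(env)
    if not path.is_file():
        return None
    # Strip the trailing newline that most editors add to the file.
    return path.read_text(encoding="utf-8").strip() or None
```

Passing a plain dictionary instead of the real environment makes the lookup easy to try out, for example read_token({"HF_TOKEN_FILE": "/some/path/token"}).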
Description
Basic TTS server based on [chatterbox-tts](https://github.com/resemble-ai/chatterbox)