mikael-lovqvists-claude-agent 2af47373c4 Add stt-server.py: self-contained recording + VAD + transcription process
Replaces the old stdin/stdout transcription-only server. Now handles the
full pipeline in Python:
- Launches parec or arecord for mic capture
- Runs Silero VAD (via silero-vad, already a faster-whisper dep — no sherpa-onnx needed)
- Pre-roll ring buffer (0.2s) prepended to each segment for context
- Transcribes with faster-whisper in a separate thread (GPU not blocking VAD)
- Emits JSON line events to stdout: ready, vad_start, vad_end, transcript, error
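The pre-roll step in the list above can be sketched roughly as follows. This is only an illustration of the idea, not code from stt-server.py: the class name `PreRoll`, the 16 kHz sample rate, and the 512-sample frame size are assumptions; only the 0.2 s duration comes from the commit message.

```python
from collections import deque

SAMPLE_RATE = 16000      # assumption: not stated in the commit message
FRAME_SAMPLES = 512      # assumption: samples consumed per VAD step
PRE_ROLL_SECONDS = 0.2   # from the commit message


class PreRoll:
    """Hypothetical helper: keeps the most recent frames for later prepending."""

    def __init__(self, seconds=PRE_ROLL_SECONDS, rate=SAMPLE_RATE,
                 frame_samples=FRAME_SAMPLES):
        # Whole frames needed to cover at least `seconds` of audio
        # (ceiling division).
        n_frames = -(-int(seconds * rate) // frame_samples)
        # A bounded deque drops the oldest frame automatically when full.
        self._frames = deque(maxlen=n_frames)

    def push(self, frame: bytes) -> None:
        """Record one captured frame while no speech is active."""
        self._frames.append(frame)

    def snapshot(self) -> bytes:
        """Return the buffered audio, oldest frame first."""
        return b"".join(self._frames)
```

When the VAD reports speech start, the segment would begin with `snapshot()` followed by the live frames, so the first syllable is not clipped.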

Event protocol is designed to map directly to WebSocket subscriptions later.
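A JSON-lines emitter for the five event names might look like the sketch below. The key names (`type`, `text`) and the helper names `format_event` and `emit` are assumptions for illustration; the commit message lists only the event names, not the payload layout.

```python
import json
import sys

# Event names taken from the commit message.
EVENT_TYPES = {"ready", "vad_start", "vad_end", "transcript", "error"}


def format_event(event_type, **fields):
    """Serialize one event as a single JSON line (without the newline)."""
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unknown event type: {event_type}")
    # Assumed layout: a "type" key plus arbitrary extra fields.
    return json.dumps({"type": event_type, **fields}, ensure_ascii=False)


def emit(event_type, stream=None, **fields):
    """Write the event and flush, so a reader on a pipe sees it immediately."""
    stream = stream if stream is not None else sys.stdout
    stream.write(format_event(event_type, **fields) + "\n")
    stream.flush()
```

Because each event is one self-contained JSON object per line, the same strings could later be sent unchanged as WebSocket text messages.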

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-07 08:41:45 +00:00
2026-06-07 10:17:43 +02:00

Voice to text interface

Overview

This project aims to provide voice to text transcription, using faster-whisper as the backend.

Origin

This project started as a vibe-coded experiment, but this version is somewhat more hands-on.

Setup

Set up a venv for Python

There will be two different setups, depending on whether you want to build ctranslate2 locally. Both are still to be documented.

Description
Voice to text server using [faster-whisper](https://github.com/SYSTRAN/faster-whisper)