Narrate
A high-performance text-to-speech (TTS) narration tool using the Kokoro v1.0 ONNX model. This project is optimized for AMD GPUs using the MIGraphX execution provider but falls back to CPU when necessary.
Features
- High Quality: Leverages the Kokoro v1.0 TTS model for natural-sounding speech.
- Hardware Accelerated: Optimized for AMD GPUs via ONNX Runtime and MIGraphX.
- Sentence-Level Buffering: Splits text into logical units (sentences/lines) to provide smooth, continuous playback without long initial wait times.
- Asynchronous Playback: Uses a dedicated background thread for audio playback to ensure generation and playback happen in parallel.
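The buffering and playback pattern described above can be sketched as follows. This is an illustrative outline, not the actual `narrate.py` code: the `narrate`, `split_sentences`, `synthesize`, and `play` names are assumptions, and the real script plays audio via `sounddevice` rather than accepting callbacks.

```python
# Sketch of sentence-level buffering with a background playback thread.
# Generation pushes synthesized chunks into a queue while a worker
# thread drains it, so playback overlaps with synthesis.
import queue
import re
import threading

def split_sentences(text: str) -> list[str]:
    """Split text into sentence-sized chunks for incremental synthesis."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def narrate(text: str, synthesize, play) -> None:
    """Synthesize sentence-by-sentence while a worker thread plays audio."""
    buf: queue.Queue = queue.Queue()

    def player():
        while True:
            chunk = buf.get()
            if chunk is None:   # sentinel: generation finished
                break
            play(chunk)

    worker = threading.Thread(target=player, daemon=True)
    worker.start()
    for sentence in split_sentences(text):
        buf.put(synthesize(sentence))  # playback proceeds in parallel
    buf.put(None)
    worker.join()
```

Because the queue is filled one sentence at a time, playback of the first sentence can begin while later sentences are still being generated, which is what keeps the initial wait short.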
Prerequisites
Hardware
- Optimized for systems with AMD GPUs (uses MIGraphXExecutionProvider).
- Works on CPU (fallback).
Software
- Python: 3.14.3 (as specified in `.tool-versions` using asdf-vm)
- Node.js: 24.4.1 (as specified in `.tool-versions` using asdf-vm)
- Required Files: The following files must be present in the same directory:
  - `kokoro-v1.0.onnx`: The ONNX model file.
  - `voices-v1.0.bin`: The voice weights file.
  - `narrate.txt`: The text file you want to narrate.
Models
This project requires the Kokoro v1.0 ONNX model and the corresponding voice binary. You can download them using the links below:
- Kokoro v1.0 ONNX (FP16): kokoro-v1.0.fp16.onnx (169 MB)
- Voice Weights: voices-v1.0.bin (26.9 MB)
Quick Download
wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/kokoro-v1.0.fp16.onnx
wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/voices-v1.0.bin
Note: The script narrate.py expects the model file to be named kokoro-v1.0.onnx or kokoro-v1.0.fp16.onnx in the same directory.
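The filename expectation in the note above could be resolved with a small lookup like the one below. This is a sketch; the `find_model` helper is hypothetical and the actual resolution logic in `narrate.py` may differ.

```python
# Locate the Kokoro model file, preferring the full-precision name and
# falling back to the fp16 variant, as the note above describes.
from pathlib import Path

def find_model(directory: str = ".") -> Path:
    for name in ("kokoro-v1.0.onnx", "kokoro-v1.0.fp16.onnx"):
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(
        "Expected kokoro-v1.0.onnx or kokoro-v1.0.fp16.onnx in " + directory
    )
```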
Setup
1. Create the Virtual Environment
The project uses a virtual environment named kokoro-venv to manage its dependencies.
# Create the virtual environment using Python 3.14 (as per .tool-versions)
python3.14 -m venv kokoro-venv
# Activate the environment
source kokoro-venv/bin/activate
2. Install Dependencies
With the virtual environment activated, install the required Python packages:
pip install --upgrade pip
pip install -r requirements.txt
Note: For AMD GPU support (MIGraphX), ensure your environment has the necessary ROCm/MIGraphX libraries installed. The script will automatically fall back to the CPU if the GPU provider is unavailable.
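The GPU-first fallback described above amounts to filtering a preferred provider list against what the installed ONNX Runtime build actually supports. The snippet below is a sketch of that selection logic (the `choose_providers` helper is illustrative, not the actual `narrate.py` code); `onnxruntime.get_available_providers()` reports the supported providers.

```python
# Prefer MIGraphX when available, otherwise fall back to CPU.
def choose_providers(available: list[str]) -> list[str]:
    preferred = ["MIGraphXExecutionProvider", "CPUExecutionProvider"]
    chosen = [p for p in preferred if p in available]
    return chosen or ["CPUExecutionProvider"]

# With onnxruntime installed, this would be used roughly as:
#   import onnxruntime as ort
#   session = ort.InferenceSession(
#       "kokoro-v1.0.onnx",
#       providers=choose_providers(ort.get_available_providers()),
#   )
```

Passing both providers lets ONNX Runtime fall back node-by-node to the CPU provider if the GPU provider cannot handle part of the graph.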
3. Audio Requirements
On Linux, you may need to install the PortAudio library for sounddevice to work:
# For Ubuntu/Debian
sudo apt-get install libportaudio2
4. Model Files
Ensure you have downloaded the Kokoro ONNX model and voice binary and placed them in the same directory as the script.
Usage
Direct Python Execution
You can run the narration script directly:
python narrate.py
Using the Shell Script
A convenience script is provided to run the narrator using the local virtual environment:
./narrate.sh
Configuration
The script narrate.py contains several adjustable settings:
- Voice: Defaults to `af_sky`.
- Speed: Set to `1.3x` for faster narration.
- Environment Variables:
  - `HSA_OVERRIDE_GFX_VERSION`: Set to `10.3.0` for compatibility.
  - `MIGRAPHX_ENABLE_CACHE`: Enabled to speed up subsequent loads.
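These environment variables must be set before the ROCm/ONNX Runtime libraries load. A minimal sketch of doing that from Python (the exact value used to enable `MIGRAPHX_ENABLE_CACHE` is an assumption; `"1"` is a common convention):

```python
# Set GPU compatibility/cache variables before importing onnxruntime.
# setdefault() avoids clobbering values the user already exported.
import os

os.environ.setdefault("HSA_OVERRIDE_GFX_VERSION", "10.3.0")
os.environ.setdefault("MIGRAPHX_ENABLE_CACHE", "1")  # "1" is assumed here
```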
File Structure
- `narrate.py`: The core logic for TTS generation and audio playback.
- `narrate.sh`: Entry point script.
- `.tool-versions`: Version pinning for runtime environments.
- `kokoro-venv/`: Local Python virtual environment containing dependencies.