Narrate
A high-performance text-to-speech (TTS) narration tool using the Kokoro v1.0 ONNX model. This project is optimized for AMD GPUs using the MIGraphX execution provider but falls back to CPU when necessary.
Features
- High Quality: Leverages the Kokoro v1.0 TTS model for natural-sounding speech.
- Hardware Accelerated: Optimized for AMD GPUs via ONNX Runtime and MIGraphX.
- Sentence-Level Buffering: Splits text into logical units (sentences/lines) to provide smooth, continuous playback without long initial wait times.
- Asynchronous Playback: Uses a dedicated background thread for audio playback to ensure generation and playback happen in parallel.
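The buffering and playback pattern described above can be sketched as follows. This is an illustrative outline, not the actual `narrate.py` code: the `narrate`, `split_sentences`, `synthesize`, and `play` names are assumptions, and the real script plays audio via `sounddevice` rather than accepting callbacks.

```python
# Sketch of sentence-level buffering with a background playback thread.
# Generation pushes synthesized chunks into a queue while a worker
# thread drains it, so playback overlaps with synthesis.
import queue
import re
import threading

def split_sentences(text: str) -> list[str]:
    """Split text into sentence-sized chunks for incremental synthesis."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def narrate(text: str, synthesize, play) -> None:
    """Synthesize sentence-by-sentence while a worker thread plays audio."""
    buf: queue.Queue = queue.Queue()

    def player():
        while True:
            chunk = buf.get()
            if chunk is None:   # sentinel: generation finished
                break
            play(chunk)

    worker = threading.Thread(target=player, daemon=True)
    worker.start()
    for sentence in split_sentences(text):
        buf.put(synthesize(sentence))  # playback proceeds in parallel
    buf.put(None)
    worker.join()
```

Because the queue is filled one sentence at a time, playback of the first sentence can begin while later sentences are still being generated, which is what keeps the initial wait short.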
Prerequisites
Hardware
- Optimized for systems with AMD GPUs (uses MIGraphXExecutionProvider).
- Works on CPU (fallback).
Software
- Python: 3.14.3 (as specified in `.tool-versions` using asdf-vm)
- Node.js: 24.4.1 (as specified in `.tool-versions` using asdf-vm)
- Required Files: The following files must be present in the same directory:
  - `kokoro-v1.0.onnx`: The ONNX model file.
  - `voices-v1.0.bin`: The voice weights file.
  - `narrate.txt`: The text file you want to narrate.
Models
This project requires the Kokoro v1.0 ONNX model and the corresponding voice binary. You can download them using the links below:
- Kokoro v1.0 ONNX (FP16): kokoro-v1.0.fp16.onnx (169 MB)
- Voice Weights: voices-v1.0.bin (26.9 MB)
Quick Download
wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/kokoro-v1.0.fp16.onnx
wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/voices-v1.0.bin
Note: The script narrate.py expects the model file to be named kokoro-v1.0.onnx or kokoro-v1.0.fp16.onnx in the same directory.
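The filename expectation in the note above could be resolved with a small lookup like the one below. This is a sketch; the `find_model` helper is hypothetical and the actual resolution logic in `narrate.py` may differ.

```python
# Locate the Kokoro model file, preferring the full-precision name and
# falling back to the fp16 variant, as the note above describes.
from pathlib import Path

def find_model(directory: str = ".") -> Path:
    for name in ("kokoro-v1.0.onnx", "kokoro-v1.0.fp16.onnx"):
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(
        "Expected kokoro-v1.0.onnx or kokoro-v1.0.fp16.onnx in " + directory
    )
```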
Setup
1. Create the Virtual Environment
The project uses a virtual environment named kokoro-venv to manage its dependencies.
# Create the virtual environment using Python 3.14 (as per .tool-versions)
python3.14 -m venv kokoro-venv
# Activate the environment
source kokoro-venv/bin/activate
2. Install Dependencies
With the virtual environment activated, install the required Python packages:
pip install --upgrade pip
pip install -r requirements.txt
Note: For AMD GPU support (MIGraphX), ensure your environment has the necessary ROCm/MIGraphX libraries installed. The script will automatically fall back to the CPU if the GPU provider is unavailable.
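The GPU-first fallback described above amounts to filtering a preferred provider list against what the installed ONNX Runtime build actually supports. The snippet below is a sketch of that selection logic (the `choose_providers` helper is illustrative, not the actual `narrate.py` code); `onnxruntime.get_available_providers()` reports the supported providers.

```python
# Prefer MIGraphX when available, otherwise fall back to CPU.
def choose_providers(available: list[str]) -> list[str]:
    preferred = ["MIGraphXExecutionProvider", "CPUExecutionProvider"]
    chosen = [p for p in preferred if p in available]
    return chosen or ["CPUExecutionProvider"]

# With onnxruntime installed, this would be used roughly as:
#   import onnxruntime as ort
#   session = ort.InferenceSession(
#       "kokoro-v1.0.onnx",
#       providers=choose_providers(ort.get_available_providers()),
#   )
```

Passing both providers lets ONNX Runtime fall back node-by-node to the CPU provider if the GPU provider cannot handle part of the graph.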
3. Audio Requirements
On Linux, you may need to install the PortAudio library for sounddevice to work:
# For Ubuntu/Debian
sudo apt-get install libportaudio2
4. Model Files
Ensure you have downloaded the Kokoro ONNX model and voice binary and placed them in the same directory as the script.
Usage
Direct Python Execution
You can run the narration script directly:
python narrate.py
Using the Shell Script
A convenience script is provided to run the narrator using the local virtual environment:
./narrate.sh
Configuration
The script narrate.py contains several adjustable settings:
- Voice: Defaults to `af_sky`.
- Speed: Set to `1.3x` for faster narration.
- Environment Variables:
  - `HSA_OVERRIDE_GFX_VERSION`: Set to `10.3.0` for compatibility.
  - `MIGRAPHX_ENABLE_CACHE`: Enabled to speed up subsequent loads.
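These environment variables must be set before the ROCm/ONNX Runtime libraries load. A minimal sketch of doing that from Python (the exact value used to enable `MIGRAPHX_ENABLE_CACHE` is an assumption; `"1"` is a common convention):

```python
# Set GPU compatibility/cache variables before importing onnxruntime.
# setdefault() avoids clobbering values the user already exported.
import os

os.environ.setdefault("HSA_OVERRIDE_GFX_VERSION", "10.3.0")
os.environ.setdefault("MIGRAPHX_ENABLE_CACHE", "1")  # "1" is assumed here
```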
File Structure
- `narrate.py`: The core logic for TTS generation and audio playback.
- `narrate.sh`: Entry point script.
- `.tool-versions`: Version pinning for runtime environments.
- `kokoro-venv/`: Local Python virtual environment containing dependencies.