# Narrate

A high-performance text-to-speech (TTS) narration tool using the Kokoro v1.0 ONNX model. This project is optimized for AMD GPUs via the `MIGraphX` execution provider and falls back to the CPU when necessary.

## Features

- **High Quality**: Leverages the Kokoro v1.0 TTS model for natural-sounding speech.
- **Hardware Accelerated**: Optimized for AMD GPUs via ONNX Runtime and MIGraphX.
- **Sentence-Level Buffering**: Splits text into logical units (sentences/lines) to provide smooth, continuous playback without long initial wait times.
- **Asynchronous Playback**: Uses a dedicated background thread for audio playback so that generation and playback happen in parallel.

## Prerequisites

### Hardware

- Optimized for systems with AMD GPUs (uses `MIGraphXExecutionProvider`).
- Works on CPU (fallback).

### Software

- **Python**: 3.14.3 (as specified in `.tool-versions`, managed with the asdf version manager)
- **Node.js**: 24.4.1 (as specified in `.tool-versions`, managed with the asdf version manager)
- **Required Files**: The following files must be present in the same directory:
  - `kokoro-v1.0.onnx`: The ONNX model file.
  - `voices-v1.0.bin`: The voice weights file.
  - `narrate.txt`: The text file you want to narrate.

## Models

This project requires the Kokoro v1.0 ONNX model and the corresponding voice binary.
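The sentence-level buffering and asynchronous playback described above can be sketched roughly as follows. This is a minimal illustration using only the standard library; `synthesize` and `play` stand in for the Kokoro model call and `sounddevice` playback, and are not the actual functions in `narrate.py`:

```python
import queue
import re
import threading


def split_sentences(text: str) -> list[str]:
    """Split text into sentence-sized chunks for incremental synthesis."""
    # Naive split on sentence-ending punctuation followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def narrate(text, synthesize, play):
    """Generate audio sentence by sentence while a background thread plays it."""
    audio_queue: queue.Queue = queue.Queue()

    def playback_worker():
        while True:
            chunk = audio_queue.get()
            if chunk is None:  # sentinel: generation is finished
                break
            play(chunk)

    worker = threading.Thread(target=playback_worker, daemon=True)
    worker.start()

    # Synthesis of sentence N+1 overlaps with playback of sentence N,
    # so the first sentence starts playing without waiting for the whole text.
    for sentence in split_sentences(text):
        audio_queue.put(synthesize(sentence))
    audio_queue.put(None)
    worker.join()
```

Splitting at sentence boundaries keeps the initial latency down to one sentence's worth of synthesis, while the queue decouples generation speed from playback speed.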
You can download them using the links below:

- **Kokoro v1.0 ONNX (FP16)**: [kokoro-v1.0.fp16.onnx (169 MB)](https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/kokoro-v1.0.fp16.onnx)
- **Voice Weights**: [voices-v1.0.bin (26.9 MB)](https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/voices-v1.0.bin)

### Quick Download

```bash
wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/kokoro-v1.0.fp16.onnx
wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/voices-v1.0.bin
```

*Note: The script `narrate.py` expects the model file to be named `kokoro-v1.0.onnx` or `kokoro-v1.0.fp16.onnx` in the same directory.*

## Setup

### 1. Create the Virtual Environment

The project uses a virtual environment named `kokoro-venv` to manage its dependencies.

```bash
# Create the virtual environment using Python 3.14 (as per .tool-versions)
python3.14 -m venv kokoro-venv

# Activate the environment
source kokoro-venv/bin/activate
```

### 2. Install Dependencies

With the virtual environment activated, install the required Python packages:

```bash
pip install --upgrade pip
pip install -r requirements.txt
```

*Note: For AMD GPU support (MIGraphX), ensure your environment has the necessary ROCm/MIGraphX libraries installed. The script will automatically fall back to the CPU if the GPU provider is unavailable.*

### 3. Audio Requirements

On Linux, you may need to install the PortAudio library for `sounddevice` to work:

```bash
# For Ubuntu/Debian
sudo apt-get install libportaudio2
```

Finally, make sure the Kokoro ONNX model and voice binary have been downloaded and placed in the same folder (see the Models section above).
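The CPU fallback mentioned above is typically handled when the ONNX Runtime session is created. A hedged sketch follows: the `pick_providers` helper is illustrative (not the actual logic in `narrate.py`), while `ort.get_available_providers()` and the `providers=` argument to `InferenceSession` are real ONNX Runtime APIs:

```python
def pick_providers(preferred, available):
    """Return the preferred execution providers that are actually available,
    always ending with the CPU provider as a fallback."""
    chosen = [p for p in preferred if p in available]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen


# Usage with ONNX Runtime (requires the onnxruntime package):
#
#   import onnxruntime as ort
#   providers = pick_providers(
#       ["MIGraphXExecutionProvider"], ort.get_available_providers()
#   )
#   session = ort.InferenceSession("kokoro-v1.0.onnx", providers=providers)
```

Passing the CPU provider last means ONNX Runtime tries MIGraphX first and silently falls back if the ROCm/MIGraphX libraries are missing.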
## Usage

### Direct Python Execution

You can run the narration script directly:

```bash
python narrate.py
```

### Using the Shell Script

A convenience script is provided to run the narrator using the local virtual environment:

```bash
./narrate.sh
```

## Configuration

The script `narrate.py` contains several adjustable settings:

- **Voice**: Defaults to `af_sky`.
- **Speed**: Set to `1.3x` for faster narration.
- **Environment Variables**:
  - `HSA_OVERRIDE_GFX_VERSION`: Set to `10.3.0` for compatibility.
  - `MIGRAPHX_ENABLE_CACHE`: Enabled to speed up subsequent loads.

## File Structure

- `narrate.py`: The core logic for TTS generation and audio playback.
- `narrate.sh`: Entry point script.
- `.tool-versions`: Version pinning for runtime environments.
- `kokoro-venv/`: Local Python virtual environment containing dependencies.
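The environment variables listed under Configuration need to be in place before ONNX Runtime initializes the GPU provider. A minimal sketch of setting them from Python (the exact placement and values mirror the Configuration section above, but the pattern itself is an assumption, not verified against `narrate.py`):

```python
import os

# Set GPU-compatibility variables before importing onnxruntime,
# so they take effect when the MIGraphX provider initializes.
os.environ.setdefault("HSA_OVERRIDE_GFX_VERSION", "10.3.0")  # RDNA2 compatibility override
os.environ.setdefault("MIGRAPHX_ENABLE_CACHE", "1")          # cache compiled model for faster reloads

# import onnxruntime as ort  # imported only after the environment is configured
```

Using `setdefault` lets values exported in the shell (or in `narrate.sh`) take precedence over the in-script defaults.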