# Narrate

A high-performance text-to-speech (TTS) narration tool using the Kokoro v1.0 ONNX model. This project is optimized for AMD GPUs using the `MIGraphX` execution provider but falls back to CPU when necessary.

## Features

- **High Quality**: Leverages the Kokoro v1.0 TTS model for natural-sounding speech.
- **Hardware Accelerated**: Optimized for AMD GPUs via ONNX Runtime and MIGraphX.
- **Sentence-Level Buffering**: Splits text into logical units (sentences/lines) to provide smooth, continuous playback without long initial wait times.
- **Asynchronous Playback**: Uses a dedicated background thread for audio playback to ensure generation and playback happen in parallel.
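
The buffering and playback features above can be illustrated with a producer/consumer sketch. This is not the code in `narrate.py`: the names `split_into_units`, `narrate`, `synthesize`, and `play` are hypothetical, and the splitting rule (break after `.`, `!`, or `?`) is an assumption. The callables stand in for the TTS model and the audio device so the example runs without either.

```python
import queue
import re
import threading


def split_into_units(text):
    """Split text into sentences, handling each non-empty line separately."""
    units = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Assumed rule: break after ., ! or ? when followed by whitespace.
        units.extend(s for s in re.split(r"(?<=[.!?])\s+", line) if s)
    return units


def narrate(text, synthesize, play):
    """Generate audio unit by unit while a background thread plays it."""
    audio_queue = queue.Queue(maxsize=4)  # small buffer bounds memory use

    def playback_worker():
        while True:
            chunk = audio_queue.get()
            if chunk is None:  # sentinel: nothing more to play
                break
            play(chunk)

    worker = threading.Thread(target=playback_worker, daemon=True)
    worker.start()
    for unit in split_into_units(text):
        # put() blocks when the buffer is full, so generation cannot
        # run arbitrarily far ahead of playback.
        audio_queue.put(synthesize(unit))
    audio_queue.put(None)
    worker.join()


if __name__ == "__main__":
    played = []
    # str.upper and list.append stand in for the model and the speaker.
    narrate("Hello there. How are you?\nFine!", str.upper, played.append)
    print(played)
```

Because the first unit is queued as soon as it is synthesized, playback can start before the rest of the text has been processed.
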

## Prerequisites

### Hardware

- Optimized for systems with AMD GPUs (uses `MIGraphXExecutionProvider`).
- Works on CPU (fallback).

### Software

- **Python**: 3.14.3 (as specified in `.tool-versions`, managed with asdf)
- **Node.js**: 24.4.1 (as specified in `.tool-versions`, managed with asdf)
- **Required Files**: The following files must be present in the same directory:
  - `kokoro-v1.0.onnx` (or `kokoro-v1.0.fp16.onnx`): The ONNX model file.
  - `voices-v1.0.bin`: The voice weights file.
  - `narrate.txt`: The text file you want to narrate.

## Models

This project requires the Kokoro v1.0 ONNX model and the corresponding voice binary. You can download them using the links below:

- **Kokoro v1.0 ONNX (FP16)**: [kokoro-v1.0.fp16.onnx (169 MB)](https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/kokoro-v1.0.fp16.onnx)
- **Voice Weights**: [voices-v1.0.bin (26.9 MB)](https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/voices-v1.0.bin)

### Quick Download

```bash
wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/kokoro-v1.0.fp16.onnx
wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/voices-v1.0.bin
```

*Note: The script `narrate.py` expects the model file to be named `kokoro-v1.0.onnx` or `kokoro-v1.0.fp16.onnx` in the same directory.*
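
One way to handle the two accepted model names is to probe for each in turn and fail early with a clear message. The sketch below is illustrative only: `resolve_model_files` is a hypothetical helper, and the order in which the names are tried is an assumption, since the README does not say which one `narrate.py` prefers.

```python
from pathlib import Path

# Assumed lookup order; the README only states that either name is accepted.
MODEL_CANDIDATES = ("kokoro-v1.0.onnx", "kokoro-v1.0.fp16.onnx")
VOICES_FILE = "voices-v1.0.bin"


def resolve_model_files(directory="."):
    """Return (model_path, voices_path) or raise FileNotFoundError."""
    base = Path(directory)
    model = next(
        (base / name for name in MODEL_CANDIDATES if (base / name).is_file()),
        None,
    )
    if model is None:
        raise FileNotFoundError(
            f"None of {', '.join(MODEL_CANDIDATES)} found in {base.resolve()}"
        )
    voices = base / VOICES_FILE
    if not voices.is_file():
        raise FileNotFoundError(f"{VOICES_FILE} not found in {base.resolve()}")
    return model, voices


if __name__ == "__main__":
    try:
        print(resolve_model_files("."))
    except FileNotFoundError as err:
        print(f"Missing file: {err}")
```
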

## Setup

### 1. Create the Virtual Environment

The project uses a virtual environment named `kokoro-venv` to manage its dependencies.

```bash
# Create the virtual environment using Python 3.14 (as per .tool-versions)
python3.14 -m venv kokoro-venv

# Activate the environment
source kokoro-venv/bin/activate
```

### 2. Install Dependencies

With the virtual environment activated, install the required Python packages:

```bash
pip install --upgrade pip
pip install -r requirements.txt
```

*Note: For AMD GPU support (MIGraphX), ensure your environment has the necessary ROCm/MIGraphX libraries installed. The script will automatically fall back to the CPU if the GPU provider is unavailable.*
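
The fallback described in the note can be sketched as a provider list filtered by what ONNX Runtime reports as available. This is an assumption about how such a fallback might be written, not the code in `narrate.py`; `choose_providers` is a hypothetical helper. The provider names and `onnxruntime.get_available_providers()` are part of ONNX Runtime.

```python
def choose_providers(available):
    """Prefer MIGraphX, then CPU, keeping only providers that are available."""
    preferred = ["MIGraphXExecutionProvider", "CPUExecutionProvider"]
    chosen = [name for name in preferred if name in available]
    # The CPU provider is the last resort if nothing preferred is reported.
    return chosen or ["CPUExecutionProvider"]


if __name__ == "__main__":
    try:
        import onnxruntime as ort

        available = ort.get_available_providers()
    except ImportError:
        available = []  # onnxruntime is not installed in this environment
    print(choose_providers(available))
```

The resulting list can be passed as the `providers` argument of `onnxruntime.InferenceSession`, which tries the providers in the order given.
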

### 3. Audio Requirements

On Linux, you may need to install the PortAudio runtime library for `sounddevice` to work:

```bash
# For Ubuntu/Debian
sudo apt-get install libportaudio2
```

### 4. Model Files

Ensure you have downloaded the Kokoro ONNX model and the voice weights file and placed them in the same directory.

## Usage

### Direct Python Execution

You can run the narration script directly:

```bash
python narrate.py
```

### Using the Shell Script

A convenience script is provided to run the narrator using the local virtual environment:

```bash
./narrate.sh
```

## Configuration

The script `narrate.py` contains several adjustable settings:

- **Voice**: Defaults to `af_sky`.
- **Speed**: Set to `1.3` (1.3x normal speed) for faster narration.
- **Environment Variables**:
  - `HSA_OVERRIDE_GFX_VERSION`: Set to `10.3.0` so that ROCm treats the GPU as a `gfx1030` target, for compatibility with cards that are not officially supported.
  - `MIGRAPHX_ENABLE_CACHE`: Enabled to speed up subsequent loads.
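
As a rough illustration of the settings listed above, the sketch below groups the voice and speed defaults and applies the environment variables. The names `NarrationSettings` and `apply_gpu_environment` are hypothetical, and the value `"1"` for `MIGRAPHX_ENABLE_CACHE` is an assumption, since the README only says the variable is enabled. Variables of this kind generally need to be set before the GPU runtime is loaded, so a function like this would run before importing ONNX Runtime.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class NarrationSettings:
    """Defaults taken from the README; the class itself is hypothetical."""

    voice: str = "af_sky"
    speed: float = 1.3


def apply_gpu_environment(env=os.environ):
    """Set the GPU-related variables unless the user already set them."""
    env.setdefault("HSA_OVERRIDE_GFX_VERSION", "10.3.0")
    env.setdefault("MIGRAPHX_ENABLE_CACHE", "1")  # "1" is an assumed value
    return env


if __name__ == "__main__":
    apply_gpu_environment()
    settings = NarrationSettings()
    print(settings.voice, settings.speed)
    print(os.environ["HSA_OVERRIDE_GFX_VERSION"])
```

Using `setdefault` lets a value exported in the shell take precedence over the default; whether `narrate.py` behaves this way is not stated in the README.
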

## File Structure

- `narrate.py`: The core logic for TTS generation and audio playback.
- `narrate.sh`: Entry point script.
- `.tool-versions`: Version pinning for runtime environments.
- `kokoro-venv/`: Local Python virtual environment containing dependencies.