Kokoro usage improvements #4357

Open
michalkulakowski wants to merge 1 commit into main from mkulakow/kokoro_improvements

@michalkulakowski

Collaborator

🛠 Summary

JIRA/Issue if applicable.
Describe the changes.

🧪 Checklist

  • Unit tests added.
  • The documentation updated.
  • Change follows security best practices.

Copilot AI review requested due to automatic review settings July 3, 2026 13:56
michalkulakowski force-pushed the mkulakow/kokoro_improvements branch from 666f708 to 80f3156 on July 3, 2026 13:59

Copilot AI left a comment

Contributor

Pull request overview

This PR updates the Kokoro text-to-speech (TTS) integration in two ways: voice embeddings are now discovered at runtime from the model directory, and espeak-ng moves from being an OVMS binary dependency to a separately built artifact in Docker builds.

Changes:

  • Load Kokoro voice embeddings from <models_path>/voices/*.bin when the graph doesn’t explicitly specify voices.
  • Remove Bazel --//:espeak=on/off flag plumbing and build espeak-ng as an optional standalone Docker step (ARG ESPEAK=1/0).
  • Adjust tests and export tooling to align with the new Kokoro usage expectations.
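
The fallback described in the first bullet can be sketched in Python for illustration. This is not the code from `t2s_servable.cpp`: `resolve_voices` and its parameters are hypothetical names, and two behaviors are assumptions rather than facts from the PR, namely that a voice is named after its file stem and that a missing `voices` directory is an error.

```python
from pathlib import Path
from typing import Dict, Optional


def resolve_voices(models_path: Path,
                   graph_voices: Optional[Dict[str, Path]] = None) -> Dict[str, Path]:
    """Return a voice-name -> embedding-file mapping (hypothetical helper).

    Voices listed explicitly in the graph take precedence. When none are
    given, fall back to scanning <models_path>/voices for *.bin files.
    """
    if graph_voices:
        return dict(graph_voices)

    voices_dir = models_path / "voices"
    if not voices_dir.is_dir():
        # Assumption: a missing directory is reported as an error.
        raise FileNotFoundError(f"Voices directory not found: {voices_dir}")

    # Keep regular files only and sort them for a deterministic order.
    paths = sorted(p for p in voices_dir.glob("*.bin") if p.is_file())

    # Assumption: the voice name is the file name without the .bin suffix.
    return {p.stem: p for p in paths}
```

Calling `resolve_voices(Path("/ovms/models_audio/Kokoro-82M"))` without a second argument would scan that model's `voices` subdirectory, while passing a non-empty mapping returns it unchanged and never touches the filesystem.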

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| windows_build.bat | Removes Bazel espeak flag wiring from Windows build invocation. |
| third_party/BUILD | Makes espeak-ng aliases unconditional (no build-flag select). |
| src/test/graph_export_test.cpp | Removes Kokoro voices list expectations and voices/ directory precreation. |
| src/test/audio/text2speech_test.cpp | Adds config validation test for missing voices in graph. |
| src/graph_export/graph_export.cpp | Stops enumerating Kokoro voices/*.bin into generated graph templates. |
| src/BUILD | Removes espeak-ng deps from the OVMS binary build. |
| src/audio/text_to_speech/t2s_servable.cpp | Implements fallback loading of embeddings from <models_path>/voices when graph voices are omitted. |
| src/audio/text_to_speech/t2s_calculator.cc | Minor comment cleanup in voice-selection error handling. |
| Makefile | Removes Bazel espeak flag usage; passes ESPEAK as a Docker build arg. |
| Dockerfile.ubuntu | Adds optional standalone Bazel build step for espeak-ng targets controlled by ARG ESPEAK. |
| Dockerfile.redhat | Same as Ubuntu Dockerfile: optional standalone espeak-ng build step. |
| distro.bzl | Removes the Bazel espeak build flag definition/config settings. |
| demos/common/export_models/export_model.py | Changes TTS exporter behavior, including default --model_type. |
| demos/audio/README.md | Adds documentation for ASR leaderboard-based transcription evaluation. |
| common_settings.bzl | Stops loading/creating the removed espeak flag config settings. |

Comment on lines 19 to +22
#include <fstream>
#include <sstream>
#include <limits>
#include <vector>
Comment on lines +45 to +62
static std::vector<std::filesystem::path> getVoiceEmbeddingPaths(const std::filesystem::path& voicesDir) {
    std::vector<std::filesystem::path> voicePaths;
    std::error_code ec;
    for (const auto& entry : std::filesystem::directory_iterator(voicesDir, ec)) {
        if (ec) {
            throw std::runtime_error("Failed to iterate voices directory: " + voicesDir.string());
        }
        if (!entry.is_regular_file(ec) || ec) {
            ec.clear();
            continue;
        }
        if (entry.path().extension() == ".bin") {
            voicePaths.emplace_back(entry.path());
        }
    }
    std::sort(voicePaths.begin(), voicePaths.end());
    return voicePaths;
}
  add_common_arguments(parser_text2speech)
  parser_text2speech.add_argument('--num_streams', default=0, type=int, help='The number of parallel execution streams to use for the models in the pipeline.', dest='num_streams')
- parser_text2speech.add_argument('--model_type', default='speecht5', choices=['speecht5', 'kokoro'], help='Type of the source TTS model. speecht5 uses optimum-cli; kokoro uses a dedicated PyTorch->OpenVINO conversion path.', dest='model_type')
+ parser_text2speech.add_argument('--model_type', default='kokoro', choices=['speecht5', 'kokoro'], help='Type of the source TTS model. speecht5 uses optimum-cli; kokoro uses a dedicated PyTorch->OpenVINO conversion path.', dest='model_type')
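
To make the effect of the changed default concrete, here is a minimal sketch using a reduced stand-in parser. `build_parser` is a hypothetical helper, not part of `export_model.py`, and the real script defines many more arguments. It shows that only invocations omitting `--model_type` change behavior; an explicit value is still honored.

```python
import argparse


def build_parser(default_model_type: str) -> argparse.ArgumentParser:
    """Build a reduced stand-in for the text2speech parser (hypothetical)."""
    parser = argparse.ArgumentParser(prog="export_model.py")
    parser.add_argument(
        "--model_type",
        default=default_model_type,
        choices=["speecht5", "kokoro"],
        dest="model_type",
    )
    return parser


# Same command line (no --model_type), old default versus new default.
old = build_parser("speecht5").parse_args([])
new = build_parser("kokoro").parse_args([])

# An explicit flag overrides whichever default is in place.
explicit = build_parser("kokoro").parse_args(["--model_type", "speecht5"])

print(old.model_type, new.model_type, explicit.model_type)
# prints: speecht5 kokoro speecht5
```

Under this reading, existing scripts that export SpeechT5 without passing `--model_type` would start taking the Kokoro conversion path after the change, so pinning the flag explicitly is the safer choice.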
Comment on lines +252 to +257
    [type.googleapis.com/mediapipe.T2sCalculatorOptions]: {
        models_path: "/ovms/models_audio/Kokoro-82M"
        plugin_config: '{"NUM_STREAMS": "1" }',
        target_device: "CPU"
    }
}

2 participants