Kokoro usage improvements #4357
Open
michalkulakowski wants to merge 1 commit into
Pull request overview
This PR updates Kokoro text-to-speech (TTS) integration by shifting voice embedding discovery to runtime (from the model directory) and moving espeak-ng from an OVMS binary dependency to a separately built artifact in Docker builds.
Changes:
- Load Kokoro voice embeddings from `<models_path>/voices/*.bin` when the graph doesn't explicitly specify `voices`.
- Remove Bazel `--//:espeak=on/off` flag plumbing and build espeak-ng as an optional standalone Docker step (`ARG ESPEAK=1/0`).
- Adjust tests and export tooling to align with the new Kokoro usage expectations.
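
The voice-discovery fallback in the first item can be sketched in Python roughly as follows. This is only an illustration: the actual implementation is C++ (`src/audio/text_to_speech/t2s_servable.cpp`). The helper name `resolve_voices`, the use of the file stem as the voice name, and the error raised for a missing directory are all assumptions that the PR does not confirm.

```python
from pathlib import Path
from typing import Dict, Optional


def resolve_voices(models_path: Path,
                   graph_voices: Optional[Dict[str, Path]] = None) -> Dict[str, Path]:
    """Return a mapping of voice name -> embedding file (hypothetical helper)."""
    # Voices listed explicitly in the graph take precedence.
    if graph_voices:
        return dict(graph_voices)

    # Fallback: discover embeddings under <models_path>/voices/*.bin.
    voices_dir = models_path / "voices"
    if not voices_dir.is_dir():
        # Assumption: a missing directory is treated as an error.
        raise FileNotFoundError(f"voices directory not found: {voices_dir}")

    # Sort for a deterministic order; naming a voice after the file stem
    # is an assumption made for this sketch.
    return {p.stem: p for p in sorted(voices_dir.glob("*.bin")) if p.is_file()}
```

Under these assumptions, a graph that omits `voices` and points `models_path` at a directory containing `voices/a.bin` and `voices/b.bin` would expose the voices `a` and `b`.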
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| windows_build.bat | Removes Bazel espeak flag wiring from Windows build invocation. |
| third_party/BUILD | Makes espeak-ng aliases unconditional (no build-flag select). |
| src/test/graph_export_test.cpp | Removes Kokoro voices list expectations and voices/ directory precreation. |
| src/test/audio/text2speech_test.cpp | Adds config validation test for missing voices in graph. |
| src/graph_export/graph_export.cpp | Stops enumerating Kokoro voices/*.bin into generated graph templates. |
| src/BUILD | Removes espeak-ng deps from the OVMS binary build. |
| src/audio/text_to_speech/t2s_servable.cpp | Implements fallback loading of embeddings from <models_path>/voices when graph voices are omitted. |
| src/audio/text_to_speech/t2s_calculator.cc | Minor comment cleanup in voice-selection error handling. |
| Makefile | Removes Bazel espeak flag usage; passes ESPEAK as a Docker build arg. |
| Dockerfile.ubuntu | Adds optional standalone Bazel build step for espeak-ng targets controlled by ARG ESPEAK. |
| Dockerfile.redhat | Same as Ubuntu Dockerfile: optional standalone espeak-ng build step. |
| distro.bzl | Removes the Bazel espeak build flag definition/config settings. |
| demos/common/export_models/export_model.py | Changes TTS exporter behavior, including default --model_type. |
| demos/audio/README.md | Adds documentation for ASR leaderboard-based transcription evaluation. |
| common_settings.bzl | Stops loading/creating the removed espeak flag config settings. |
Comment on lines 19 to +22

```cpp
#include <fstream>
#include <sstream>
#include <limits>
#include <vector>
```
Comment on lines +45 to +62

```cpp
static std::vector<std::filesystem::path> getVoiceEmbeddingPaths(const std::filesystem::path& voicesDir) {
    std::vector<std::filesystem::path> voicePaths;
    std::error_code ec;
    // Check the error code right after construction: on failure the iterator
    // equals end(), so a check placed inside the loop body would never run.
    std::filesystem::directory_iterator it(voicesDir, ec);
    if (ec) {
        throw std::runtime_error("Failed to iterate voices directory: " + voicesDir.string());
    }
    for (const auto& entry : it) {
        // Skip anything that is not a regular file (or whose status cannot be read).
        if (!entry.is_regular_file(ec) || ec) {
            ec.clear();
            continue;
        }
        if (entry.path().extension() == ".bin") {
            voicePaths.emplace_back(entry.path());
        }
    }
    // Sort so the discovered voices have a deterministic order.
    std::sort(voicePaths.begin(), voicePaths.end());
    return voicePaths;
}
```
```diff
 add_common_arguments(parser_text2speech)
 parser_text2speech.add_argument('--num_streams', default=0, type=int, help='The number of parallel execution streams to use for the models in the pipeline.', dest='num_streams')
-parser_text2speech.add_argument('--model_type', default='speecht5', choices=['speecht5', 'kokoro'], help='Type of the source TTS model. speecht5 uses optimum-cli; kokoro uses a dedicated PyTorch->OpenVINO conversion path.', dest='model_type')
+parser_text2speech.add_argument('--model_type', default='kokoro', choices=['speecht5', 'kokoro'], help='Type of the source TTS model. speecht5 uses optimum-cli; kokoro uses a dedicated PyTorch->OpenVINO conversion path.', dest='model_type')
```
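
The practical effect of the new default can be checked with a stripped-down parser. The sketch below reproduces only the `--model_type` argument from the diff; the surrounding parser setup and the subcommand name `text2speech` are assumptions made for illustration and are not taken from `export_model.py`.

```python
import argparse

# Minimal stand-in for the exporter CLI (assumed structure, not the real script).
parser = argparse.ArgumentParser()
subparsers = parser.add_subparsers(dest="task")
parser_text2speech = subparsers.add_parser("text2speech")
parser_text2speech.add_argument(
    "--model_type",
    default="kokoro",  # new default from the diff; the removed line used 'speecht5'
    choices=["speecht5", "kokoro"],
    dest="model_type",
)

# Omitting the flag now selects the Kokoro conversion path.
args_default = parser.parse_args(["text2speech"])
# Callers that relied on the old default have to request speecht5 explicitly.
args_explicit = parser.parse_args(["text2speech", "--model_type", "speecht5"])

print(args_default.model_type)   # kokoro
print(args_explicit.model_type)  # speecht5
```

If the real exporter behaves like this sketch, existing scripts that export SpeechT5 without passing `--model_type` would silently switch to the Kokoro path after this change.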
Comment on lines +252 to +257

```
    [type.googleapis.com / mediapipe.T2sCalculatorOptions]: {
        models_path: "/ovms/models_audio/Kokoro-82M"
        plugin_config: '{"NUM_STREAMS": "1" }',
        target_device: "CPU"
    }
}
```
🛠 Summary
JIRA/Issue if applicable.
Describe the changes.
🧪 Checklist