C++23 Ukrainian text normalization and tokenization utilities with optional Python 3.10+ bindings.

Configure, build, and run the tests with CMake:
cmake -S . -B build
cmake --build build
ctest --test-dir build

Enable Python bindings explicitly when building with CMake:
cmake -S . -B build-python -DNORMALIZE_UK_CPP_BUILD_PYTHON=ON
cmake --build build-python

Install the Python package with pip:
python -m pip install .

Basic usage from Python:
import normalize_uk as nuk
print(nuk.number_to_words(123))
print(nuk.normalize_ukrainian("01.05.2024"))
print(nuk.normalize_ukrainian_with_preset("01.05.2024", nuk.NormalizePreset.TtsFriendly))
print([sentence.text for sentence in nuk.split_sentences("П'ять зв'язків. Два.")])
print([token.text for token in nuk.tokenize("П'ять зв'язків.")])

More examples live in examples/python/.
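
The calls shown above can be combined. The following is a minimal sketch, not part of the project: it splits a text into sentences and normalizes each one separately, using only `split_sentences`, the `.text` attribute, and `normalize_ukrainian` as they appear in the usage example. The helper names `normalize_sentences` and `load_bindings` are hypothetical, and the sketch assumes `normalize_ukrainian` accepts any sentence string.

```python
def normalize_sentences(text, nuk):
    """Split text into sentences, then normalize each sentence separately.

    `nuk` is the imported normalize_uk module (hypothetical helper, passed
    in explicitly so the function can also be exercised with a stand-in).
    """
    return [nuk.normalize_ukrainian(s.text) for s in nuk.split_sentences(text)]


def load_bindings():
    """Return the normalize_uk module, or None if it is not installed."""
    try:
        import normalize_uk
    except ImportError:
        return None
    return normalize_uk


if __name__ == "__main__":
    nuk = load_bindings()
    if nuk is None:
        print("normalize_uk is not installed; run `python -m pip install .` first")
    else:
        # Sample input is illustrative only; the output depends on the library.
        for line in normalize_sentences("Зустріч 01.05.2024. Прийде 123 гостей.", nuk):
            print(line)
```

Passing the module as an argument keeps the helper independent of the import, so it can be tested without a built extension.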
The project includes a .clang-format file and a CMake formatting target. Install clang-format, then run:
cmake --build build --target format