Batch DICOM compressor and HTJ2K/JPEG 2000 lossless transcoder CLI in C++20 for recursively re-encoding studies to 1.2.840.10008.1.2.4.201, with a native hot path, no subprocesses, no temporary .raw/.yuv/.j2c files, and optional JSON reporting.
- HTJ2K encode always via OpenJPH in memory.
- Main DICOM stack via DCMTK for P10 reading, dataset, metadata, encapsulation and writing.
- Source decode:
  - native/JPEG/JPEG-LS/RLE via DCMTK
  - JPEG 2000 via OpenJPEG
  - HTJ2K via OpenJPH
- Main parallelism per file with dynamic queue and fixed thread pool.
- Safe writing with neighbor temporary file + atomic rename.
- --output-root and --in-place modes.
- ZIP per patient with miniz backend.
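The per-file parallelism described above (a dynamic queue feeding a fixed thread pool) can be sketched roughly as follows. This is a minimal illustration, not the project's actual scheduler: `run_jobs` and its parameters are hypothetical names, and the "queue" is assumed to be a shared atomic index over a pre-discovered file list.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: runs job(i) for every i in [0, job_count) on a fixed
// number of worker threads. Each worker claims the next unprocessed index from
// a shared atomic counter, so a slow file does not hold up the files behind it.
// The job callable is assumed not to throw.
inline void run_jobs(std::size_t job_count, unsigned workers,
                     const std::function<void(std::size_t)>& job) {
    std::atomic<std::size_t> next{0};
    const unsigned n = std::max(1u, workers);

    std::vector<std::thread> pool;
    pool.reserve(n);
    for (unsigned w = 0; w < n; ++w) {
        pool.emplace_back([&next, &job, job_count] {
            for (;;) {
                // Claim one job index; relaxed ordering suffices for a counter.
                const std::size_t i = next.fetch_add(1, std::memory_order_relaxed);
                if (i >= job_count) return;
                job(i);
            }
        });
    }
    // Joining gives the caller a happens-before edge on everything the jobs wrote.
    for (std::thread& t : pool) t.join();
}
```

Claiming indices one at a time keeps the load balanced when file sizes vary widely, which is typical when single-frame images and large multi-frame objects share one input tree.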
app/
core/
codec/
dicom/
util/
platform/
tests/
bench/
cmake/
scripts/
third_party/miniz/
Implemented in this version:
- Complete CLI.
- Recursive DICOM discovery.
- Single-frame and multi-frame re-encoding.
- MONOCHROME1, MONOCHROME2, RGB, YBR_FULL, YBR_FULL_422, PALETTE COLOR, YBR_RCT, YBR_ICT.
- Conservative preservation of metadata and lossy history.
- ExtendedOffsetTable and ExtendedOffsetTableLengths for multi-frame output.
- Streaming multi-frame encapsulation for large cine/volume studies: encoded codestreams are appended into DCMTK's output pixel sequence as each frame is produced, avoiding a redundant whole-study codestream vector.
- Aggregate and JSON report.
- ZIP per patient.
- Tests + benchmark executable.
Known limitations are in DICOM_NOTES.md.
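As an illustration of the per-frame offsets that an extended offset table carries for multi-frame output, the sketch below derives them from the encoded codestream sizes. It assumes one fragment per frame, an 8-byte Item header per fragment, and padding of odd-length fragments to an even byte count; `frame_offsets` is a hypothetical name and not part of this project or of DCMTK.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper, assuming one fragment per frame. Each frame occupies an
// 8-byte Item header (4-byte tag + 4-byte length) followed by its codestream,
// padded to an even number of bytes.
inline std::vector<std::uint64_t> frame_offsets(
    const std::vector<std::uint64_t>& codestream_sizes) {
    std::vector<std::uint64_t> offsets;
    offsets.reserve(codestream_sizes.size());
    // Offsets are counted from the first Item that follows the (empty)
    // Basic Offset Table item.
    std::uint64_t pos = 0;
    for (std::uint64_t size : codestream_sizes) {
        offsets.push_back(pos);
        pos += 8 + size + (size & 1u);  // header + payload + optional pad byte
    }
    return offsets;
}
```

For example, codestreams of 10, 11 and 4 bytes would start at offsets 0, 18 and 38 under these assumptions. Because each offset depends only on the frames before it, the table can be accumulated as frames are appended.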
Required:
- CMake 3.24+
- Ninja
- OpenJPH 0.26.x
- OpenJPEG 2.5.x
- DCMTK 3.7.x
- C++20 compiler
The scripts in scripts/ build an isolated prefix in .deps/install/....
git clone https://github.com/ThalesMMS/dicompressor.git
cd dicompressor
macOS Apple Silicon:
./scripts/bootstrap_deps_macos_arm64.sh
Linux x86_64:
./scripts/bootstrap_deps_linux_x86_64.sh
macOS Apple Silicon:
cmake --preset macos-arm64-release \
  -DCMAKE_PREFIX_PATH="$PWD/.deps/install/macos-arm64"
Linux:
cmake --preset release \
  -DCMAKE_PREFIX_PATH="$PWD/.deps/install/linux-x86_64"
macOS Apple Silicon:
cmake --build --preset macos-arm64-release -j
Linux:
cmake --build --preset release -j
macOS Apple Silicon:
ctest --test-dir build/macos-arm64-release --output-on-failure
Linux:
ctest --preset release
Presets:
- release
- debug-sanitized
- macos-arm64-release
Release uses -O3, NDEBUG and IPO/LTO when supported.
dicompressor <input_root> [--output-root PATH | --in-place]
[--zip-per-patient]
[--zip-mode stored|deflated (requires --zip-per-patient)]
[--report-json PATH]
[--num-decomps N]
[--block-size X,Y]
[--overwrite]
[--regenerate-sop-instance-uid]
[--strict-color]
[--workers N]
[--log-level trace|debug|info|warn|error]
--zip-mode selects the compression mode for patient ZIP output and requires --zip-per-patient.
Use ./build/release/dicompressor on Linux or ./build/macos-arm64-release/dicompressor on Apple Silicon.
Mirrored output in separate folder:
Linux:
./build/release/dicompressor ./Studies --output-root ./Studies-output
macOS Apple Silicon:
./build/macos-arm64-release/dicompressor ./Studies --output-root ./Studies-output
In-place:
./build/release/dicompressor ./Studies --in-place --workers 8
With JSON report:
./build/release/dicompressor ./Studies \
  --output-root ./Studies-output \
  --report-json ./report.json
With ZIP per patient and explicit ZIP mode:
./build/release/dicompressor ./Studies \
  --output-root ./Studies-output \
  --zip-per-patient \
  --zip-mode stored
- Always writes output in HTJ2K lossless.
- Re-encodes already compressed files when the source syntax is decodable.
- Preserves SOPInstanceUID by default.
- With --regenerate-sop-instance-uid, generates a new UID and keeps the file meta information consistent with it.
- If the dataset already indicates prior loss or the source Transfer Syntax is lossy, maintains LossyImageCompression = "01".
- Unsupported files:
  - --output-root: the file is copied unchanged and reported as copied
  - --in-place: the original remains intact and is reported as copied
- --strict-color promotes unsupported color/photometry cases to failure.
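The safe-write approach that keeps originals intact on failure (a neighbor temporary file followed by an atomic rename) might look like the sketch below. It is an assumption-laden illustration: `write_atomically` and the `.tmp` suffix are hypothetical, and durability concerns such as fsync are deliberately left out.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <string_view>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: writes bytes to a temporary file in the same directory
// as target, then renames it over target. Readers see either the old file or
// the complete new one, never a partially written file.
inline bool write_atomically(const fs::path& target, std::string_view bytes) {
    fs::path tmp = target;
    tmp += ".tmp";  // assumed suffix; same directory keeps the rename on one filesystem

    std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    out.flush();
    const bool written = static_cast<bool>(out);
    out.close();

    std::error_code ec;
    if (written) {
        // On POSIX, rename replaces an existing target in a single step.
        fs::rename(tmp, target, ec);
        if (!ec) return true;
    }
    // Clean up the temporary file; the original target was never touched.
    std::error_code ignore;
    fs::remove(tmp, ignore);
    return false;
}
```

Placing the temporary file next to the target matters because a rename across filesystems is not atomic and typically fails outright.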
The final report aggregates:
- total
- ok
- copied
- failed
- zipped
- frames
- pixels
- bytes_read
- bytes_written
- times per phase
- throughput in files/s, frames/s and MPix/s
With --report-json, the file also includes entries per job.
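To make the throughput figures concrete, here is a small sketch of how they could be derived from the aggregate counters. The struct and function names are hypothetical, and it assumes that the denominator is total wall-clock time and that one MPix is 10^6 pixels; the actual report may define these differently.

```cpp
#include <cstdint>

// Hypothetical subset of the aggregate counters listed above.
struct Totals {
    std::uint64_t files = 0;
    std::uint64_t frames = 0;
    std::uint64_t pixels = 0;
};

struct Throughput {
    double files_per_s = 0.0;
    double frames_per_s = 0.0;
    double mpix_per_s = 0.0;
};

// Assumes wall_seconds is the elapsed wall-clock time of the whole run.
inline Throughput compute_throughput(const Totals& t, double wall_seconds) {
    Throughput r;
    if (wall_seconds <= 0.0) return r;  // empty or instantaneous run: report zeros
    r.files_per_s = static_cast<double>(t.files) / wall_seconds;
    r.frames_per_s = static_cast<double>(t.frames) / wall_seconds;
    r.mpix_per_s = static_cast<double>(t.pixels) / 1e6 / wall_seconds;
    return r;
}
```

MPix/s is usually the most comparable figure across corpora, since files/s and frames/s depend heavily on image dimensions.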
The transcode_bench binary reuses the same core and emits the aggregate execution summary. Use ./build/release/transcode_bench on Linux or ./build/macos-arm64-release/transcode_bench on Apple Silicon.
./build/release/transcode_bench ./Studies --output-root ./Studies-bench-out --workers 1
./build/release/transcode_bench ./Studies --output-root ./Studies-bench-out --workers 8
To benchmark streaming multi-frame encapsulation, use an input corpus that includes actual multi-frame cine or volume DICOM objects. A directory of only single-frame images measures decode/encode/write throughput, but it does not exercise the large-study frame append path or its memory behavior.
Memory usage for multi-frame output still includes DCMTK's internal compressed pixel sequence representation; the streaming path avoids double-buffering those encoded frames during the encoding phase rather than providing constant-memory output streaming.
The suite covers:
- CLI parsing
- DICOM discovery
- minimal HTJ2K encoder/decoder round-trip
- streaming multi-frame pixel sequence offsets
- multi-frame transcode output validity and offset tables
- JSON report generation
- output-root
- in-place
- copied fallback
- ZIP per patient
- DICOM-Decoder-dev, a pure Swift DICOM decoder toolkit for iOS and macOS.
- Dicom-Tools-dev, a multi-language DICOM toolkit with CLIs and utilities across Python, Rust, C++, C#, Java, and JS.
- MTK-dev, an Apple-platform volumetric rendering stack for medical-image research and prototyping.
- The original Python prototype remains in the repository only as an architectural reference for the flow it replaced.
- This v1 prioritizes throughput and operational safety over universal coverage of all exotic DICOM formats.
- Performance details: PERFORMANCE_NOTES.md
- DICOM details: DICOM_NOTES.md