ThalesMMS/dicompressor

dicompressor

C++20 | CMake 3.24+ | License: MIT

Batch DICOM compressor and HTJ2K/JPEG 2000 lossless transcoder CLI written in C++20. It recursively re-encodes studies to transfer syntax 1.2.840.10008.1.2.4.201 (HTJ2K lossless) through a native hot path, with no subprocesses and no temporary .raw/.yuv/.j2c files. JSON reporting is optional.

Overview

  • HTJ2K encoding always runs in memory via OpenJPH.
  • DCMTK is the main DICOM stack: Part 10 file reading, dataset and metadata handling, encapsulation, and writing.
  • Source decode:
    • native/JPEG/JPEG-LS/RLE via DCMTK
    • JPEG 2000 via OpenJPEG
    • HTJ2K via OpenJPH
  • Parallelism is primarily per file: a fixed thread pool pulls jobs from a dynamic queue.
  • Safe writing: output goes to a temporary file next to the destination, followed by an atomic rename.
  • --output-root and --in-place modes.
  • ZIP per patient with miniz backend.

Structure

app/
core/
codec/
dicom/
util/
platform/
tests/
bench/
cmake/
scripts/
third_party/miniz/

Current state

Implemented in this version:

  • Complete CLI.
  • Recursive DICOM discovery.
  • Single-frame and multi-frame re-encoding.
  • Photometric interpretations: MONOCHROME1, MONOCHROME2, RGB, YBR_FULL, YBR_FULL_422, PALETTE COLOR, YBR_RCT, YBR_ICT.
  • Conservative preservation of metadata and lossy history.
  • ExtendedOffsetTable and ExtendedOffsetTableLengths for multi-frame output.
  • Streaming multi-frame encapsulation for large cine/volume studies: encoded codestreams are appended into DCMTK's output pixel sequence as each frame is produced, avoiding a redundant whole-study codestream vector.
  • Aggregate and JSON report.
  • ZIP per patient.
  • Tests + benchmark executable.
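
As an illustration of how the multi-frame offset tables relate to streamed frames, the sketch below accumulates one offset and one length per frame as each codestream is appended. It is not the project's implementation. It assumes one fragment per frame, offsets measured from the first byte of the first frame item, and length entries that record the unpadded codestream size; check the last point against the DICOM standard and DICOM_NOTES.md. The type and method names are hypothetical.

```cpp
#include <cstdint>
#include <vector>

struct OffsetTables {
    std::vector<std::uint64_t> offsets;  // candidate ExtendedOffsetTable values
    std::vector<std::uint64_t> lengths;  // candidate ExtendedOffsetTableLengths values
};

// Hypothetical accumulator, updated once per frame as the frame is appended.
class OffsetTableBuilder {
public:
    void add_frame(std::uint64_t codestream_bytes) {
        constexpr std::uint64_t kItemHeaderBytes = 8;  // 4-byte item tag + 4-byte item length
        tables_.offsets.push_back(next_offset_);
        tables_.lengths.push_back(codestream_bytes);   // assumption: unpadded length
        // Item values are padded to an even number of bytes.
        const std::uint64_t padded = codestream_bytes + (codestream_bytes & 1u);
        next_offset_ += kItemHeaderBytes + padded;
    }

    const OffsetTables& tables() const { return tables_; }

private:
    OffsetTables tables_;
    std::uint64_t next_offset_ = 0;  // offset of the next frame's item tag
};
```

For frames of 10, 7 and 4 bytes this yields offsets 0, 18 and 34: each frame advances the offset by the 8-byte item header plus the even-padded payload. Only two 64-bit integers per frame are retained, so the tables can be built while frames stream into the pixel sequence.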

Known limitations are in DICOM_NOTES.md.

Dependencies

Required:

  • CMake 3.24+
  • Ninja
  • OpenJPH 0.26.x
  • OpenJPEG 2.5.x
  • DCMTK 3.7.x
  • C++20 compiler

The scripts in scripts/ build an isolated prefix in .deps/install/....

Build

0. Clone the repository

git clone https://github.com/ThalesMMS/dicompressor.git
cd dicompressor

1. Bootstrap dependencies

macOS Apple Silicon:

./scripts/bootstrap_deps_macos_arm64.sh

Linux x86_64:

./scripts/bootstrap_deps_linux_x86_64.sh

2. Configure

macOS Apple Silicon:

cmake --preset macos-arm64-release \
  -DCMAKE_PREFIX_PATH="$PWD/.deps/install/macos-arm64"

Linux:

cmake --preset release \
  -DCMAKE_PREFIX_PATH="$PWD/.deps/install/linux-x86_64"

3. Compile

macOS Apple Silicon:

cmake --build --preset macos-arm64-release -j

Linux:

cmake --build --preset release -j

4. Test

macOS Apple Silicon:

ctest --test-dir build/macos-arm64-release --output-on-failure

Linux:

ctest --preset release

Presets

  • release
  • debug-sanitized
  • macos-arm64-release

Release uses -O3, NDEBUG and IPO/LTO when supported.

Usage

dicompressor <input_root> [--output-root PATH | --in-place]
                           [--zip-per-patient]
                           [--zip-mode stored|deflated]
                           [--report-json PATH]
                           [--num-decomps N]
                           [--block-size X,Y]
                           [--overwrite]
                           [--regenerate-sop-instance-uid]
                           [--strict-color]
                           [--workers N]
                           [--log-level trace|debug|info|warn|error]

--zip-mode selects the compression mode for patient ZIP output and requires --zip-per-patient.

Examples

Use ./build/release/dicompressor on Linux or ./build/macos-arm64-release/dicompressor on Apple Silicon.

Mirrored output in separate folder:

Linux:

./build/release/dicompressor ./Studies --output-root ./Studies-output

macOS Apple Silicon:

./build/macos-arm64-release/dicompressor ./Studies --output-root ./Studies-output

In-place:

./build/release/dicompressor ./Studies --in-place --workers 8

With JSON report:

./build/release/dicompressor ./Studies \
  --output-root ./Studies-output \
  --report-json ./report.json

With ZIP per patient and explicit ZIP mode:

./build/release/dicompressor ./Studies \
  --output-root ./Studies-output \
  --zip-per-patient \
  --zip-mode stored

Functional behavior

  • Always writes output in HTJ2K lossless.
  • Re-encodes already compressed files when the source syntax is decodable.
  • Preserves SOPInstanceUID by default.
  • With --regenerate-sop-instance-uid, generates a new UID and keeps the file meta information consistent with it.
  • If the dataset already indicates prior loss or the source Transfer Syntax is lossy, maintains LossyImageCompression = "01".
  • Unsupported files:
    • --output-root: the file is copied to the output tree and reported as copied
    • --in-place: the original remains intact and is reported as copied
  • --strict-color promotes unsupported color/photometry cases to failure.
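
The lossy-history rule above can be expressed as a small decision function. This is a sketch under stated assumptions, not the project's logic: the function names are hypothetical, and the list contains only three transfer syntaxes whose definitions are lossy. Syntaxes that can carry either lossless or lossy data (for example 1.2.840.10008.1.2.4.91) are deliberately left out because how to treat them is a policy choice.

```cpp
#include <algorithm>
#include <array>
#include <optional>
#include <string_view>

// Illustrative, intentionally short list; not the project's table.
inline constexpr std::array<std::string_view, 3> kLossySyntaxes = {
    "1.2.840.10008.1.2.4.50",  // JPEG Baseline (Process 1)
    "1.2.840.10008.1.2.4.51",  // JPEG Extended (Process 2 & 4)
    "1.2.840.10008.1.2.4.81",  // JPEG-LS Lossy (Near-Lossless)
};

bool source_syntax_is_lossy(std::string_view uid) {
    return std::find(kLossySyntaxes.begin(), kLossySyntaxes.end(), uid)
           != kLossySyntaxes.end();
}

// Returns the value to store in LossyImageCompression for the output file,
// or std::nullopt to leave the attribute absent. `existing_flag` is the
// value found in the source dataset, if any.
std::optional<std::string_view> lossy_flag_for_output(
        std::optional<std::string_view> existing_flag,
        std::string_view source_transfer_syntax) {
    const bool prior_loss = existing_flag && *existing_flag == "01";
    if (prior_loss || source_syntax_is_lossy(source_transfer_syntax)) {
        return "01";  // lossless re-encoding does not undo earlier loss
    }
    return existing_flag;  // no evidence of loss: keep the source value
}
```

The key point is that re-encoding to HTJ2K lossless never clears the flag: once either signal says the pixels were lossy at some point, the output stays marked "01".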

Report

The final report aggregates:

  • total
  • ok
  • copied
  • failed
  • zipped
  • frames
  • pixels
  • bytes_read
  • bytes_written
  • times per phase
  • throughput in files/s, frames/s and MPix/s

With --report-json, the file also includes entries per job.
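
The throughput figures can be derived from the aggregate counters as in the sketch below. The struct and function names are hypothetical. The sketch also assumes that MPix means 10^6 pixels and that the denominator is total wall-clock time; the project may define either differently.

```cpp
#include <chrono>
#include <cstdint>

struct Totals {
    std::uint64_t files;
    std::uint64_t frames;
    std::uint64_t pixels;
    std::chrono::duration<double> wall;  // elapsed seconds (assumed denominator)
};

struct Throughput {
    double files_per_s;
    double frames_per_s;
    double mpix_per_s;
};

Throughput compute_throughput(const Totals& t) {
    const double seconds = t.wall.count();
    if (seconds <= 0.0) return {};  // avoid division by zero on empty runs
    return {
        static_cast<double>(t.files) / seconds,
        static_cast<double>(t.frames) / seconds,
        (static_cast<double>(t.pixels) / 1e6) / seconds,  // assumed: 1 MPix = 10^6 pixels
    };
}
```

For example, 4 files containing 8 frames and 4,000,000 pixels processed in 2 seconds give 2 files/s, 4 frames/s and 2 MPix/s.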

Benchmark

The transcode_bench binary reuses the same core and emits the aggregate execution summary. Use ./build/release/transcode_bench on Linux or ./build/macos-arm64-release/transcode_bench on Apple Silicon.

./build/release/transcode_bench ./Studies --output-root ./Studies-bench-out --workers 1
./build/release/transcode_bench ./Studies --output-root ./Studies-bench-out --workers 8
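
The two runs above differ only in --workers. As a rough model of per-file parallelism with a fixed pool and dynamic work distribution, the sketch below lets each thread claim the next unprocessed job index from a shared atomic cursor. It illustrates the idea only; the project's queue may be built differently, and run_jobs is a hypothetical name.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Runs job(i) for every i in [0, job_count) on a fixed number of threads.
// `job` must be safe to call concurrently and must not throw: an exception
// escaping a thread would terminate the process.
void run_jobs(std::size_t job_count, unsigned workers,
              const std::function<void(std::size_t)>& job) {
    if (workers == 0) workers = 1;
    std::atomic<std::size_t> next{0};  // shared cursor acting as the queue

    auto worker = [&] {
        // Claim indices one at a time, so a thread that finishes a small
        // file immediately picks up more work instead of idling.
        for (;;) {
            const std::size_t i = next.fetch_add(1, std::memory_order_relaxed);
            if (i >= job_count) return;
            job(i);
        }
    };

    std::vector<std::thread> pool;
    pool.reserve(workers);
    for (unsigned w = 0; w < workers; ++w) pool.emplace_back(worker);
    for (auto& t : pool) t.join();  // wait for all workers before returning
}
```

Compared with splitting the file list into equal static chunks, this keeps all workers busy when file sizes vary widely, which is common when a corpus mixes single-frame images and large multi-frame objects.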

To benchmark streaming multi-frame encapsulation, use an input corpus that includes actual multi-frame cine or volume DICOM objects. A directory of only single-frame images measures decode/encode/write throughput, but it does not exercise the large-study frame append path or its memory behavior.

Memory usage for multi-frame output still includes DCMTK's internal compressed pixel sequence representation. The streaming path avoids double-buffering the encoded frames during the encoding phase; it does not provide constant-memory output streaming.

Tests

The suite covers:

  • CLI parsing
  • DICOM discovery
  • minimal HTJ2K encoder/decoder round-trip
  • streaming multi-frame pixel sequence offsets
  • multi-frame transcode output validity and offset tables
  • JSON report generation
  • output-root
  • in-place
  • copied fallback
  • ZIP per patient

Related ThalesMMS projects

  • DICOM-Decoder-dev, a pure Swift DICOM decoder toolkit for iOS and macOS.
  • Dicom-Tools-dev, a multi-language DICOM toolkit with CLIs and utilities across Python, Rust, C++, C#, Java, and JS.
  • MTK-dev, an Apple-platform volumetric rendering stack for medical-image research and prototyping.

Notes

  • The original Python prototype remains in the repository only as an architectural reference for the flow it replaced.
  • This v1 prioritizes throughput and operational safety over universal coverage of all exotic DICOM formats.
  • Performance details: PERFORMANCE_NOTES.md
  • DICOM details: DICOM_NOTES.md
