A demonstration of using NVIDIA cuTile for 2D FFT convolution in an optical imaging application. This project compares cuTile FFT with CuPy FFT and validates correctness against NumPy.
This demo applies FFT-based convolution to simulate optical imaging:
- Input: Binary mask pattern (2048×2048)
- Kernels: 24 optical point-spread functions (35×35 each)
- Output: Aerial image showing light intensity after optical system
The Hopkins/SOCS model is used: Aerial = Σᵢ wᵢ |Mask ⊛ Kernelᵢ|²
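The sum above can be sketched in plain NumPy. This is a minimal illustration, not the code in optics.py: the function names `fft_convolve2d` and `aerial_image`, the random kernels, and the weights are hypothetical stand-ins for the OpenILT data, and the convolution is assumed to be circular (periodic boundaries), which is what a plain FFT product gives.

```python
import numpy as np

def fft_convolve2d(mask, kernel):
    """Circular 2D convolution of `mask` with a small `kernel` via FFT."""
    kh, kw = kernel.shape
    # Zero-pad the kernel to the mask size.
    padded = np.zeros(mask.shape, dtype=np.complex128)
    padded[:kh, :kw] = kernel
    # Move the kernel centre to index (0, 0) so the output is not shifted.
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    # Convolution theorem: multiply spectra, then transform back.
    return np.fft.ifft2(np.fft.fft2(mask) * np.fft.fft2(padded))

def aerial_image(mask, kernels, weights):
    """Aerial = sum_i w_i * |mask (*) kernel_i|^2."""
    out = np.zeros(mask.shape, dtype=np.float64)
    for kernel, weight in zip(kernels, weights):
        field = fft_convolve2d(mask, kernel)
        out += weight * np.abs(field) ** 2
    return out

# Tiny synthetic example (assumed data, much smaller than the real 2048x2048 case).
rng = np.random.default_rng(0)
mask = np.zeros((64, 64))
mask[24:40, 24:40] = 1.0  # a single square feature
kernels = rng.standard_normal((3, 5, 5)) + 1j * rng.standard_normal((3, 5, 5))
weights = np.array([0.6, 0.3, 0.1])

aerial = aerial_image(mask, kernels, weights)
print(aerial.shape, aerial.min() >= 0.0)
```

Swapping `np.fft.fft2` and `np.fft.ifft2` for another backend's 2D transforms is the only change the sketch would need to compare implementations.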
- Python 3.10+
- NVIDIA GPU with CUDA 13.3
- cuTile (cuda.tile)
- PyTorch (2.6+ recommended)
- CuPy
- NumPy
- Matplotlib
Install dependencies:
pip install torch cupy-cuda12x numpy matplotlib
For cuTile (cuda.tile), follow the NVIDIA cuTile installation instructions.
git clone <repo-url>
cd project
The optical kernels and sample masks come from the OpenILT project:
git clone https://github.com/OpenOPC/OpenILT.git openilt_data
cd openilt_data
git checkout dabb97c6ca3dfd159362e48273c436444c77353b
cd ..
Note: This demo was tested with OpenILT commit dabb97c. If the repository structure changes, use this specific commit.
This provides:
- openilt_data/kernel/kernels/focus.pt: 24 optical kernels
- openilt_data/kernel/scales/focus.pt: kernel weights
- openilt_data/tmp/CurvILT_target1.png: sample mask pattern
python optics.py --compare --save comparison.png
python optics.py --cupy --save figure-cupy.png
python optics.py --cutile --save figure-cutile.png
python optics.py --compare --mock --save comparison.png
- --num-kernels N: Use N kernels (default: 24)
- --cutline Y: Set intensity profile cutline position
- --threshold T: Set print threshold for cutline plot
Validate cuTile FFT against NumPy/CuPy/PyTorch:
python test_fft.py
This runs:
- 1D FFT tests (8 elements)
- 2D FFT tests (8×8)
- Benchmarks at 1024×1024 and 2048×2048
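The correctness checks listed above follow a common pattern: run the implementation under test and a trusted reference on the same input, then compare within floating-point tolerance. The sketch below shows that pattern only; it is not the contents of test_fft.py. A direct DFT matrix product (the hypothetical helpers `dft_matrix`, `reference_dft_1d`, `reference_dft_2d`) stands in for the implementation under test, and NumPy is the reference.

```python
import numpy as np

def dft_matrix(n):
    """Dense n x n DFT matrix, F[j, k] = exp(-2*pi*i*j*k / n)."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n)

def reference_dft_1d(x):
    """Direct O(n^2) DFT of a 1D array."""
    return dft_matrix(len(x)) @ x

def reference_dft_2d(x):
    """Direct 2D DFT: transform the columns, then the rows."""
    rows, cols = x.shape
    return dft_matrix(rows) @ x @ dft_matrix(cols).T

rng = np.random.default_rng(0)
x1 = rng.standard_normal(8) + 1j * rng.standard_normal(8)            # 1D, 8 elements
x2 = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))  # 2D, 8x8

# Compare against NumPy within floating-point tolerance.
ok_1d = np.allclose(reference_dft_1d(x1), np.fft.fft(x1))
ok_2d = np.allclose(reference_dft_2d(x2), np.fft.fft2(x2))
print("1D match:", ok_1d, "2D match:", ok_2d)
```

Small sizes such as 8 and 8×8 keep mismatches easy to inspect by eye when a comparison fails.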
project/
├── FFT.py # cuTile FFT kernel (from NVIDIA samples)
├── test_fft.py # FFT correctness and benchmark tests
├── optics.py # Optical imaging simulation
├── display.py # Visualization utilities
├── openilt_data/ # Data directory (fetch separately)
└── README.md
| Size | CuPy (cuFFT) | cuTile |
|---|---|---|
| 1024×1024 | 0.05 ms | 92 ms |
| 2048×2048 | 0.07 ms | 324 ms |
cuTile FFT produces correct results but is not yet performance-competitive with cuFFT. This demo focuses on correctness and integration.
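Timings like those in the table are typically collected with a warm-up phase followed by repeated measurements. The helper below is a minimal sketch of that approach using NumPy on the CPU; `time_ms` is a hypothetical name, and it is not the benchmark code used for the numbers above. For GPU libraries, kernel launches are asynchronous, so the device would need to be synchronized (for example with CuPy's `cupy.cuda.Stream.null.synchronize()`) before each clock read.

```python
import time
import numpy as np

def time_ms(fn, *args, warmup=2, repeats=5):
    """Return the best wall-clock time of fn(*args) in milliseconds."""
    # Warm-up runs absorb one-time costs such as plan creation or compilation.
    for _ in range(warmup):
        fn(*args)
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - start) * 1e3)
    # The minimum is the sample least disturbed by other system activity.
    return min(samples)

x = np.random.default_rng(0).standard_normal((256, 256))
elapsed = time_ms(np.fft.fft2, x)
print(f"256x256 fft2: {elapsed:.3f} ms")
```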
MIT License — see LICENSE
- OpenILT — CUHK-EDA, optical kernels and benchmark data
- NVIDIA cuTile — Tile-based GPU programming