
[cccl.c]: Add infrastructure to support ahead-of-time compilation (AoT) #8410

@shwina


Background

Currently, each algorithm has a cccl_device_<algo>_build function that compiles a kernel via NVRTC and returns a build result struct holding the loaded CUlibrary and related state.

To support ahead-of-time (AoT) compilation — pre-compiling kernels and saving them to disk for use in a different process or on a different machine — the Python layer needs to serialize and deserialize these build result structs. This requires the C layer to expose enough metadata in the structs to make that possible.
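To make the serialization requirement concrete, here is a minimal sketch in C of why an opaque field needs an explicit size before it can be written out and read back. The `blob_t` struct and the `blob_pack` / `blob_unpack` helpers are hypothetical illustrations, not part of cccl.c, and the actual serialization is expected to live in the Python layer. The sketch stores the length in native byte order, so it assumes the writer and reader share endianness.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an opaque field of a build result struct. */
typedef struct {
  void*  data;
  size_t size; /* without this, a serializer cannot know how many bytes to copy */
} blob_t;

/* Pack as [8-byte length][payload].
   Returns the number of bytes written, or 0 if `out` is too small. */
size_t blob_pack(const blob_t* b, unsigned char* out, size_t out_cap)
{
  uint64_t len = (uint64_t) b->size;
  if (out_cap < sizeof(len) || out_cap - sizeof(len) < b->size) {
    return 0;
  }
  memcpy(out, &len, sizeof(len));
  if (b->size != 0) {
    memcpy(out + sizeof(len), b->data, b->size);
  }
  return sizeof(len) + b->size;
}

/* Unpack a blob written by blob_pack. The payload is copied into memory
   obtained from malloc, which the caller must free.
   Returns the number of bytes consumed, or 0 on failure. */
size_t blob_unpack(const unsigned char* in, size_t in_len, blob_t* b)
{
  uint64_t len;
  if (in_len < sizeof(len)) {
    return 0;
  }
  memcpy(&len, in, sizeof(len));
  if (len > in_len - sizeof(len)) {
    return 0; /* truncated input */
  }
  b->size = (size_t) len;
  b->data = (len != 0) ? malloc((size_t) len) : NULL;
  if (len != 0 && b->data == NULL) {
    return 0; /* allocation failed */
  }
  if (len != 0) {
    memcpy(b->data, in + sizeof(len), (size_t) len);
  }
  return sizeof(len) + (size_t) len;
}
```

A length-prefixed layout like this is only one possible choice; the point is that the size has to travel with the bytes, which the C struct must therefore expose.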

Required changes

  1. New fields in all *_build_result_t structs:
    - int cc: the compute capability the kernel was compiled for (encoded as major * 10 + minor)
    - size_t runtime_policy_size: size of the opaque runtime_policy blob, so it can be round-tripped through serialization
    - Per-kernel char* lowered name fields: the mangled CUDA kernel names produced by NVRTC, needed to resolve kernels from a cubin via cuLibraryGetKernel during deserialization
  2. Cross-CC build support: when a kernel is compiled for a target CC that doesn't match the current device (e.g. compiling for SM 9.0 on an SM 8.6 machine), cuLibraryLoadData returns CUDA_ERROR_NO_BINARY_FOR_GPU. Currently this is a fatal error. The build functions should be updated to treat this case as success, returning the cubin and lowered names without a loaded CUlibrary, so that the result can be serialized and shipped to a matching device.
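The two changes above could look roughly like the following sketch. Everything here is hypothetical and only illustrates the proposal: `example_build_result_t`, `load_status_t`, `encode_cc`, `finish_build`, and `is_loaded` are invented names, the struct layout is not the real cccl.c layout, and the status enum stands in for `CUresult` so the code compiles without `cuda.h`. It also assumes the compute capability is encoded as major * 10 + minor.

```c
#include <stddef.h>

/* Hypothetical stand-in for the result of loading the cubin. Real code
   would inspect the CUresult returned by cuLibraryLoadData. */
typedef enum {
  LOAD_OK = 0,
  LOAD_NO_BINARY_FOR_GPU, /* stands in for CUDA_ERROR_NO_BINARY_FOR_GPU */
  LOAD_OTHER_ERROR
} load_status_t;

/* Hypothetical build result. Only cc and runtime_policy_size are named in
   the proposal; the other field names are illustrative. */
typedef struct {
  int    cc;                  /* compute capability the kernel targets */
  void*  cubin;               /* compiled binary */
  size_t cubin_size;
  void*  library;             /* stand-in for CUlibrary; NULL if not loaded */
  void*  runtime_policy;      /* opaque blob */
  size_t runtime_policy_size; /* size of that blob, needed to serialize it */
  char*  kernel_lowered_name; /* mangled kernel name reported by NVRTC */
} example_build_result_t;

/* Assumed encoding: major * 10 + minor. */
int encode_cc(int major, int minor)
{
  return major * 10 + minor;
}

/* Finish a build given the outcome of loading the cubin on the current
   device. Returns 0 on success and -1 on failure. */
int finish_build(example_build_result_t* r, int target_cc,
                 load_status_t status, void* loaded_library)
{
  r->cc = target_cc;
  switch (status) {
    case LOAD_OK:
      r->library = loaded_library;
      return 0;
    case LOAD_NO_BINARY_FOR_GPU:
      /* Cross-CC build: the cubin is valid, but not for this device.
         Keep the cubin and lowered names and leave the library unset. */
      r->library = NULL;
      return 0;
    default:
      r->library = NULL;
      return -1;
  }
}

/* Callers must check this before launching, since a successful build may
   no longer imply a loaded library. */
int is_loaded(const example_build_result_t* r)
{
  return r->library != NULL;
}
```

One consequence of this design is that a successful build no longer guarantees a launchable kernel on the current device, so any code path that launches would need a check along the lines of `is_loaded` first.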

Motivation

These changes are purely additive to the C structs and transparent to existing callers. They unblock the Python layer's implementation of save() / load_algorithm() for distributing pre-compiled kernels (e.g. shipping pre-compiled kernels in a Python wheel that works across a range of GPU architectures).

Status: In Review
