Skip to content

Commit 5ae423f

Browse files
authored
[FEA]: Add pathfinder cudla support (.so, .h) (#1855)
* pathfinder: add cudla and nvcudla support Add pathfinder support for loading ``libcudla.so.1`` from the ``nvidia-cudla`` package and probing ``libnvcudla.so`` through the existing canary subprocess path. Use that probe in the cudla load test so hosts without the platform runtime are skipped, while real ``libcudla.so.1`` load failures still surface when ``libnvcudla.so`` is available. Made-with: Cursor * pathfinder: gate cudla support by machine architecture Mark cudla and nvcudla as aarch64-only descriptors and derive the supported library tables from the current machine as well as the current OS. This keeps those libraries known to pathfinder while reporting them as unavailable on linux-64, and updates the descriptor-registry tests to match the new current-platform filtering model. Made-with: Cursor * pathfinder: skip nvcudla tests when runtime is absent Skip the cudla and nvcudla load tests on aarch64 hosts when the nvcudla canary probe cannot resolve libnvcudla.so. This keeps non-Tegra linux-aarch64 systems from failing strict test runs while still exercising the real success path on Tegra platforms where the platform runtime is installed. Made-with: Cursor * pathfinder: rely on nvcudla runtime probe in tests Remove the machine-architecture gating for cudla and nvcudla so they remain part of the normal Linux descriptor tables. Let the nvcudla canary probe decide whether cudla and nvcudla tests should run, which keeps strict test runs green on hosts without the platform runtime while still exercising real load behavior where libnvcudla.so is available. Made-with: Cursor * pathfinder: share libnvcudla test skip helper Move the libnvcudla.so skip logic into conftest so cudla and nvcudla tests use one shared rule. Keeping the helper in the pytest support layer avoids duplicate test code while still deferring the pathfinder import until the helper runs. 
Made-with: Cursor * pathfinder: add cudla header lookup support Register cudla as a CTK header so locate_nvidia_header_directory() can find cudla.h in the standard cu13 wheel include directory. In strict header tests, skip cudla on hosts where libnvcudla.so is not available so Tegra setups still exercise the real path without making unsupported hosts fail. Made-with: Cursor * pathfinder: classify cudla as a CTK library Move cudla into the CTK descriptor block so its packaging classification matches how it is shipped in toolkit installs and the optional nvidia-cudla wheel. This keeps the catalog organization consistent with the current understanding of cudla as a CUDA Toolkit component rather than a third-party add-on. Made-with: Cursor * Undo Copyright date change left over after undoing all other intermediate changes.
1 parent c6aea12 commit 5ae423f

8 files changed

Lines changed: 54 additions & 3 deletions

File tree

cuda_pathfinder/cuda/pathfinder/_dynamic_libs/descriptor_catalog.py

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -290,6 +290,12 @@ class DescriptorSpec:
290290
anchor_rel_dirs_windows=("extras/CUPTI/lib64", "bin"),
291291
ctk_root_canary_anchor_libnames=("cudart",),
292292
),
293+
DescriptorSpec(
294+
name="cudla",
295+
packaged_with="ctk",
296+
linux_sonames=("libcudla.so.1",),
297+
site_packages_linux=("nvidia/cu13/lib",),
298+
),
293299
# -----------------------------------------------------------------------
294300
# Third-party / separately packaged libraries
295301
# -----------------------------------------------------------------------
@@ -386,6 +392,11 @@ class DescriptorSpec:
386392
linux_sonames=("libcuda.so.1",),
387393
windows_dlls=("nvcuda.dll",),
388394
),
395+
DescriptorSpec(
396+
name="nvcudla",
397+
packaged_with="driver",
398+
linux_sonames=("libnvcudla.so",),
399+
),
389400
DescriptorSpec(
390401
name="nvml",
391402
packaged_with="driver",

cuda_pathfinder/cuda/pathfinder/_dynamic_libs/load_nvidia_dynamic_lib.py

Lines changed: 11 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -99,14 +99,18 @@ def _raise_canary_probe_child_process_error(
9999

100100

101101
@functools.cache
102-
def _resolve_system_loaded_abs_path_in_subprocess(libname: str) -> str | None:
102+
def _resolve_system_loaded_abs_path_in_subprocess(
103+
libname: str,
104+
*,
105+
timeout: float = _CANARY_PROBE_TIMEOUT_SECONDS,
106+
) -> str | None:
103107
"""Resolve a canary library's absolute path in a fresh Python subprocess."""
104108
try:
105109
result = subprocess.run( # noqa: S603 - trusted argv: current interpreter + internal probe module
106110
build_dynamic_lib_subprocess_command(MODE_CANARY, libname),
107111
capture_output=True,
108112
text=True,
109-
timeout=_CANARY_PROBE_TIMEOUT_SECONDS,
113+
timeout=timeout,
110114
check=False,
111115
cwd=DYNAMIC_LIB_SUBPROCESS_CWD,
112116
)
@@ -127,6 +131,11 @@ def _resolve_system_loaded_abs_path_in_subprocess(libname: str) -> str | None:
127131
return None
128132

129133

134+
def _loadable_via_canary_subprocess(libname: str, *, timeout: float = _CANARY_PROBE_TIMEOUT_SECONDS) -> bool:
135+
"""Return True if the canary subprocess can resolve ``libname`` via system search."""
136+
return _resolve_system_loaded_abs_path_in_subprocess(libname, timeout=timeout) is not None
137+
138+
130139
def _try_ctk_root_canary(ctx: SearchContext) -> str | None:
131140
"""Try CTK-root canary fallback for descriptor-configured libraries."""
132141
for canary_libname in ctx.desc.ctk_root_canary_anchor_libnames:

cuda_pathfinder/cuda/pathfinder/_headers/header_descriptor_catalog.py

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -134,6 +134,13 @@ class HeaderDescriptorSpec:
134134
site_packages_dirs=("nvidia/cu13/include", "nvidia/cuda_nvcc/nvvm/include"),
135135
anchor_include_rel_dirs=("nvvm/include",),
136136
),
137+
HeaderDescriptorSpec(
138+
name="cudla",
139+
packaged_with="ctk",
140+
header_basename="cudla.h",
141+
site_packages_dirs=("nvidia/cu13/include",),
142+
available_on_windows=False,
143+
),
137144
# -----------------------------------------------------------------------
138145
# Third-party / separately packaged headers
139146
# -----------------------------------------------------------------------

cuda_pathfinder/pyproject.toml

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -36,6 +36,7 @@ cu13 = [
3636
"cuda-toolkit[cufile]==13.*; sys_platform != 'win32'",
3737
"cutensor-cu13",
3838
"nvidia-cublasmp-cu13; sys_platform != 'win32'",
39+
"nvidia-cudla; platform_system == 'Linux' and platform_machine == 'aarch64'",
3940
"nvidia-cudss-cu13",
4041
"nvidia-cufftmp-cu13; sys_platform != 'win32'",
4142
"nvidia-cusolvermp-cu13; sys_platform != 'win32'",

cuda_pathfinder/tests/conftest.py

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -29,3 +29,15 @@ def _append(message):
2929
request.config.custom_info.append(f"{request.node.name}: {message}")
3030

3131
return _append
32+
33+
34+
def skip_if_missing_libnvcudla_so(libname: str, *, timeout: float) -> None:
35+
if libname not in ("cudla", "nvcudla"):
36+
return
37+
# Keep the import inside the helper so unrelated import issues do not fail
38+
# pytest collection for the whole test suite.
39+
from cuda.pathfinder._dynamic_libs import load_nvidia_dynamic_lib as load_nvidia_dynamic_lib_module
40+
41+
if load_nvidia_dynamic_lib_module._loadable_via_canary_subprocess("nvcudla", timeout=timeout):
42+
return
43+
pytest.skip("libnvcudla.so is not loadable via canary subprocess on this host.")

cuda_pathfinder/tests/test_driver_lib_loading.py

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -16,6 +16,7 @@
1616
run_load_nvidia_dynamic_lib_in_subprocess,
1717
)
1818

19+
from conftest import skip_if_missing_libnvcudla_so
1920
from cuda.pathfinder._dynamic_libs.lib_descriptor import LIB_DESCRIPTORS
2021
from cuda.pathfinder._dynamic_libs.load_dl_common import DynamicLibNotFoundError, LoadedDL
2122
from cuda.pathfinder._dynamic_libs.load_nvidia_dynamic_lib import (
@@ -147,6 +148,7 @@ def raise_child_process_failed():
147148
error_label="Load subprocess child process",
148149
)
149150
if payload.status == STATUS_NOT_FOUND:
151+
skip_if_missing_libnvcudla_so(libname, timeout=timeout)
150152
if STRICTNESS == "all_must_work":
151153
raise_child_process_failed()
152154
info_summary_append(f"Not found: {libname=!r}")

cuda_pathfinder/tests/test_find_nvidia_headers.py

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -21,6 +21,7 @@
2121
import pytest
2222

2323
import cuda.pathfinder._headers.find_nvidia_headers as find_nvidia_headers_module
24+
from conftest import skip_if_missing_libnvcudla_so
2425
from cuda.pathfinder import LocatedHeaderDir, find_nvidia_header_directory, locate_nvidia_header_directory
2526
from cuda.pathfinder._dynamic_libs.load_nvidia_dynamic_lib import (
2627
_resolve_system_loaded_abs_path_in_subprocess,
@@ -158,6 +159,8 @@ def test_locate_ctk_headers(info_summary_append, libname):
158159
h_filename = SUPPORTED_HEADERS_CTK[libname]
159160
assert os.path.isfile(os.path.join(hdr_dir, h_filename))
160161
if STRICTNESS == "all_must_work":
162+
if libname == "cudla":
163+
skip_if_missing_libnvcudla_so(libname, timeout=30)
161164
assert hdr_dir is not None
162165

163166

cuda_pathfinder/tests/test_load_nvidia_dynamic_lib.py

Lines changed: 7 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -11,10 +11,14 @@
1111
)
1212
from local_helpers import have_distribution
1313

14+
from conftest import skip_if_missing_libnvcudla_so
1415
from cuda.pathfinder import DynamicLibNotAvailableError, DynamicLibUnknownError, load_nvidia_dynamic_lib
1516
from cuda.pathfinder._dynamic_libs import load_nvidia_dynamic_lib as load_nvidia_dynamic_lib_module
1617
from cuda.pathfinder._dynamic_libs import supported_nvidia_libs
17-
from cuda.pathfinder._dynamic_libs.subprocess_protocol import STATUS_NOT_FOUND, parse_dynamic_lib_subprocess_payload
18+
from cuda.pathfinder._dynamic_libs.subprocess_protocol import (
19+
STATUS_NOT_FOUND,
20+
parse_dynamic_lib_subprocess_payload,
21+
)
1822
from cuda.pathfinder._utils.platform_aware import IS_WINDOWS, quote_for_shell
1923

2024
STRICTNESS = os.environ.get("CUDA_PATHFINDER_TEST_LOAD_NVIDIA_DYNAMIC_LIB_STRICTNESS", "see_what_works")
@@ -117,6 +121,7 @@ def raise_child_process_failed():
117121
raise RuntimeError(build_child_process_failed_for_libname_message(libname, result))
118122

119123
if result.returncode != 0:
124+
skip_if_missing_libnvcudla_so(libname, timeout=timeout)
120125
raise_child_process_failed()
121126
assert not result.stderr
122127
payload = parse_dynamic_lib_subprocess_payload(
@@ -125,6 +130,7 @@ def raise_child_process_failed():
125130
error_label="Load subprocess child process",
126131
)
127132
if payload.status == STATUS_NOT_FOUND:
133+
skip_if_missing_libnvcudla_so(libname, timeout=timeout)
128134
if STRICTNESS == "all_must_work" and not _is_expected_load_nvidia_dynamic_lib_failure(libname):
129135
raise_child_process_failed()
130136
info_summary_append(f"Not found: {libname=!r}")

0 commit comments

Comments (0)