google-ml-infra · ybaturina · Jan 9, 2026 · Jan 2, 2026 · Jan 5, 2026 · Jan 6, 2026
diff --git a/gpu/README.md b/gpu/README.md
@@ -10,13 +10,13 @@ versions.
 There are three types of hermetic toolkits configurations:
 
 1) Recommended: [Repository rules use redistributions loaded from NVIDIA repositories](#supported-hermetic-cuda-cudnn-nvshmem-versions).
-   
+
    For full CUDA toolkit hermeticity, use CUDA User Mode Driver libraries loaded from NVIDIA repositories
    by setting `--@cuda_driver//:include_cuda_umd_libs=true` (see [instructions](#configure-hermetic-cuda-user-mode-driver)).
-   
+
 
 2) [Repository rules use redistributions loaded from custom remote locations or
-local files](#2-custom-cudacudnnnvshmem-archives-and-nccl-wheels).
+   local files](#2-custom-cudacudnnnvshmem-archives-and-nccl-wheels).
 
    This option is recommended for testing custom/unreleased redistributions, or
    redistributions previously loaded locally.
@@ -141,12 +141,12 @@ is specified in [third_party/gpus/cuda/hermetic/cuda_redist_versions.bzl](https:
    build:cuda --repo_env TF_NEED_CUDA=1
    build:cuda --@rules_ml_toolchain//common:enable_cuda
    ```
-   
+
    To use Clang compiler for CUDA targets, set
    `--@local_config_cuda//:cuda_compiler=clang`, for NVCC compiler set
-  `--@local_config_cuda//:cuda_compiler=nvcc` and `TF_NVCC_CLANG` environment
+   `--@local_config_cuda//:cuda_compiler=nvcc` and `TF_NVCC_CLANG` environment
    variable.
-   
+
    ```
    build:build_cuda_with_clang --@local_config_cuda//:cuda_compiler=clang
 
@@ -222,12 +222,12 @@ UMD version should be compatible with KMD and CUDA Runtime versions.
 
 
 - Supported Kernel Mode Driver and User Mode Driver version combinations:
- 
+
   Driver versions combination | Is supported
-  -------- | --------
+    -------- | --------
   KMD > UMD | -
   KMD <= UMD | +
- 
+
 - UMD and CUDA Runtime versions compatibility is described in
   [NVIDIA documentation](https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#id6).
 
@@ -258,11 +258,11 @@ UMD version should be compatible with KMD and CUDA Runtime versions.
    ```
 
 2. To select specific version of hermetic NCCL, set the
-   `HERMETIC_NCCL_VERFSION` environment variable. Use only supported versions.
+   `HERMETIC_NCCL_VERSION` environment variable. Use only supported versions.
    You may set the environment
    variables directly in your shell or in `.bazelrc` file as shown below:
    ```
-   build:cuda --repo_env=HERMETIC_NCCL_VERFSION="2.27.7"
+   build:cuda --repo_env=HERMETIC_NCCL_VERSION="2.27.7"
    ```   
 
 3. To select specific version of hermetic NVSHMEM, set the
@@ -329,19 +329,23 @@ The JSON files contain paths to individual redistributions for different OS
 architectures.
 
 1. Create `cuda_redist.json` and/or `cudnn_redist.json` and/or
-`nvshmem_redist.json` files.
+   `nvshmem_redist.json` files.
 
    `cuda_redist.json` should follow the format below:
 
    ```
    {
       "cuda_cccl": {
-         "linux-x86_64": {
-            "relative_path": "cuda_cccl-linux-x86_64-12.4.99-archive.tar.xz",
-         },
-         "linux-sbsa": {
-            "relative_path": "cuda_cccl-linux-sbsa-12.4.99-archive.tar.xz",
-         }
+         "linux-x86_64": {
+            "full_path": "https://github.com/NVIDIA/cccl/archive/0d328e06c9fc78a216ec70df4917f7230a9c77e3.tar.gz",
+            "sha256": "c45dddfcebfc2d719e0c4cc6a874a4b50a751b90daba139699d3fc11708cf0ef",
+            "strip_prefix": "cccl-0d328e06c9fc78a216ec70df4917f7230a9c77e3"
+         },
+         "linux-sbsa": {
+            "full_path": "https://github.com/NVIDIA/cccl/archive/0d328e06c9fc78a216ec70df4917f7230a9c77e3.tar.gz",
+            "sha256": "c45dddfcebfc2d719e0c4cc6a874a4b50a751b90daba139699d3fc11708cf0ef",
+            "strip_prefix": "cccl-0d328e06c9fc78a216ec70df4917f7230a9c77e3"
+         }
-      },
+      }
    }
    ```
@@ -384,8 +388,10 @@ architectures.
    }
    ```
 
-   The `relative_path` field can be replaced with `full_path` for the full URLs
-   and absolute local paths starting with `file:///`.
+   Note that `sha256` and `strip_prefix` are optional.
+
+   Use `full_path` for full URLs and for absolute local paths starting with
+   `file:///`.
 
 2. In the downstream project dependent on `rules_ml_toolchain`, update the
    hermetic cuda JSON repository call in `WORKSPACE` file. Both web links and
@@ -449,12 +455,16 @@ dependencies in Google ML projects.
    ```
    _CUSTOM_CUDA_REDISTRIBUTIONS = {
       "cuda_cccl": {
-         "linux-x86_64": {
-            "relative_path": "cuda_cccl-linux-x86_64-12.4.99-archive.tar.xz",
-         },
-         "linux-sbsa": {
-            "relative_path": "cuda_cccl-linux-sbsa-12.4.99-archive.tar.xz",
-         }
+          "linux-x86_64": {
+              "full_path": "https://github.com/NVIDIA/cccl/archive/0d328e06c9fc78a216ec70df4917f7230a9c77e3.tar.gz",
+              "sha256": "c45dddfcebfc2d719e0c4cc6a874a4b50a751b90daba139699d3fc11708cf0ef",
+              "strip_prefix": "cccl-0d328e06c9fc78a216ec70df4917f7230a9c77e3",
+          },
+          "linux-sbsa": {
+              "full_path": "https://github.com/NVIDIA/cccl/archive/0d328e06c9fc78a216ec70df4917f7230a9c77e3.tar.gz",
+              "sha256": "c45dddfcebfc2d719e0c4cc6a874a4b50a751b90daba139699d3fc11708cf0ef",
+              "strip_prefix": "cccl-0d328e06c9fc78a216ec70df4917f7230a9c77e3",
+          },
       },
    }
    ```
@@ -497,14 +507,27 @@ dependencies in Google ML projects.
    }
    ```
 
-   The `relative_path` field can be replaced with `full_path` for the full URLs
-   and absolute local paths starting with `file:///`.
+   Note that `sha256` and `strip_prefix` are optional.
+
+   Use `full_path` for full URLs and for absolute local paths starting with
+   `file:///`.
 
 2. In the same `WORKSPACE` file, pass the created dictionaries to the repository
-   rule. If the dictionaries contain relative paths to distributions, the path
+   rule.
+
+   If the dictionaries contain relative paths to distributions, the path
    prefix should be updated in `cuda_redist_init_repositories()`,
    `cudnn_redist_init_repository()` and `nvshmem_redist_init_repository()`
    calls.
+
+   BUILD templates can be customized when the custom redistributions have a
+   different folder structure than the default ones.
+   Note that `source_dirs` is mandatory; it is used for the scenarios described
+   [here](https://github.com/google-ml-infra/rules_ml_toolchain/blob/main/gpu/README.md#3-local-toolkit-installations-used-as-sources-for-hermetic-repositories).
+
+   If the templates for the scenarios above are different, provide them in
+   `version_to_template` under the `local` key.
+
    ```
    register_toolchains("@rules_ml_toolchain//cc:linux_x86_64_linux_x86_64_cuda")
    register_toolchains("@rules_ml_toolchain//cc:linux_aarch64_linux_aarch64_cuda")
@@ -520,9 +543,30 @@ dependencies in Google ML projects.
       "cuda_redist_init_repositories",
       "cudnn_redist_init_repository",
    )
+
+   _CCCL_BUILD_TEMPLATES = {
+        "cuda_cccl": {
+            "repo_name": "cuda_cccl",
+            "version_to_template": {
+                "13": "@rules_ml_toolchain//third_party/gpus/cuda/hermetic:cuda_cccl_github.BUILD.tpl",
+                "12": "@rules_ml_toolchain//third_party/gpus/cuda/hermetic:cuda_cccl_github.BUILD.tpl",
+                "11": "@rules_ml_toolchain//third_party/gpus/cuda/hermetic:cuda_cccl_github.BUILD.tpl",
+            },
+            "local": {
+                "source_dirs": ["include", "lib"],
+                "version_to_template": {
+                    "13": "@rules_ml_toolchain//third_party/gpus/cuda/hermetic:cuda_cccl.BUILD.tpl",
+                    "12": "@rules_ml_toolchain//third_party/gpus/cuda/hermetic:cuda_cccl.BUILD.tpl",
+                    "11": "@rules_ml_toolchain//third_party/gpus/cuda/hermetic:cuda_cccl.BUILD.tpl",
+                },
+            },
+        },
+   }
+
    cuda_redist_init_repositories(
       cuda_redistributions = _CUSTOM_CUDA_REDISTRIBUTIONS,
       cuda_redist_path_prefix = "file:///home/usr/Downloads/dists/",
+      redist_versions_to_build_templates = _CCCL_BUILD_TEMPLATES,
    )
    cudnn_redist_init_repository(
       cudnn_redistributions = _CUSTOM_CUDNN_REDISTRIBUTIONS,
@@ -590,13 +634,17 @@ _CUDNN_JSON_DICT = {
 
 _CUDA_DIST_DICT = {
    "cuda_cccl": {
-      "linux-x86_64": {
-            "relative_path": "cuda_cccl-linux-x86_64-12.4.99-archive.tar.xz",
-      },
-      "linux-sbsa": {
-            "relative_path": "cuda_cccl-linux-sbsa-12.4.99-archive.tar.xz",
-      },
-   },
+        "linux-x86_64": {
+            "full_path": "https://github.com/NVIDIA/cccl/archive/0d328e06c9fc78a216ec70df4917f7230a9c77e3.tar.gz",
+            "sha256": "c45dddfcebfc2d719e0c4cc6a874a4b50a751b90daba139699d3fc11708cf0ef",
+            "strip_prefix": "cccl-0d328e06c9fc78a216ec70df4917f7230a9c77e3",
+        },
+        "linux-sbsa": {
+            "full_path": "https://github.com/NVIDIA/cccl/archive/0d328e06c9fc78a216ec70df4917f7230a9c77e3.tar.gz",
+            "sha256": "c45dddfcebfc2d719e0c4cc6a874a4b50a751b90daba139699d3fc11708cf0ef",
+            "strip_prefix": "cccl-0d328e06c9fc78a216ec70df4917f7230a9c77e3",
+        },
+   },
    "libcusolver": {
       "linux-x86_64": {
             "full_path": "file:///usr/Downloads/dists/libcusolver-linux-x86_64-11.6.0.99-archive.tar.xz",
@@ -607,6 +655,25 @@ _CUDA_DIST_DICT = {
    },
 }
 
+_CCCL_BUILD_TEMPLATES = {
+    "cuda_cccl": {
+        "repo_name": "cuda_cccl",
+        "version_to_template": {
+            "13": "@rules_ml_toolchain//third_party/gpus/cuda/hermetic:cuda_cccl_github.BUILD.tpl",
+            "12": "@rules_ml_toolchain//third_party/gpus/cuda/hermetic:cuda_cccl_github.BUILD.tpl",
+            "11": "@rules_ml_toolchain//third_party/gpus/cuda/hermetic:cuda_cccl_github.BUILD.tpl",
+        },
+        "local": {
+            "source_dirs": ["include", "lib"],
+            "version_to_template": {
+                "13": "@rules_ml_toolchain//third_party/gpus/cuda/hermetic:cuda_cccl.BUILD.tpl",
+                "12": "@rules_ml_toolchain//third_party/gpus/cuda/hermetic:cuda_cccl.BUILD.tpl",
+                "11": "@rules_ml_toolchain//third_party/gpus/cuda/hermetic:cuda_cccl.BUILD.tpl",
+            },
+        },
+    },
+}
+
 _CUDNN_DIST_DICT = {
    "cudnn": {
       "linux-x86_64": {
@@ -655,9 +722,14 @@ load(
    "cuda_redist_init_repositories",
    "cudnn_redist_init_repository",
 )
-cudnn_redist_init_repositories(
+load(
+    "@rules_ml_toolchain//third_party/gpus/cuda/hermetic:cuda_redist_versions.bzl",
+    "REDIST_VERSIONS_TO_BUILD_TEMPLATES",
+)
+cuda_redist_init_repositories(
    cuda_redistributions = CUDA_REDISTRIBUTIONS | _CUDA_DIST_DICT,
    cuda_redist_path_prefix = "file:///usr/Downloads/dists/",
+   redist_versions_to_build_templates = REDIST_VERSIONS_TO_BUILD_TEMPLATES | _CCCL_BUILD_TEMPLATES,
 )
 cudnn_redist_init_repository(
    cudnn_redistributions = CUDNN_REDISTRIBUTIONS | _CUDNN_DIST_DICT,
@@ -748,5 +820,4 @@ The structure of the folders inside NVSHMEM dir should be the following:
     include/
     lib/
     bin/
-```
-
+```
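The README changes above describe the entry shape for custom redistribution JSON files: each platform entry carries a `relative_path` or a `full_path`, while `sha256` and `strip_prefix` are optional. The following is a minimal sketch of a pre-flight check for such a file, written in Python. The helper `validate_redist_json` is hypothetical and not part of `rules_ml_toolchain`; it only assumes the shape shown in the README, and it relies on Python's strict JSON parser, which rejects trailing commas.

```python
import json

# Hypothetical helper, not part of rules_ml_toolchain. It checks only the
# shape described in the README: every platform entry needs a path key.
PATH_KEYS = ("relative_path", "full_path")
FULL_PATH_SCHEMES = ("http://", "https://", "file:///")


def validate_redist_json(text):
    """Return a list of problems found in a redistribution JSON string."""
    try:
        data = json.loads(text)  # strict JSON: trailing commas are rejected
    except json.JSONDecodeError as err:
        return ["not valid JSON: %s" % err.msg]

    problems = []
    for name, component in data.items():
        if not isinstance(component, dict):
            continue  # skip scalar metadata at the top level
        for platform, entry in component.items():
            if not isinstance(entry, dict):
                continue  # skip scalar metadata inside a component
            where = "%s/%s" % (name, platform)
            if not any(key in entry for key in PATH_KEYS):
                problems.append(where + ": needs relative_path or full_path")
            full_path = entry.get("full_path")
            if full_path is not None and not full_path.startswith(FULL_PATH_SCHEMES):
                problems.append(where + ": full_path should be a URL or file:/// path")
    return problems
```

An empty list means the file parsed and every platform entry named a path. Whether the repository rules accept the file is still decided by the rules themselves.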
diff --git a/gpu/cuda/cuda_redist_init_repositories.bzl b/gpu/cuda/cuda_redist_init_repositories.bzl
@@ -20,7 +20,6 @@ load(
     "cuda_redist_init_repositories_wrapper",
     "cudnn_redist_init_repository_wrapper",
 )
-
 load(
     "//third_party/gpus/cuda/hermetic:cuda_redist_versions.bzl",
     "CUDA_REDIST_PATH_PREFIX",
@@ -39,14 +38,17 @@ def cudnn_redist_init_repository(
         cudnn_redistributions,
         cudnn_redist_path_prefix,
         mirrored_tar_cudnn_redist_path_prefix,
-        redist_versions_to_build_templates)
+        redist_versions_to_build_templates,
+    )
 
 def cuda_redist_init_repositories(
         cuda_redistributions,
         cuda_redist_path_prefix = CUDA_REDIST_PATH_PREFIX,
         mirrored_tar_cuda_redist_path_prefix = MIRRORED_TAR_CUDA_REDIST_PATH_PREFIX,
         redist_versions_to_build_templates = REDIST_VERSIONS_TO_BUILD_TEMPLATES):
-    cuda_redist_init_repositories_wrapper(cuda_redistributions,
+    cuda_redist_init_repositories_wrapper(
+        cuda_redistributions,
         cuda_redist_path_prefix,
         mirrored_tar_cuda_redist_path_prefix,
-        redist_versions_to_build_templates)
+        redist_versions_to_build_templates,
+    )
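The README example in this change passes `REDIST_VERSIONS_TO_BUILD_TEMPLATES | _CCCL_BUILD_TEMPLATES` to `cuda_redist_init_repositories()`. The sketch below uses Python (3.9 or later) dictionaries as stand-ins, assuming Starlark's dict `|` follows the same shallow, right-hand-wins semantics. The template file names are shortened and the `cuda_nvcc` entry is a hypothetical placeholder for "any other redistribution".

```python
# Shortened, hypothetical stand-ins for the real dictionaries.
REDIST_VERSIONS_TO_BUILD_TEMPLATES = {
    "cuda_cccl": {
        "repo_name": "cuda_cccl",
        "version_to_template": {"12": "cuda_cccl.BUILD.tpl"},
    },
    "cuda_nvcc": {
        "repo_name": "cuda_nvcc",
        "version_to_template": {"12": "cuda_nvcc.BUILD.tpl"},
    },
}

_CCCL_BUILD_TEMPLATES = {
    "cuda_cccl": {
        "repo_name": "cuda_cccl",
        "version_to_template": {"12": "cuda_cccl_github.BUILD.tpl"},
        "local": {
            "source_dirs": ["include", "lib"],
            "version_to_template": {"12": "cuda_cccl.BUILD.tpl"},
        },
    },
}

# The union is shallow: the whole "cuda_cccl" value comes from the right-hand
# operand, and keys that exist only on the left are carried over unchanged.
merged = REDIST_VERSIONS_TO_BUILD_TEMPLATES | _CCCL_BUILD_TEMPLATES

print(merged["cuda_cccl"]["version_to_template"]["12"])
print(sorted(merged))
```

Because the merge is not recursive, a custom entry has to be complete: any major version missing from the custom `version_to_template` is not filled in from the default entry.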
diff --git a/gpu/nccl/nccl_redist_init_repository.bzl b/gpu/nccl/nccl_redist_init_repository.bzl
@@ -28,4 +28,7 @@ load(
 def nccl_redist_init_repository(
         cuda_nccl_wheels = CUDA_NCCL_WHEELS,
         redist_versions_to_build_templates = REDIST_VERSIONS_TO_BUILD_TEMPLATES):
-    nccl_redist_init_repository_wrapper(cuda_nccl_wheels, redist_versions_to_build_templates)
+    nccl_redist_init_repository_wrapper(
+        cuda_nccl_wheels,
+        redist_versions_to_build_templates,
+    )
diff --git a/third_party/extensions/cuda_redist_init.bzl b/third_party/extensions/cuda_redist_init.bzl
@@ -33,6 +33,7 @@ def _cuda_redist_init_ext_impl(mctx):
         cudnn_redistributions = CUDNN_REDISTRIBUTIONS,
     )
 
+# TODO(ybaturina): add missing features from workspace mode
 cuda_redist_init_ext = module_extension(
     implementation = _cuda_redist_init_ext_impl,
 )
diff --git a/third_party/gpus/cuda/hermetic/cuda_cccl_github.BUILD.tpl b/third_party/gpus/cuda/hermetic/cuda_cccl_github.BUILD.tpl
@@ -0,0 +1,82 @@
+licenses(["restricted"])  # NVIDIA proprietary license
+
+filegroup(
+    name = "header_list",
+    srcs = [":thrust_header_list", ":nv_header_list", ":cuda_header_list", ":cub_header_list"],
+    visibility = ["@local_config_cuda//cuda:__pkg__"],
+)
+
+cc_library(
+    name = "headers",
+    deps = [":thrust_headers", ":nv_headers", ":cuda_headers", ":cub_headers"],
+    visibility = ["@local_config_cuda//cuda:__pkg__"],
+)
+
+filegroup(
+    name = "thrust_header_list",
+    srcs = glob([
+        %{comment}"thrust/thrust/**",
+    ]),
+    visibility = ["@local_config_cuda//cuda:__pkg__"],
+)
+
+cc_library(
+    name = "thrust_headers",
+    hdrs = [":thrust_header_list"],
+    include_prefix = "third_party/gpus/cuda/include",
+    includes = ["thrust"],
+    strip_include_prefix = "thrust",
+    visibility = ["@local_config_cuda//cuda:__pkg__"],
+)
+
+filegroup(
+    name = "cuda_header_list",
+    srcs = glob([
+        %{comment}"libcudacxx/include/cuda/**",
+    ]),
+    visibility = ["@local_config_cuda//cuda:__pkg__"],
+)
+
+cc_library(
+    name = "cuda_headers",
+    hdrs = [":cuda_header_list"],
+    include_prefix = "third_party/gpus/cuda/include",
+    includes = ["libcudacxx/include"],
+    strip_include_prefix = "libcudacxx/include",
+    visibility = ["@local_config_cuda//cuda:__pkg__"],
+)
+
+filegroup(
+    name = "nv_header_list",
+    srcs = glob([
+        %{comment}"libcudacxx/include/nv/**",
+    ]),
+    visibility = ["@local_config_cuda//cuda:__pkg__"],
+)
+
+cc_library(
+    name = "nv_headers",
+    hdrs = [":nv_header_list"],
+    include_prefix = "third_party/gpus/cuda/include",
+    includes = ["libcudacxx/include/nv"],
+    strip_include_prefix = "libcudacxx/include",
+    visibility = ["@local_config_cuda//cuda:__pkg__"],
+)
+
+filegroup(
+    name = "cub_header_list",
+    srcs = glob([
+        %{comment}"cub/cub/**",
+    ]),
+    visibility = ["@local_config_cuda//cuda:__pkg__"],
+)
+
+cc_library(
+    name = "cub_headers",
+    hdrs = [":cub_header_list"],
+    include_prefix = "third_party/gpus/cuda/include",
+    includes = ["cub"],
+    strip_include_prefix = "cub",
+    visibility = ["@local_config_cuda//cuda:__pkg__"],
+)
+
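The `cc_library` targets in this template combine `strip_include_prefix` with `include_prefix` so that headers from the GitHub archive layout appear under the same `third_party/gpus/cuda/include` paths as before. The Python sketch below is a simplified model of that rewrite, in which Bazel removes the strip prefix before adding the include prefix. The function `exposed_include_path` is hypothetical and the header file names are illustrative.

```python
import posixpath


def exposed_include_path(src, strip_include_prefix, include_prefix):
    """Simplified model of how cc_library rewrites a header's include path."""
    # Drop the package-relative prefix first ...
    rel = posixpath.relpath(src, strip_include_prefix)
    if rel == ".." or rel.startswith("../"):
        raise ValueError("%s is not under %s" % (src, strip_include_prefix))
    # ... then prepend the prefix under which dependents include the header.
    return posixpath.join(include_prefix, rel)


CUDA_INCLUDE = "third_party/gpus/cuda/include"

print(exposed_include_path("thrust/thrust/sort.h", "thrust", CUDA_INCLUDE))
print(exposed_include_path("libcudacxx/include/cuda/std/atomic", "libcudacxx/include", CUDA_INCLUDE))
print(exposed_include_path("cub/cub/cub.cuh", "cub", CUDA_INCLUDE))
```

Under this model the three libraries resolve to `thrust/...`, `cuda/...` and `cub/...` directly beneath the shared CUDA include directory, which is the layout the archive-based `cuda_cccl.BUILD.tpl` presumably provides already.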