Add a layer templating mechanism #328

@ncoghlan

(This is a potential alternative approach to resolving #179 that can be handled via relatively simple dict manipulation when reading the stack specifications, rather than the complex additions that would be needed for the matrix processing idea. It's based on a file-templating workaround initially created by @neilmehta24)

#179 proposes to allow parallel CUDA stacks to be defined as a single stack with multiple build environments. This ends up being awkward, as it requires coming up with a novel way to specify the aspects that differ between the stacks while remaining syntactically compatible with TOML 1.0.

This proposal is to instead allow the common parts of a layer definition to be extracted out to a layer template, and then reference those templates from the parallel stacks. The relationships between the parallel stacks would be spelled out in the layer graph as normal, without any need to introduce a one-to-many mapping between layer specifications and environment builds.

For example (using the existing torch example in the repo as a starting point):

# Demonstrate using priority indexes to define cross-platform parallel torch stacks
[[runtimes]]
name = "cpython3.11"
python_implementation = "cpython@3.11.13"
requirements = [
    # Share a common numpy across the different torch variants
    "numpy",
]

[[templates]]
name = "torch-2.8"
runtime = "cpython3.11"
requirements = [
    "torch==2.8.0",
    # Skip listing numpy, so numpy updates don't automatically invalidate the layer lock
]
dynlib_exclude = [
    "triton/**"
]

[[frameworks]]
name = "torch-cpu"
template = "torch-2.8"
package_indexes = { torch = "pytorch-cpu" }
# priority_indexes = ["pytorch-cpu"]

[[frameworks]]
name = "torch-cu128"
template = "torch-2.8"
package_indexes = { torch = "pytorch-cu128" }
# priority_indexes = ["pytorch-cu128"]

[[templates]]
name = "torch-app"
launch_module = "report_torch_cuda_version.py"
requirements = [
    # Exact version pin is inherited from the framework layer
    "torch",
]

[[applications]]
name = "cpu"
template = "torch-app"
frameworks = ["torch-cpu"]

[[applications]]
name = "cu128"
template = "torch-app"
frameworks = ["torch-cu128"]

[[applications]]
name = "cu128-or-cpu"
template = "torch-app"
# Both the CUDA and non-CUDA frameworks are added to the import path,
# so this app will work as long as *either* of those layers is installed
# If both are available, it uses the CUDA layer (as it is listed first)
# However, layer locking needs to be told that the two layers are
# expected to specify different source indexes, and that, when they
# conflict, the pytorch-cu128 index should be preferred over pytorch-cpu
index_overrides = { pytorch-cpu = "pytorch-cu128" }
frameworks = ["torch-cu128", "torch-cpu"]


[tool.uv]
# exclude-newer = "2025-10-11T00:00:00Z"
# The custom torch registries do not support exclude-newer,
# and that currently requires avoiding the feature entirely
# https://github.com/astral-sh/uv/issues/12449

[[tool.uv.index]]
name = "pytorch-cpu"
url = "https://download.pytorch.org/whl/cpu/"
explicit = true

[[tool.uv.index]]
name = "pytorch-cu128"
url = "https://download.pytorch.org/whl/cu128/"
explicit = true
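One way the `index_overrides` mapping could be applied during layer locking is sketched below. This is a hypothetical helper (`resolve_index_conflict` is not an existing API): it takes the set of source indexes that the stacked layers declare for a package, applies the overrides, and fails if a genuine conflict remains.

```python
def resolve_index_conflict(indexes: set[str], overrides: dict[str, str]) -> str:
    """Pick a single source index for a package required by multiple layers.

    `indexes` is the set of indexes the stacked layers declare for the
    package; `overrides` maps a losing index to the index that should be
    used in its place when both appear in the stack.
    """
    resolved = {overrides.get(index, index) for index in indexes}
    if len(resolved) > 1:
        raise ValueError(f"Conflicting indexes with no override: {sorted(resolved)}")
    return resolved.pop()

# The cu128-or-cpu app above declares both torch indexes, with
# pytorch-cpu overridden by pytorch-cu128 when both are present
chosen = resolve_index_conflict(
    {"pytorch-cpu", "pytorch-cu128"},
    {"pytorch-cpu": "pytorch-cu128"},
)
print(chosen)  # pytorch-cu128
```

Without the override entry, the same call would raise, which matches the intent of the comment in the example: differing indexes are an error unless the stack spec explicitly says which one wins.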

The rules for merging a layer definition with its template before creating the layer specification instance would ensure the layer's entries take priority over the entries provided by the template:

  • for dict (table) fields, an empty dict is updated first with the template's dict (if defined), then with the layer's dict (if defined)
  • for list (array) fields, the two lists are concatenated, with the layer entries appearing first
  • for other field types, the layer entry (if any) overrides the template entry
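The rules above can be sketched as a small Python merge function, assuming layer and template definitions are the plain dicts produced by parsing the TOML (the `merge_with_template` helper is hypothetical, not part of any existing API):

```python
def merge_with_template(layer: dict, template: dict) -> dict:
    """Merge a layer definition with its template; layer entries win.

    Dicts are merged key-by-key (layer keys take priority), lists are
    concatenated (layer entries first), and any other field type from
    the layer overrides the template's value.
    """
    merged = {}
    for key in template.keys() | layer.keys():
        if key in template and key in layer:
            t_val, l_val = template[key], layer[key]
            if isinstance(t_val, dict) and isinstance(l_val, dict):
                merged[key] = {**t_val, **l_val}  # layer keys win
            elif isinstance(t_val, list) and isinstance(l_val, list):
                merged[key] = l_val + t_val  # layer entries first
            else:
                merged[key] = l_val  # layer overrides other field types
        else:
            merged[key] = layer.get(key, template.get(key))
    # The template reference itself has no place in the merged spec
    merged.pop("template", None)
    return merged

# Data taken from the torch example above
template = {
    "name": "torch-2.8",
    "runtime": "cpython3.11",
    "requirements": ["torch==2.8.0"],
    "dynlib_exclude": ["triton/**"],
}
layer = {
    "name": "torch-cpu",
    "template": "torch-2.8",
    "package_indexes": {"torch": "pytorch-cpu"},
}
print(merge_with_template(layer, template))
```

Applied to the `torch-cpu` framework, this yields a merged spec with the layer's own `name` and `package_indexes`, plus the `runtime`, `requirements`, and `dynlib_exclude` inherited from the `torch-2.8` template.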
