
Defer Python kernel compilation until invocation#3948

Merged
lmondada merged 7 commits into NVIDIA:main from lmondada:lm/aot-cache
Feb 25, 2026
Conversation

Collaborator

@lmondada lmondada commented Feb 13, 2026

This PR introduces "deferred compilation" in Python. By default, kernels are no longer compiled to MLIR (aka pre-compiled) at kernel definition time, but only when the kernel is invoked for the first time. As before, the MLIR module is then cached for further invocation. For example,

import cudaq

@cudaq.kernel
def foo():
    pass

would not trigger any compilation, until foo() is called for the first time. This enables:

  1. Faster package load times: import cudaq takes ~4s now, versus ~8s before, as the kernels included with cudaq are no longer compiled when the package is loaded. This will also apply to any other library that provides kernel definitions.
  2. Kernels can be defined out-of-order: previously, calling kernel B from kernel A would error if B was defined after A. This is now supported (see tests).
  3. Captured variables that aren't defined at kernel definition time but are defined at invocation time are now supported.

Compilation can be forced (thus recovering the old behaviour) using the defer_compilation=False flag. The following:

@cudaq.kernel(defer_compilation=False)
def foo():
    pass

triggers compilation immediately.
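The mechanics can be sketched in plain Python. The following is a toy model, not cudaq's implementation; compile_to_mlir, Kernel, and the kernel decorator here are illustrative stand-ins for the real AST-bridge and caching machinery:

```python
compile_log = []  # records when "compilation" happens, for illustration

def compile_to_mlir(func):
    """Hypothetical stand-in for lowering a Python function to MLIR."""
    compile_log.append(func.__name__)
    return f"module @{func.__name__}"  # placeholder for the cached MLIR module

class Kernel:
    def __init__(self, func, defer_compilation=True):
        self.func = func
        self._module = None
        if not defer_compilation:
            self.compile()  # old behaviour: compile at definition time

    def compile(self):
        if self._module is None:  # cache: compile at most once
            self._module = compile_to_mlir(self.func)
        return self._module

    def __call__(self, *args):
        self.compile()  # deferred: first invocation triggers compilation
        return self.func(*args)

def kernel(func=None, *, defer_compilation=True):
    if func is None:  # used as @kernel(defer_compilation=...)
        return lambda f: Kernel(f, defer_compilation)
    return Kernel(func)  # used as bare @kernel

@kernel
def foo():
    pass

assert compile_log == []        # nothing compiled at definition time
foo()
assert compile_log == ["foo"]   # first call compiles
foo()
assert compile_log == ["foo"]   # cached: not recompiled

@kernel(defer_compilation=False)
def bar():
    pass

assert compile_log == ["foo", "bar"]  # compiled immediately
```

The cached module plays the role of the MLIR module the PR description mentions: built at most once, on first use.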

Limitations

To limit the scope of this PR, I introduced two limitations (see tests for examples):

  1. Recursive kernel calls are still not supported. The way kernels are captured and lifted into arguments would have to handle recursive calls specially.
  2. A kernel builder A cannot call another kernel B using apply_call(B) if B wasn't compiled beforehand. In that case, the user receives a clear error suggesting either to set defer_compilation=False or to call B.compile() directly, e.g.
@cudaq.kernel(defer_compilation=True)
def notPrecompiledKernel():
    pass

kernel = cudaq.make_kernel()
kernel.apply_call(notPrecompiledKernel)  # fails. User must set `defer_compilation=False`
                                         # or call `notPrecompiledKernel.compile()`
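A toy model of this limitation (hypothetical class and method names, not cudaq's code) shows the intended error behaviour: apply_call needs an already-built module, so it fails loudly when the callee is still deferred.

```python
class DeferredKernel:
    def __init__(self, name):
        self.name = name
        self.module = None  # not compiled yet (deferred)

    def compile(self):
        self.module = f"module @{self.name}"  # stand-in for MLIR lowering
        return self.module

class KernelBuilder:
    def apply_call(self, kernel):
        if kernel.module is None:
            raise RuntimeError(
                f"'{kernel.name}' has no compiled module; set "
                "defer_compilation=False or call .compile() first")
        return kernel.module

builder = KernelBuilder()
k = DeferredKernel("notPrecompiledKernel")
try:
    builder.apply_call(k)   # fails: k was never compiled
except RuntimeError as e:
    print("error:", e)

k.compile()                 # explicit compile, as the error message suggests
assert builder.apply_call(k) == "module @notPrecompiledKernel"
```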

@lmondada lmondada changed the title to [wip] [python] Delay MLIR compilation until kernel invocation Feb 13, 2026
@github-actions

CUDA Quantum Docs Bot: A preview of the documentation can be found here.


Signed-off-by: Luca Mondada <luca@mondada.net>

@schweitzpgi
Collaborator

> Captured variables that aren't defined at kernel definition time but are defined at invocation time are now supported.

This was already true. Symbols from EGB scopes were lambda lifted to be arguments to the kernel. This change allows EGB scoped variables to be undefined at the point the kernel is defined. Since the kernel's code is only generated later, the type(s) of these symbol(s) isn't required and no prior declaration is needed.

@schweitzpgi
Collaborator

schweitzpgi commented Feb 18, 2026

> Recursive kernel calls are still not supported. The way kernels are captured and lifted into arguments would have to handle recursive calls specially.

I'm not clear on why this is any different than any other kernel call.

@cudaq.kernel
def foo():
  foo()

should record the call to itself as a lambda lifted callable argument when we build the code for foo in the AST bridge step. This is required to come first since we'd have no idea what symbols appear in the body of foo if we tried to resolve those symbols before building the kernel code.

Given that, we then must resolve all the lambda lifted arguments. In this example, that's the symbol foo. That symbol must resolve to the kernel decorator we're building which is bound to the same symbol. heh. But we just built that code, so there is nothing to do.

Even if we inject several layers of calls ($\ge 1$) in the middle, foo is foo and the recursion naturally terminates here.

@lmondada
Collaborator Author

lmondada commented Feb 19, 2026

> I'm not clear on why this is any different than any other kernel call.
> [...]
> Given that, we then must resolve all the lambda lifted arguments. In this example, that's the symbol foo. That symbol must resolve to the kernel decorator we're building which is bound to the same symbol. heh. But we just built that code, so there is nothing to do.

First of all, recursion is currently not supported in main. This is because the symbol foo within the kernel definition:

@cudaq.kernel
def foo():
    foo()

is undefined at the definition time of foo (which is when it currently gets AOT compiled). Similarly, foo calling bar calling foo (or any other recursive setup) would not compile, given that one of the kernels would have to reference a not-yet-defined kernel.


That being said, I agree that with this code change, supporting recursion becomes easier. The execution you outline is how it should (and will) work, but the reason it currently does not work is that compilation and execution proceed as follows:

  1. Compile foo.
  2. Find a reference to foo within the body. Lambda lift that into an argument. Now foo takes a (captured) callable argument.
  3. Compilation completes successfully.
  4. Kernel foo is invoked, so the captured arguments must be resolved. Captured foo is resolved to the same decorator as being invoked.
  5. Now we need to (recursively) resolve the captured arguments of captured foo. Here foo gets resolved again to the decorator... this is how we get stuck in a loop of repeatedly capturing foo.

We need to add a conditional in the resolution of captured arguments that checks whether foo has been resolved before (and keep a list of all resolved callables somewhere), so that we can resolve it without recursing. I've already agreed with Bettina that I will work on a PR for this next.
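The conditional described above can be sketched as a memoised resolution pass over captured callables (hypothetical names, not cudaq's code). Without the memo, resolving foo -> foo would recurse forever; with it, each kernel is resolved at most once and the recursion terminates:

```python
def resolve_captures(kernel, captures, resolved=None):
    """Resolve captured callable names, visiting each kernel at most once.

    `captures` maps each kernel name to the list of kernel names it
    lambda-lifts into arguments. `resolved` is the memo that breaks cycles.
    """
    if resolved is None:
        resolved = {}
    if kernel in resolved:
        return resolved  # already handled: stop, instead of recursing again
    resolved[kernel] = captures.get(kernel, [])
    for callee in resolved[kernel]:
        resolve_captures(callee, captures, resolved)
    return resolved

# foo captures itself (direct recursion) and bar, which calls back into foo.
captures = {"foo": ["foo", "bar"], "bar": ["foo"]}
result = resolve_captures("foo", captures)
assert set(result) == {"foo", "bar"}  # terminates despite the cycles
```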

@lmondada
Collaborator Author

lmondada commented Feb 19, 2026

> Captured variables that aren't defined at kernel definition time but are defined at invocation time are now supported.
>
> This was already true. Symbols from EGB scopes were lambda lifted to be arguments to the kernel. This change allows EGB scoped variables to be undefined at the point the kernel is defined. Since the kernel's code is only generated later, the type(s) of these symbol(s) isn't required and no prior declaration is needed.

What I meant is that the following previously threw an error

@cudaq.kernel
def bar():
    q = cudaq.qvector(n)

n = 3

bar()
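One way to see why deferral makes this work is a plain-Python model of definition-time versus invocation-time name resolution. This is illustrative only, not cudaq's mechanism; the dict env stands in for the enclosing scope:

```python
env = {}  # stands in for the scope enclosing the kernel definition

def capture_at_definition(name):
    value = env.get(name)          # resolved now: n is not defined yet
    return lambda: value

def capture_at_invocation(name):
    return lambda: env.get(name)   # resolved only when the kernel is called

eager = capture_at_definition("n")
lazy = capture_at_invocation("n")

env["n"] = 3  # `n = 3` appears only after the kernel definition

assert eager() is None  # definition-time capture missed it
assert lazy() == 3      # invocation-time capture sees it
```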


Collaborator

@bettinaheim bettinaheim left a comment


All in all, looks great. I would at least pull the change to the behavior of print out of this PR - then you also don't have all that noise from test edits.

@schweitzpgi
Collaborator

schweitzpgi commented Feb 24, 2026

Is this a new PR? I thought I already reviewed something like this and left comments.
Nevermind. The GUI appears to be confused as I see my comments are still here.

Signed-off-by: Luca Mondada <luca@mondada.net>
Signed-off-by: Luca Mondada <luca@mondada.net>
Collaborator

@schweitzpgi schweitzpgi left a comment


#3965 exposes a serious issue with the resolution approach that I believe this is proposing. The example in #3965 does not call the kernel outer but rather references it. This exposes the following problem with the approach here: with deferred compilation, the kernel outer escapes the context of its creation with its content unprocessed. This means that we will have lost the symbol inner and what it is bound to.

We really need to make sure we're not letting outer escape without figuring out what enclosed scope symbols it refers to and what they resolve to at the point the reference escapes.

If you read this carefully, it implies that neither call site nor definition site resolution is sufficient.

Note: Discussed with Luca and resolved my questions.

@lmondada
Collaborator Author

lmondada commented Feb 24, 2026

@schweitzpgi #4005 resolves the issue you bring up.

It was already an issue before the changes proposed here: whilst compilation worked fine, the captured arguments could not be resolved at call time. So, either way, we cannot avoid keeping a reference to the frame defining the kernel, independently of when we compile it.

It's thus unrelated to the changes here.

Collaborator

@schweitzpgi schweitzpgi left a comment


Where did we land wrt setting the default of deferred to False? Since it still defaults to True I can infer the answer, but I missed the explanation.

Since #4005 is also in play here, we want to have tests that use deferred and not deferred code paths and validation that both cases work correctly. Deferring the instantiation of the IR should be completely orthogonal to tracking contexts to resolving arguments.


Signed-off-by: Luca Mondada <luca@mondada.net>

@lmondada
Collaborator Author

lmondada commented Feb 25, 2026

> Where did we land wrt setting the default of deferred to False? Since it still defaults to True I can infer the answer, but I missed the explanation.

Here are the pros and cons of having deferred compilation on by default:

  • pro: speeds up loading of scripts, libraries, etc. that define kernels but might not use them immediately.
  • pro: allows out-of-order definition, which matches the usual Python semantics.
  • pro: easier support for recursion (it works naturally when deferring compilation, since the kernel reference exists by the time it is compiled).
  • con: does not work with the kernel builder. This is temporary and will be fixed by lambda lifting.

The only disadvantage is temporary and will go away. Being closer to Python semantics is a serious pro.

> I'm not a huge fan of reverting all the test changes. None of the existing tests require delayed compilation, so why reintroduce that?

I've reverted all the test changes.

> I think printing the kernel is now compiling it.

Yes, I had made that change on Bettina's request. As a result of this, most tests could be reverted to their original. Thanks for spotting that, it's addressed in the latest commit.

> I think compile is misleading. What this is really doing is building the IR, if it doesn't yet exist. Maybe we should call it build_ir or instantiate_ir for clarity?

Previously, this was done in a function named pre_compile, but a compile function existed alongside it that was a no-op. I've consolidated both into one but agree that compile is ambiguous. I ask for permission to resolve this in a separate PR as getting rid of compile altogether will change a lot of unrelated code (see e.g. the uses in python/tests/kernel/test_kernel_features.py)

> We want to have tests that use deferred and not deferred code paths and validation that both cases work correctly.

I have reverted more tests to defer_compilation=False, so there should be a good mix of both now. If you want to go beyond this, we could parametrise many tests in python/tests/kernel/test_assignments.py and elsewhere on whether they should defer compilation or not. We could then run all tests twice, once for each setting. However, this gets expensive quickly. Can we explore the options in a separate PR?

> Deferring the instantiation of the IR should be completely orthogonal to tracking contexts to resolving arguments.

Correct, this PR does not affect how contexts are resolved. Delaying compilation, however, surfaced more of the existing bugs around symbol resolution in the incorrect scope, as compilation was no longer happening in the scope of definition.

Collaborator

@schweitzpgi schweitzpgi left a comment


Discussed follow-up actions with Luca.


Labels

release notes Changes need to be captured in the release notes
