Defer Python kernel compilation until invocation#3948
Conversation
|
CUDA Quantum Docs Bot: A preview of the documentation can be found here. |
|
CUDA Quantum Docs Bot: A preview of the documentation can be found here. |
Signed-off-by: Luca Mondada <luca@mondada.net>
|
CUDA Quantum Docs Bot: A preview of the documentation can be found here. |
This was already true. Symbols from EGB scopes were lambda lifted to be arguments to the kernel. This change allows EGB scoped variables to be undefined at the point the kernel is defined. Since the kernel's code is only generated later, the type(s) of these symbol(s) isn't required and no prior declaration is needed. |
I'm not clear on why this is any different than any other kernel call. @cudaq.kernel
def foo():
foo()should record the call to itself as a lambda lifted callable argument when we build the code for Given that, we then must resolve all the lambda lifted arguments. In this example, that's the symbol Even if we inject several layers of calls ( |
First of all, recursion is currently not supported in @cudaq.kernel
def foo():
foo()is undefined at the definition time of That being said, I agree that with this code change, supporting recursion becomes easier. The execution you outline is how it should (and will) work, but the reason it currently does not work is that compilation and execution proceeds as follows:
We need to add a conditional in the resolution of captured arguments that checks whether |
What I meant is that the following previously threw an error @cudaq.kernel
def bar(i: int):
q = cudaq.qvector(n)
n = 3
bar() |
|
CUDA Quantum Docs Bot: A preview of the documentation can be found here. |
bettinaheim
left a comment
There was a problem hiding this comment.
All in all, looks great. I would pull out the change for the behavior of print at least out of this PR - then you also don't have all that noise for test edits.
|
|
Signed-off-by: Luca Mondada <luca@mondada.net>
Signed-off-by: Luca Mondada <luca@mondada.net>
There was a problem hiding this comment.
#3965 exposes a serious issue with the resolution approach that I believe this is proposing. The example in #3965 does not call the kernel outer but rather references it. This exposes the following problem with the approach here. The kernel outer escapes the context of its creation and its content is unprocessed with the deferred compilation. This means that we will have lost the symbol inner and what it is bound to.
We really need to make sure we're not letting outer escape without figuring out what enclosed scope symbols it refers to and what they resolve to at the point the reference escapes.
If you read this carefully, it implies that neither call site nor definition site resolution is sufficient.
Note: Discussed with Luca and resolved my questions.
|
@schweitzpgi #4005 resolves the issue you bring up. It was already an issue before the changes that are proposed here: whilst compilation was working fine, at call time the captured arguments could not be resolved. So, either way, we do not get around keeping a reference to the frame defining the kernel, independently of when we compile it. It's thus unrelated to the changes here. |
schweitzpgi
left a comment
There was a problem hiding this comment.
Where did we land wrt setting the default of deferred to False? Since it still defaults to True I can infer the answer, but I missed the explanation.
Since #4005 is also in play here, we want to have tests that use deferred and not deferred code paths and validation that both cases work correctly. Deferring the instantiation of the IR should be completely orthogonal to tracking contexts to resolving arguments.
|
CUDA Quantum Docs Bot: A preview of the documentation can be found here. |
Signed-off-by: Luca Mondada <luca@mondada.net>
|
CUDA Quantum Docs Bot: A preview of the documentation can be found here. |
Here are the pros and cons of having deferred compilation on by default:
The only disadvantage is temporary and will go away. Being closer to Python semantics is a serious pro.
I've reverted all the test changes.
Yes, I had made that change on Bettina's request. As a result of this, most tests could be reverted to their original. Thanks for spotting that, it's addressed in the latest commit.
Previously, this was done in a function named
I have reverted more tests to
Correct, this PR does not affect how contexts are resolved. Delaying compilation, however, surfaced more of the existing bugs around symbol resolution in the incorrect scope, as compilation was no longer happening in the scope of definition. |
schweitzpgi
left a comment
There was a problem hiding this comment.
Discussed follow-up actions with Luca.
This PR introduces "deferred compilation" in Python. By default, kernels are no longer compiled to MLIR (aka pre-compiled) at kernel definition time, but only when the kernel is invoked for the first time. As before, the MLIR module is then cached for further invocation. For example,
would not trigger any compilation, until
foo()is called for the first time. This enables:import cudaqtakes ~4s now, versus~8sbefore, as the kernels included withcudaqare no longer compiled when the package is loaded. This will also apply to any other library that provides kernel definitions.Compilation can be forced (thus recovering the old behaviour) using the
defer_compilation=Falseflag. The following:triggers compilation immediately.
Limitations
To limit the scope of this PR, I introduced two limitations (see tests for examples):
apply_call(B)if B wasn't compiled beforehand. In that case, the user will receive a clear error suggesting to either usedefer_compilation=Falseor callingB.compile()directly, e.g.