Add scratchpad run-time parameter functionality #3061
Draft
andrej wants to merge 7 commits into
Builds on the XRT/firmware scratchpad feature (#3056) to enable more efficient run-time parameters. The XRT scratchpad supports up to 32 parameters in a "StateTable" that can be written to directly from DDR on the host, copied over by the firmware into a local memory, and from there used to write to registers in the NPU partition.
The primary advantage/difference over the existing run-time parameter implementation is that these parameters are truly read from DDR. They are not hard-coded as write32s in the instruction sequence, and therefore do not require instruction sequence patching or different instruction sequences for different parameter values.
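To make the difference concrete, here is a toy Python model. It is only a sketch: the tuple-based "instructions", the `write_from_state_table` op name, the register address, and all function names are invented for illustration and do not reflect the real instruction encoding or firmware behaviour.

```python
# Toy model only: instructions are plain tuples, not real NPU encodings.
REG_ADDR = 0x1000  # hypothetical register address


def sequence_hardcoded(value):
    """Existing approach: the parameter value is baked into a write32."""
    return [("write32", REG_ADDR, value)]


def sequence_scratchpad(index):
    """Scratchpad approach: the sequence only names a StateTable slot."""
    return [("write_from_state_table", REG_ADDR, index)]


def execute(sequence, state_table):
    """Apply a sequence to an empty register file and return the registers."""
    regs = {}
    for op, addr, operand in sequence:
        if op == "write32":
            regs[addr] = operand
        elif op == "write_from_state_table":
            regs[addr] = state_table[operand]
        else:
            raise ValueError(f"unknown op {op!r}")
    return regs


# Hard-coded: a new value means a different (or patched) sequence.
assert sequence_hardcoded(7) != sequence_hardcoded(9)

# Scratchpad: one fixed sequence, the result depends on the table contents.
seq = sequence_scratchpad(index=3)
table_a = [0] * 32
table_a[3] = 7
table_b = [0] * 32
table_b[3] = 9
regs_a = execute(seq, table_a)
regs_b = execute(seq, table_b)
```

In this model the scratchpad sequence is built once and reused, while only the table (standing in for the DDR contents) changes between runs.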
This PR abstracts the XRT scratchpad mechanism to make it more user-friendly through an `aiex.parameter` declaration (at the module level since parameters can be shared across devices), an `aiex.sync_parameters_from_host` operation that performs the copy from DDR -> command processor memory -> NPU registers/buffers, and an `aiex.read_parameter` operation that allows reading the parameters from the core. For the host side, a convenience class `ScratchpadParameter` provides an abstraction to write the parameters to the correct offset in DDR (no manual tracking of offsets, instead we can use named parameters as declared in the MLIR using `aiex.parameter`).
The mapping of parameter name to DDR offset/StateTable index (where the host C++ writes to and runtime sequence reads from) is exported as additional metadata to be consumed by the runtime libs in a `params.txt` file. The mapping of parameter name to buffers on NPU cores is kept inline in the MLIR.
C++ usage example: NPU-side (MLIR), host-side (C++)
Python usage example: NPU-side (IRON Python), host-side (Python)
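As a rough illustration of the name-to-offset idea (not the actual `ScratchpadParameter` class or the real `params.txt` format), the Python sketch below assumes that the metadata holds one `<name> <index>` pair per line and that each StateTable slot is one little-endian 32-bit word. Both are assumptions made for this example, and `parse_param_map` and `ParamTable` are hypothetical names.

```python
import struct

MAX_PARAMS = 32     # from the PR description: up to 32 parameters
WORD_FORMAT = "<I"  # assumption: one little-endian 32-bit word per slot
WORD_SIZE = struct.calcsize(WORD_FORMAT)


def parse_param_map(text):
    """Parse lines of the assumed form '<name> <index>' into a dict."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, index_str = line.split()
        index = int(index_str)
        if not 0 <= index < MAX_PARAMS:
            raise ValueError(f"index {index} out of range for {name!r}")
        mapping[name] = index
    return mapping


class ParamTable:
    """Hypothetical host-side helper: named writes into a byte buffer
    that stands in for the DDR region backing the StateTable."""

    def __init__(self, mapping):
        self.mapping = dict(mapping)
        self.buffer = bytearray(MAX_PARAMS * WORD_SIZE)

    def set(self, name, value):
        """Write value into the slot for name; return the byte offset used."""
        offset = self.mapping[name] * WORD_SIZE
        struct.pack_into(WORD_FORMAT, self.buffer, offset, value)
        return offset

    def get(self, name):
        """Read back the value currently stored for name."""
        offset = self.mapping[name] * WORD_SIZE
        return struct.unpack_from(WORD_FORMAT, self.buffer, offset)[0]


# Example: two named parameters, no manual offset tracking by the caller.
table = ParamTable(parse_param_map("scale 0\nbias 5\n"))
bias_offset = table.set("bias", 42)
table.set("scale", 3)
```

The caller only ever uses parameter names; the offset arithmetic lives in one place, which is the convenience the host-side class is described as providing.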
There is also a second mode: rather than providing run-time parameters to NPU cores, the scratchpad can be used to patch DMA BD addresses (C++ example, Python example).
Note: Depends on XRT PR #9813 / XRT PR #9814 for the Python bindings