In the preceding sections, we looked at how we can describe data movement between tiles within the AIE-array. However, to do anything useful, we need to get data from outside the array, i.e., from the "host", into the AIE-array and back. On NPU devices, we can achieve this with the operations described in this section.
The operations described in this section must be placed in a separate sequence() of a Runtime class or, at the explicitly placed IRON level, in an aie.runtime_sequence operation. The arguments to this function describe the buffers that will be available on the host side; the body of the function describes how those buffers are moved into the AIE-array. Section 3 contains an example.
In high-performance computing applications, efficiently managing data movement and synchronization is crucial. This guide provides an overview of how to use IRON to manage data movement at runtime between host memory and the AIE array (for example, in the Ryzen™ AI NPU).
For high-level IRON constructs like RuntimeTasks, please continue with this reading.
For explicitly placed, closer-to-metal IRON API functions like npu_dma_memcpy_nd and dma_wait, please continue reading here.