Replies: 1 comment
Can you please include the test code used to generate these results?
A simple pass-through test case is implemented in which 256 KB of data moves along L3 (external memory) -> shimDMA(0,0) -> Core(0,2) and back, in chunks of 4096 bytes. The data is verified after the pass-through. Tracing is enabled on both Core(0,2) and the shim tile, shimDMA(0,0).
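The chunking and verification described above can be modelled on the CPU to check the bookkeeping. This is not the test code used for these results: it is a minimal sketch, assuming "256 KB" means 256 * 1024 bytes, with a hypothetical `passThroughChunk` standing in for one chunk's round trip through shimDMA(0,0) and Core(0,2).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Sizes taken from the test description; "256 KB" is assumed to be 256 * 1024 bytes.
constexpr std::size_t kTotalBytes = 256 * 1024;
constexpr std::size_t kChunkBytes = 4096;
constexpr std::size_t kNumChunks = kTotalBytes / kChunkBytes;  // 64 chunks
static_assert(kTotalBytes % kChunkBytes == 0, "buffer must split into whole chunks");

// Hypothetical stand-in for one chunk's trip L3 -> shimDMA -> core -> shimDMA -> L3.
// On the CPU this is just a byte copy.
void passThroughChunk(const std::uint8_t* in, std::uint8_t* out, std::size_t n) {
  std::memcpy(out, in, n);
}

// Moves the whole buffer chunk by chunk, then checks the output against the input.
bool runChunkedPassThroughAndVerify() {
  std::vector<std::uint8_t> src(kTotalBytes);
  std::vector<std::uint8_t> dst(kTotalBytes, 0);

  // Pattern with period 251 (prime, does not divide 4096), so no two of the
  // 64 chunks hold identical bytes and a chunk written to the wrong offset
  // would fail the comparison.
  for (std::size_t i = 0; i < kTotalBytes; ++i) {
    src[i] = static_cast<std::uint8_t>(i % 251);
  }

  for (std::size_t c = 0; c < kNumChunks; ++c) {
    const std::size_t offset = c * kChunkBytes;
    assert(offset + kChunkBytes <= kTotalBytes);
    passThroughChunk(src.data() + offset, dst.data() + offset, kChunkBytes);
  }
  return src == dst;
}
```

Calling `runChunkedPassThroughAndVerify()` returns `true` when all 64 chunks come back intact and in order; on hardware the copy would be replaced by the real DMA transfers.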

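Pass-through kernel code:

```cpp
// Reinterpret the byte buffers as 64-lane uint8 vectors (64 bytes each).
v64uint8 *restrict outPtr = (v64uint8 *)out;
v64uint8 *restrict inPtr = (v64uint8 *)in;

// Compiler hints: enable software pipelining; the loop runs at least 6 times.
AIE_PREPARE_FOR_PIPELINING
AIE_LOOP_MIN_ITERATION_COUNT(6)
for (int j = 0; j < (height * width); j += N) // N samples per iteration; for uint8 data N must be 64
{
    *outPtr++ = *inPtr++; // copy one 64-byte vector
}
```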
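Each iteration copies one `v64uint8` (64 bytes) while `j` advances by `N`, so the loop covers exactly `height * width` bytes only when `N` is 64 and `height * width` is a multiple of 64. The sketch below models that loop in portable C++; `Vec64u8`, `passThroughModel` and the fixed `N = 64` are hypothetical names chosen for illustration, not part of the AIE API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical portable stand-in for the AIE v64uint8 type: 64 lanes of uint8_t.
struct Vec64u8 {
  std::array<std::uint8_t, 64> lanes;
};

// Samples consumed per iteration; must equal the vector lane count for uint8 data.
constexpr int N = 64;

// CPU model of the pass-through loop: the counter advances by N samples while
// each pointer advances by one 64-byte vector. Returns the iteration count.
int passThroughModel(const std::uint8_t* in, std::uint8_t* out, int height, int width) {
  assert((height * width) % N == 0);  // the loop has no scalar tail handling
  int iterations = 0;
  for (int j = 0; j < height * width; j += N) {
    Vec64u8 v;
    std::memcpy(v.lanes.data(), in, sizeof v.lanes);   // load one vector
    std::memcpy(out, v.lanes.data(), sizeof v.lanes);  // store one vector
    in += N;
    out += N;
    ++iterations;
  }
  return iterations;
}
```

For a 4096-byte chunk (for example `height * width = 4096`) the loop runs 64 times, well above the `AIE_LOOP_MIN_ITERATION_COUNT(6)` hint. The model uses `std::memcpy` rather than pointer casts to avoid alignment and aliasing issues on the host; the real kernel relies on the AIE compiler's vector types instead.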
Observations from the trace for the transfer of one 4096-byte block:

- Core(0,2) lock stall duration: ~850 µs
- Core(0,2) running-state duration: ~155 µs
- Overall NPU execution time: ~650 µs
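The reported figures can be turned into an effective throughput and a core duty cycle. Which transfer the ~650 µs covers is not stated explicitly, so this sketch assumes it is the full 256 KB (256 * 1024 bytes); the function names are hypothetical.

```cpp
#include <cassert>

// Trace figures reported above, in microseconds.
constexpr double kLockStallUs = 850.0;
constexpr double kRunningUs = 155.0;
constexpr double kNpuTotalUs = 650.0;

// Assumption: the ~650 us overall NPU time covers the whole 256 KB
// (256 * 1024 bytes) transfer rather than a single 4096-byte block.
constexpr double kTotalBytes = 256.0 * 1024.0;

// Effective throughput in MB/s (1 MB = 1e6 bytes): bytes per microsecond
// is numerically equal to MB/s.
double effectiveThroughputMBps() {
  assert(kNpuTotalUs > 0.0);
  return kTotalBytes / kNpuTotalUs;
}

// Fraction of the traced core time (running + lock stall) spent running.
double coreRunningFraction() {
  assert(kRunningUs + kLockStallUs > 0.0);
  return kRunningUs / (kRunningUs + kLockStallUs);
}
```

Under that assumption the throughput is roughly 403 MB/s, and the core spends about 15% of its traced time running versus waiting on locks.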
Concerns