Description
In this issue I’ll briefly explain the reasons behind the XDMA driver’s poor performance and high latency spikes, and how to fix them, effectively increasing throughput on non-RT systems and stabilizing latencies on RT systems.
Don’t focus on the absolute throughput numbers; I did not verify the correctness of my calculations for this demo. The time values are correct, though.
Test system:
iMX8M Mini, quad-core ARM, Artix-7 with XDMA and a BRAM buffer as an AXI peripheral. MSI interrupts, RT kernel, Yocto Linux.
Important: the iMX8M Mini does not support MSI-X in my setup, and it also does not support steering the MSI IRQ to a specific CPU core. The results for interrupt mode could be different on a platform that does.
Test procedures:
- For the throughput test, 100 reads of 8 KB are performed, then the result is printed and the test restarts (see the sketch after this list).
- For the latency test, a single read of 384 bytes is performed, then the result is printed and the test restarts.
- Writes show equivalent behavior and are thus excluded from this document.
- Legacy interrupts and MSI-X interrupts were not tested.
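To make the procedure concrete, here is a minimal sketch of the throughput loop. It assumes the stock XDMA character device `/dev/xdma0_c2h_0`; the device name, read offset, and buffer alignment are illustrative, not taken from my actual test code.

```c
/* Minimal throughput-test sketch: 100 reads of 8 KB, print, repeat.
 * /dev/xdma0_c2h_0, offset 0 and 4 KB alignment are assumptions. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define XFER_SIZE  (8 * 1024)  /* 8 KB per read */
#define ITERATIONS 100         /* reads per measurement */

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void)
{
    char *buf = aligned_alloc(4096, XFER_SIZE);
    int fd = open("/dev/xdma0_c2h_0", O_RDONLY);

    if (fd < 0 || !buf)
        return 1;

    for (;;) {
        double start = now_sec();

        for (int i = 0; i < ITERATIONS; i++)
            if (pread(fd, buf, XFER_SIZE, 0) != XFER_SIZE)
                return 1;

        printf("%.2f MB/s\n",
               ITERATIONS * XFER_SIZE / (now_sec() - start) / 1e6);
    }
}
```

The latency variant is the same loop timing a single 384-byte read per iteration.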
First, let me show you the test results.
Test results
- interrupt mode (default)
- poll_mode=1
- poll_mode=15 but without proper userspace configuration
- poll_mode=15 with proper userspace configuration
As you can see, MSI interrupt mode was the worst: both throughput and latency were very unstable. According to the XDMA documentation, switching to poll mode should improve the numbers, and it did, though not by much. Applying my fix without the proper userspace configuration shows only a marginal improvement; that is what you get if you apply the patch and forget the userspace part. Applying my fix together with the proper userspace configuration shows an incredible result, many times better than what the original MSI interrupt mode was able to offer. Not only did it improve and stabilize the throughput numbers, it also made XDMA RT-capable, bringing latency down to decent values that are also very stable.
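For clarity, "proper userspace configuration" here means pinning the userspace thread to a fixed CPU core and giving it a real-time priority. A minimal sketch follows; the core number and priority are illustrative, pick the core your poll thread is pinned to:

```c
/* Sketch: pin the calling thread to one CPU core and raise it to an
 * RT priority. The concrete core and priority values are examples. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

static int pin_and_prioritize(int cpu, int prio)
{
    cpu_set_t set;
    struct sched_param sp = { .sched_priority = prio };

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);

    /* pid 0 = the calling thread */
    if (sched_setaffinity(0, sizeof(set), &set))
        return -1;

    return pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);
}
```

The driver's cmpl_* poll thread can be pinned the same way from a shell (taskset -pc and chrt -f -p on its PID), which should give you the "both threads on one core" setup described below.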
The problem is that the original code forces the driver to do a lot of context switching and core migration, which are not only expensive operations (~300 us each on my platform) but also a source of latency spikes. Given that my SoC has a 128-byte TLP size limit, it becomes obvious that such a small packet size leads to frequent context switches, which destroys performance.
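To illustrate the kernel-side idea (this is not the actual patch; start_poll_thread and engine_poll_fn are placeholder names, not real XDMA driver symbols), the poll kthread can be created bound to a fixed CPU with kthread_bind(), so the scheduler never migrates it:

```c
/* Sketch: bind the completion poll thread to one CPU at creation. */
#include <linux/err.h>
#include <linux/kthread.h>
#include <linux/sched.h>

/* Stand-in for the driver's existing descriptor-completion poll loop. */
static int engine_poll_fn(void *data)
{
    while (!kthread_should_stop())
        cond_resched();  /* the real driver polls writeback status here */
    return 0;
}

static struct task_struct *start_poll_thread(void *data, int cpu)
{
    struct task_struct *t;

    t = kthread_create(engine_poll_fn, data, "cmpl_bound_%d", cpu);
    if (IS_ERR(t))
        return t;

    kthread_bind(t, cpu);  /* fixed affinity: no core migration */
    wake_up_process(t);
    return t;
}
```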
This is what happens in the original code (cmpl_ is the poll thread, UIC is the userspace thread; note how the driver disperses the load across 3 CPU cores: there is a lot of switching and a lot of overhead loss):
This is what happens after applying my fix (the poll thread and the userspace thread are on different CPU cores here, but that is just for this demo; you'll get better results with both threads on the same CPU core; note that there is no switching and much less overhead loss):
The pictures are not ideal, so take them with a grain of salt, but I hope they make the explanation clear.