Number system optimization steps and move summary table upfront
Rename the nine optimization sections to "Step N: <Title>" so readers can track progress, and relocate the summary table to the top of the section as an upfront checklist. Update cross-page anchor links in benchmarking_examples.md and configuration.md to match the new slugs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Ramya Gurunathan <rgurunathan@nvidia.com>
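
One way to sanity-check the updated anchor links is to derive each slug from its renamed heading. The sketch below assumes a simplified slug rule (lowercase, drop punctuation, join words with hyphens); the docs toolchain's real slugifier may behave differently, and `slugify` is a hypothetical helper, not part of DAQIRI or MkDocs.

```python
import re


def slugify(heading: str) -> str:
    """Approximate a heading-to-anchor slug (assumed rule, not the toolchain's own)."""
    text = heading.strip().lower()
    # Drop everything except word characters, whitespace, and hyphens.
    text = re.sub(r"[^\w\s-]", "", text)
    # Collapse runs of whitespace or hyphens into a single hyphen.
    return re.sub(r"[\s-]+", "-", text).strip("-")


if __name__ == "__main__":
    for heading in (
        "Step 3: Maximize the NIC's Max Read Request Size (MRRS)",
        "Step 5: Isolate CPU cores",
        "Step 8: Maximize GPU BAR1 size",
    ):
        print(f"#{slugify(heading)}")
```

Under this assumed rule, the three headings yield the same anchors that the diff below links to.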
docs/tutorials/configuration.md
Lines changed: 1 addition & 1 deletion
@@ -112,7 +112,7 @@ bench_tx: # (30)!
 15. A descriptive name for that queue, currently only used for logging.
 16. The ID of that queue, which can be referred to later in the `flows` section.
 17. :material-package-variant: The number of packets per batch (or burst). The Rx path delivers packets to the application in batches of this size. The Tx path should not send more packets than this value per call.
-18. :material-wrench: The ID of the CPU core that this queue will use to poll the NIC. Ideally one [isolated core](system_configuration.md#isolate-cpu-cores) per queue. **Must match your system's available cores.**
+18. :material-wrench: The ID of the CPU core that this queue will use to poll the NIC. Ideally one [isolated core](system_configuration.md#step-5-isolate-cpu-cores) per queue. **Must match your system's available cores.**
 19. The list of memory regions where this queue will write/read packets from/to. The order matters: the first memory region will be used first to read/write from until it fills up one buffer (`buf_size`), after which it will move to the next region in the list and so on until the packet is fully written/read. See the `memory_regions` for the `rx` queue below for an example.
 20. The `offloads` section (Tx queues only) lists optional tasks that can be offloaded to the NIC. The only value currently supported is `tx_eth_src`, which lets the NIC insert the ethernet source MAC address in the packet headers. Note: IP, UDP, and Ethernet checksums/CRC are always done by the NIC and are not optional.
 21. :material-wrench: Same as for `tx_port`. Each interface in this list should have a unique MAC address. **Must be changed for your system.**
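
The region-ordering rule in annotation 19 can be illustrated with a small sketch. It only models the byte accounting described there (fill one `buf_size` buffer per region, in list order); `split_across_regions` and the sizes used are hypothetical, and raising an error for an oversized packet is an assumption, not documented library behavior.

```python
def split_across_regions(packet_len: int, buf_sizes: list[int]) -> list[int]:
    """Return how many bytes of one packet land in each memory region, in order.

    buf_sizes holds the per-region buffer size (`buf_size`), listed in the
    same order as the queue's memory regions.
    """
    remaining = packet_len
    placed = []
    for size in buf_sizes:
        # Fill the current region's buffer before moving on to the next one.
        chunk = min(remaining, size)
        placed.append(chunk)
        remaining -= chunk
    if remaining > 0:
        # Assumption for this sketch: a packet larger than all buffers is an error.
        raise ValueError("packet does not fit in the configured regions")
    return placed


if __name__ == "__main__":
    # Hypothetical layout: a 64-byte header region followed by a 1000-byte payload region.
    print(split_across_regions(500, [64, 1000]))
```

With that hypothetical layout, a 500-byte packet places 64 bytes in the first region and the remaining 436 bytes in the second.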
docs/tutorials/system_configuration.md
Lines changed: 25 additions & 29 deletions
@@ -17,7 +17,7 @@ In this tutorial, we will be developing on an **NVIDIA IGX Orin platform** with

 !!! Warning "Secure boot conflict"

-If you have secure boot enabled on your system, you might need to disable it as a prerequisite to run some of the configurations below ([switching the NIC link layers to Ethernet](#switch-your-nic-link-layers-to-ethernet), [updating the MRRS of your NIC ports](#maximize-the-nics-max-read-request-size-mrrs), [updating the BAR1 size of your GPU](#maximize-gpu-bar1-size)). Secure boot can be re-enabled after the configurations are completed.
+If you have secure boot enabled on your system, you might need to disable it as a prerequisite to run some of the configurations below ([switching the NIC link layers to Ethernet](#switch-your-nic-link-layers-to-ethernet), [updating the MRRS of your NIC ports](#step-3-maximize-the-nics-max-read-request-size-mrrs), [updating the BAR1 size of your GPU](#step-8-maximize-gpu-bar1-size)). Secure boot can be re-enabled after the configurations are completed.

 ### Check your NIC drivers

@@ -361,9 +361,23 @@ If it's not loaded, run the following command, then check again:

 While the configurations above are the minimum requirements to get a NIC and a NVIDIA GPU to communicate while bypassing the OS kernel stack, performance can be further improved in most scenarios by tuning the system as described below.

+The table below summarizes all optimization steps covered in this section, along with the corresponding `tune_system.py` flags and whether each setting can be made persistent across reboots. Use it as a checklist to track your progress.
+| 9 | [Jumbo frames (MTU)](#step-9-enable-jumbo-frames) | `--check mtu` | Yes — see persistent option in section |
+
 !!! tip "Plan your reboots"

-Several steps below require adding flags to the kernel bootline in`/etc/default/grub` (hugepages in [Enable Huge pages](#enable-huge-pages), CPU isolation in [Isolate CPU cores](#isolate-cpu-cores)). We recommend reading through both sections first and adding all the flags at once to avoid multiple reboots. Other items like MRRS, GPU clocks, and MTU can be applied at runtime but reset on reboot — consider scripting them or using a systemd service for persistence.
+Several steps below require adding flags to the kernel bootline in`/etc/default/grub` (hugepages in [Enable Huge pages](#step-4-enable-huge-pages), CPU isolation in [Isolate CPU cores](#step-5-isolate-cpu-cores)). We recommend reading through both sections first and adding all the flags at once to avoid multiple reboots. Other items like MRRS, GPU clocks, and MTU can be applied at runtime but reset on reboot — consider scripting them or using a systemd service for persistence.

 Before diving in each of the setups below, we provide a utility script as part of the DAQIRI library which provides an overview of the configurations that potentially need to be tuned on your system.

@@ -424,7 +438,7 @@ Before diving in each of the setups below, we provide a utility script as part o

 Based on the results, you can figure out which of the sections below are appropriate to update configurations on your system.

-### Ensure ideal PCIe topology
+### Step 1: Ensure ideal PCIe topology

 Kernel bypass and GPUDirect rely on PCIe to communicate between the GPU and the NIC at high speeds. As-such, the topology of the PCIe tree on a system is critical to ensure optimal performance.

@@ -513,7 +527,7 @@ lspci -tv

 Most x86_64 systems are not designed for this topology as they lack a discrete PCIe switch. In that case, the best connection they can achieve is `NODE`.

-### Check the NIC's PCIe configuration
+### Step 2: Check the NIC's PCIe configuration

 !!! quote "[Understanding PCIe Configuration for Maximum Performance - May 27, 2022](https://enterprise-support.nvidia.com/s/article/understanding-pcie-configuration-for-maximum-performance)"
-### Maximize the NIC's Max Read Request Size (MRRS)
+### Step 3: Maximize the NIC's Max Read Request Size (MRRS)

 !!! quote "[Understanding PCIe Configuration for Maximum Performance - May 27, 2022](https://enterprise-support.nvidia.com/s/article/understanding-pcie-configuration-for-maximum-performance)"

@@ -687,7 +701,7 @@ Update MRRS:

 Disable secure boot on your system ahead of changing the MRRS of your NIC ports. It can be re-enabled afterwards.

-### Enable Huge pages
+### Step 4: Enable Huge pages

 Huge pages are a memory management feature that allows the OS to allocate large blocks of memory (typically 2MB or 1GB) instead of the default 4KB pages. This reduces the number of page table entries and the amount of memory used for translation, improving cache performance and reducing TLB (Translation Lookaside Buffer) misses, which leads to lower latencies.

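
To put numbers on the page-table reduction described in the huge pages paragraph, the sketch below counts how many pages are needed to map a buffer at each page size. The 4 GiB buffer size is an illustrative assumption, and `pages_needed` is a hypothetical helper, not part of `tune_system.py`.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def pages_needed(region_bytes: int, page_bytes: int) -> int:
    """Number of pages (one page table entry each) needed to map a region."""
    # Ceiling division: a partially filled last page still needs an entry.
    return -(-region_bytes // page_bytes)


if __name__ == "__main__":
    region = 4 * GIB  # illustrative buffer size, chosen for this example only
    for label, page in (("4KB", 4 * KIB), ("2MB", 2 * MIB), ("1GB", GIB)):
        print(f"{label} pages: {pages_needed(region, page)} entries")
```

For this buffer, the entry count drops from 1,048,576 with 4KB pages to 2,048 with 2MB pages and 4 with 1GB pages, so the same memory is covered by far fewer translations.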
@@ -846,7 +860,7 @@ The example below allocates 4 huge pages of 1GB each.

 Rerunning the initial commands should now list 4 hugepages of 1GB each. 1GB will be the default huge page size if updated in the kernel bootline only.

-### Isolate CPU cores
+### Step 5: Isolate CPU cores

 !!! note

@@ -930,7 +944,7 @@ sudo reboot

 Verify that the flags were properly set after boot by rerunning the check commands above.

-### Prevent CPU cores from going idle
+### Step 6: Prevent CPU cores from going idle

 When a core goes idle/to sleep, coming back online to poll the NIC can cause latency spikes and dropped packets. To prevent this, **we recommend setting the scaling governor to `performance` for these CPU cores**.

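
The governor recommendation can also be checked from a script. The sketch below reads the standard Linux cpufreq sysfs files under `/sys/devices/system/cpu`; it assumes the cpufreq interface is exposed on the target system, and the helper names are hypothetical rather than part of `tune_system.py`.

```python
from pathlib import Path


def read_governors(cpu_root: str = "/sys/devices/system/cpu") -> dict[int, str]:
    """Map each CPU core ID to its current scaling governor."""
    governors = {}
    for path in Path(cpu_root).glob("cpu[0-9]*/cpufreq/scaling_governor"):
        # The core directory is named cpu<N>; strip the prefix to get N.
        core_id = int(path.parent.parent.name[3:])
        governors[core_id] = path.read_text().strip()
    return governors


def cores_not_in_performance(governors: dict[int, str]) -> list[int]:
    """Return the sorted core IDs whose governor is not `performance`."""
    return sorted(core for core, gov in governors.items() if gov != "performance")


if __name__ == "__main__":
    current = read_governors()
    print("governors:", dict(sorted(current.items())))
    print("cores to fix:", cores_not_in_performance(current))
```

An empty "cores to fix" list means every core that exposes cpufreq already uses the `performance` governor; an empty governors mapping means the sysfs files were not found.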
@@ -1037,7 +1051,7 @@ Set the governor to `performance` for all cores:

 Running the checks above should now list `performance` as the governor for all cores. You can also run `sudo cpupower -c all frequency-info` for more details.

-### Prevent the GPU from going idle
+### Step 7: Prevent the GPU from going idle

 Similarly to the above, we want to maximize the GPU's clock speed and prevent it from going idle.

@@ -1128,7 +1142,7 @@ You can confirm that the clocks are set to the max values by running `nvidia-smi

 Some max clocks might not be achievable in certain configurations, or due to boost clocks (SM) or rounding errors (Memory), despite the lock commands indicating it worked. For example - on IGX - the max non-boot SM clock will be 1920 MHz, and the max memory clock will show 8000 MHz, which are satisfying compared to the initial mode.

-### Maximize GPU BAR1 size
+### Step 8: Maximize GPU BAR1 size

 The GPU BAR1 memory is the primary resource consumed by `GPUDirect`. It allows other PCIe devices (like the CPU and the NIC) to access the GPU's memory space. The larger the BAR1 size, the more memory the GPU can expose to these devices in a single PCIe transaction, reducing the number of transactions needed and improving performance.

@@ -1288,7 +1302,7 @@ Reboot your system, and check the BAR1 size again to confirm the change.
 sudo reboot
 ```

-### Enable Jumbo Frames
+### Step 9: Enable Jumbo Frames

 Jumbo frames are Ethernet frames that carry a payload larger than the standard 1500 bytes MTU (Maximum Transmission Unit). They can significantly improve network performance when transferring large amounts of data by reducing the overhead of packet headers and the number of packets that need to be processed.

@@ -1368,23 +1382,5 @@ You can set the MTU for each interface like so, for a given `if_name` name ident
-| 9 | Jumbo frames (MTU) |`--check mtu`| Yes — see persistent option in section |
-
-!!! tip
-
-Each section above provides both one-time and persistent configuration options. We recommend testing with the one-time commands first, then switching to the persistent options once your configuration is verified. You can check all settings at once with `tune_system.py --check all`.
-
 ---
 **Next:** [Benchmarking Examples](benchmarking_examples.md) — run your first DAQIRI benchmark