Commit 6229325

RamyaGuru and claude committed
Number system optimization steps and move summary table upfront
Rename the nine optimization sections to "Step N: <Title>" so readers can track progress, and relocate the summary table to the top of the section as an upfront checklist. Update cross-page anchor links in benchmarking_examples.md and configuration.md to match the new slugs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Ramya Gurunathan <rgurunathan@nvidia.com>
1 parent aeb7927 commit 6229325

3 files changed

Lines changed: 28 additions & 32 deletions

docs/tutorials/benchmarking_examples.md

Lines changed: 2 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -451,13 +451,13 @@ sudo mlnx_perf -i $if_name
451451
EAL: x hugepages of size x reserved, no mounted hugetlbfs found for that size
452452
```
453453

454-
Ensure your [hugepages are mounted](system_configuration.md#enable-huge-pages).
454+
Ensure your [hugepages are mounted](system_configuration.md#step-4-enable-huge-pages).
455455

456456
```log
457457
EAL: No free x kB hugepages reported on node 0
458458
```
459459

460-
- Ensure you have [allocated hugepages](system_configuration.md#enable-huge-pages).
460+
- Ensure you have [allocated hugepages](system_configuration.md#step-4-enable-huge-pages).
461461
- If you have already, check if they are any free left with `grep Huge /proc/meminfo`.
462462

463463
??? abstract "See an example output"

docs/tutorials/configuration.md

Lines changed: 1 addition & 1 deletion
@@ -112,7 +112,7 @@ bench_tx: # (30)!
112112
15. A descriptive name for that queue, currently only used for logging.
113113
16. The ID of that queue, which can be referred to later in the `flows` section.
114114
17. :material-package-variant: The number of packets per batch (or burst). The Rx path delivers packets to the application in batches of this size. The Tx path should not send more packets than this value per call.
115-
18. :material-wrench: The ID of the CPU core that this queue will use to poll the NIC. Ideally one [isolated core](system_configuration.md#isolate-cpu-cores) per queue. **Must match your system's available cores.**
115+
18. :material-wrench: The ID of the CPU core that this queue will use to poll the NIC. Ideally one [isolated core](system_configuration.md#step-5-isolate-cpu-cores) per queue. **Must match your system's available cores.**
116116
19. The list of memory regions where this queue will write/read packets from/to. The order matters: the first memory region will be used first to read/write from until it fills up one buffer (`buf_size`), after which it will move to the next region in the list and so on until the packet is fully written/read. See the `memory_regions` for the `rx` queue below for an example.
117117
20. The `offloads` section (Tx queues only) lists optional tasks that can be offloaded to the NIC. The only value currently supported is `tx_eth_src`, which lets the NIC insert the ethernet source MAC address in the packet headers. Note: IP, UDP, and Ethernet checksums/CRC are always done by the NIC and are not optional.
118118
21. :material-wrench: Same as for `tx_port`. Each interface in this list should have a unique MAC address. **Must be changed for your system.**

docs/tutorials/system_configuration.md

Lines changed: 25 additions & 29 deletions
@@ -17,7 +17,7 @@ In this tutorial, we will be developing on an **NVIDIA IGX Orin platform** with
1717

1818
!!! Warning "Secure boot conflict"
1919

20-
If you have secure boot enabled on your system, you might need to disable it as a prerequisite to run some of the configurations below ([switching the NIC link layers to Ethernet](#switch-your-nic-link-layers-to-ethernet), [updating the MRRS of your NIC ports](#maximize-the-nics-max-read-request-size-mrrs), [updating the BAR1 size of your GPU](#maximize-gpu-bar1-size)). Secure boot can be re-enabled after the configurations are completed.
20+
If you have secure boot enabled on your system, you might need to disable it as a prerequisite to run some of the configurations below ([switching the NIC link layers to Ethernet](#switch-your-nic-link-layers-to-ethernet), [updating the MRRS of your NIC ports](#step-3-maximize-the-nics-max-read-request-size-mrrs), [updating the BAR1 size of your GPU](#step-8-maximize-gpu-bar1-size)). Secure boot can be re-enabled after the configurations are completed.
2121

2222
### Check your NIC drivers
2323

@@ -361,9 +361,23 @@ If it's not loaded, run the following command, then check again:
361361

362362
While the configurations above are the minimum requirements to get a NIC and a NVIDIA GPU to communicate while bypassing the OS kernel stack, performance can be further improved in most scenarios by tuning the system as described below.
363363

364+
The table below summarizes all optimization steps covered in this section, along with the corresponding `tune_system.py` flags and whether each setting can be made persistent across reboots. Use it as a checklist to track your progress.
365+
366+
| Step | Description | Tuning Script Flag | Persistent Option Available? |
367+
|------|-------------|--------------------|-------------|
368+
| 1 | [PCIe topology](#step-1-ensure-ideal-pcie-topology) | `--check topo` | N/A (hardware) |
369+
| 2 | [PCIe config (MPS/Speed)](#step-2-check-the-nics-pcie-configuration) | `--check mps` | N/A (hardware) |
370+
| 3 | [NIC MRRS](#step-3-maximize-the-nics-max-read-request-size-mrrs) | `--check mrrs` / `--set mrrs` | No — use a startup script |
371+
| 4 | [Hugepages](#step-4-enable-huge-pages) | `--check hugepages` | Yes — kernel bootline or `/etc/fstab` |
372+
| 5 | [CPU isolation](#step-5-isolate-cpu-cores) | `--check cmdline` | Yes — kernel bootline |
373+
| 6 | [CPU governor](#step-6-prevent-cpu-cores-from-going-idle) | `--check cpu-freq` | Yes — see persistent option in section |
374+
| 7 | [GPU clocks](#step-7-prevent-the-gpu-from-going-idle) | `--check gpu-clock` | Partial — `nvidia-smi -pm 1` persists driver; clock locks need a startup script |
375+
| 8 | [GPU BAR1 size](#step-8-maximize-gpu-bar1-size) | `--check bar1-size` | Yes — firmware flash |
376+
| 9 | [Jumbo frames (MTU)](#step-9-enable-jumbo-frames) | `--check mtu` | Yes — see persistent option in section |
377+
364378
!!! tip "Plan your reboots"
365379

366-
Several steps below require adding flags to the kernel bootline in `/etc/default/grub` (hugepages in [Enable Huge pages](#enable-huge-pages), CPU isolation in [Isolate CPU cores](#isolate-cpu-cores)). We recommend reading through both sections first and adding all the flags at once to avoid multiple reboots. Other items like MRRS, GPU clocks, and MTU can be applied at runtime but reset on reboot — consider scripting them or using a systemd service for persistence.
380+
Several steps below require adding flags to the kernel bootline in `/etc/default/grub` (hugepages in [Enable Huge pages](#step-4-enable-huge-pages), CPU isolation in [Isolate CPU cores](#step-5-isolate-cpu-cores)). We recommend reading through both sections first and adding all the flags at once to avoid multiple reboots. Other items like MRRS, GPU clocks, and MTU can be applied at runtime but reset on reboot — consider scripting them or using a systemd service for persistence.
367381

368382
Before diving in each of the setups below, we provide a utility script as part of the DAQIRI library which provides an overview of the configurations that potentially need to be tuned on your system.
369383

@@ -424,7 +438,7 @@ Before diving in each of the setups below, we provide a utility script as part o
424438

425439
Based on the results, you can figure out which of the sections below are appropriate to update configurations on your system.
426440

427-
### Ensure ideal PCIe topology
441+
### Step 1: Ensure ideal PCIe topology
428442

429443
Kernel bypass and GPUDirect rely on PCIe to communicate between the GPU and the NIC at high speeds. As-such, the topology of the PCIe tree on a system is critical to ensure optimal performance.
430444

@@ -513,7 +527,7 @@ lspci -tv
513527

514528
Most x86_64 systems are not designed for this topology as they lack a discrete PCIe switch. In that case, the best connection they can achieve is `NODE`.
515529

516-
### Check the NIC's PCIe configuration
530+
### Step 2: Check the NIC's PCIe configuration
517531

518532
!!! quote "[Understanding PCIe Configuration for Maximum Performance - May 27, 2022](https://enterprise-support.nvidia.com/s/article/understanding-pcie-configuration-for-maximum-performance)"
519533

@@ -617,7 +631,7 @@ sudo lspci -vv -s $nic_pci | awk '/LnkCap/{s=1} /LnkSta/{s=0} /Speed /{match($0,
617631
Current Speed 32GT/s
618632
```
619633
620-
### Maximize the NIC's Max Read Request Size (MRRS)
634+
### Step 3: Maximize the NIC's Max Read Request Size (MRRS)
621635

622636
!!! quote "[Understanding PCIe Configuration for Maximum Performance - May 27, 2022](https://enterprise-support.nvidia.com/s/article/understanding-pcie-configuration-for-maximum-performance)"
623637

@@ -687,7 +701,7 @@ Update MRRS:
687701

688702
Disable secure boot on your system ahead of changing the MRRS of your NIC ports. It can be re-enabled afterwards.
689703

690-
### Enable Huge pages
704+
### Step 4: Enable Huge pages
691705

692706
Huge pages are a memory management feature that allows the OS to allocate large blocks of memory (typically 2MB or 1GB) instead of the default 4KB pages. This reduces the number of page table entries and the amount of memory used for translation, improving cache performance and reducing TLB (Translation Lookaside Buffer) misses, which leads to lower latencies.
693707

@@ -846,7 +860,7 @@ The example below allocates 4 huge pages of 1GB each.
846860

847861
Rerunning the initial commands should now list 4 hugepages of 1GB each. 1GB will be the default huge page size if updated in the kernel bootline only.
848862

849-
### Isolate CPU cores
863+
### Step 5: Isolate CPU cores
850864

851865
!!! note
852866

@@ -930,7 +944,7 @@ sudo reboot
930944
931945
Verify that the flags were properly set after boot by rerunning the check commands above.
932946
933-
### Prevent CPU cores from going idle
947+
### Step 6: Prevent CPU cores from going idle
934948
935949
When a core goes idle/to sleep, coming back online to poll the NIC can cause latency spikes and dropped packets. To prevent this, **we recommend setting the scaling governor to `performance` for these CPU cores**.
936950
@@ -1037,7 +1051,7 @@ Set the governor to `performance` for all cores:
10371051
10381052
Running the checks above should now list `performance` as the governor for all cores. You can also run `sudo cpupower -c all frequency-info` for more details.
10391053
1040-
### Prevent the GPU from going idle
1054+
### Step 7: Prevent the GPU from going idle
10411055
10421056
Similarly to the above, we want to maximize the GPU's clock speed and prevent it from going idle.
10431057

@@ -1128,7 +1142,7 @@ You can confirm that the clocks are set to the max values by running `nvidia-smi
11281142
11291143
Some max clocks might not be achievable in certain configurations, or due to boost clocks (SM) or rounding errors (Memory), despite the lock commands indicating it worked. For example - on IGX - the max non-boot SM clock will be 1920 MHz, and the max memory clock will show 8000 MHz, which are satisfying compared to the initial mode.
11301144
1131-
### Maximize GPU BAR1 size
1145+
### Step 8: Maximize GPU BAR1 size
11321146
11331147
The GPU BAR1 memory is the primary resource consumed by `GPUDirect`. It allows other PCIe devices (like the CPU and the NIC) to access the GPU's memory space. The larger the BAR1 size, the more memory the GPU can expose to these devices in a single PCIe transaction, reducing the number of transactions needed and improving performance.
11341148
@@ -1288,7 +1302,7 @@ Reboot your system, and check the BAR1 size again to confirm the change.
12881302
sudo reboot
12891303
```
12901304
1291-
### Enable Jumbo Frames
1305+
### Step 9: Enable Jumbo Frames
12921306
12931307
Jumbo frames are Ethernet frames that carry a payload larger than the standard 1500 bytes MTU (Maximum Transmission Unit). They can significantly improve network performance when transferring large amounts of data by reducing the overhead of packet headers and the number of packets that need to be processed.
12941308
@@ -1368,23 +1382,5 @@ You can set the MTU for each interface like so, for a given `if_name` name ident
13681382
maxmtu 9978
13691383
```
13701384
1371-
### Summary
1372-
1373-
| Step | Description | Tuning Script Flag | Persistent Option Available? |
1374-
|------|-------------|--------------------|-------------|
1375-
| 1 | PCIe topology | `--check topo` | N/A (hardware) |
1376-
| 2 | PCIe config (MPS/Speed) | `--check mps` | N/A (hardware) |
1377-
| 3 | NIC MRRS | `--check mrrs` / `--set mrrs` | No — use a startup script |
1378-
| 4 | Hugepages | `--check hugepages` | Yes — kernel bootline or `/etc/fstab` |
1379-
| 5 | CPU isolation | `--check cmdline` | Yes — kernel bootline |
1380-
| 6 | CPU governor | `--check cpu-freq` | Yes — see persistent option in section |
1381-
| 7 | GPU clocks | `--check gpu-clock` | Partial — `nvidia-smi -pm 1` persists driver; clock locks need a startup script |
1382-
| 8 | GPU BAR1 size | `--check bar1-size` | Yes — firmware flash |
1383-
| 9 | Jumbo frames (MTU) | `--check mtu` | Yes — see persistent option in section |
1384-
1385-
!!! tip
1386-
1387-
Each section above provides both one-time and persistent configuration options. We recommend testing with the one-time commands first, then switching to the persistent options once your configuration is verified. You can check all settings at once with `tune_system.py --check all`.
1388-
13891385
---
13901386
**Next:** [Benchmarking Examples](benchmarking_examples.md) — run your first DAQIRI benchmark
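
The anchor updates in this commit follow from how heading text becomes a URL slug. The sketch below is a minimal illustration, assuming the docs are built with a Python-Markdown-style slugifier; the commit does not state the toolchain, and the `slugify` helper here is a hypothetical approximation rather than the project's actual code. It shows why renaming a heading to "Step N: <Title>" changes every link that targets it.

```python
import re


def slugify(heading: str) -> str:
    """Approximate a Python-Markdown-style heading slug (assumed behavior)."""
    # Drop everything except word characters, whitespace, and hyphens.
    cleaned = re.sub(r"[^\w\s-]", "", heading).strip().lower()
    # Collapse runs of whitespace or hyphens into a single hyphen.
    return re.sub(r"[-\s]+", "-", cleaned)


# Headings taken from the renamed sections in this commit.
headings = [
    "Step 3: Maximize the NIC's Max Read Request Size (MRRS)",
    "Step 4: Enable Huge pages",
    "Step 5: Isolate CPU cores",
]

for heading in headings:
    print(f"#{slugify(heading)}")
```

Under that assumption, the printed slugs match the new link targets used in the diff (`#step-3-maximize-the-nics-max-read-request-size-mrrs`, `#step-4-enable-huge-pages`, `#step-5-isolate-cpu-cores`), which is why links in the two other pages had to change along with the headings.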