|
11 | 11 | border-bottom: 2px solid var(--border, rgba(255,255,255,0.12)); |
12 | 12 | } |
13 | 13 | .examples-table th { |
14 | | - padding: 0.6rem 1rem; |
| 14 | + padding: 0.4rem 0.85rem; |
15 | 15 | text-align: left; |
16 | 16 | font-size: 0.75rem; |
17 | 17 | font-weight: 600; |
|
21 | 21 | white-space: nowrap; |
22 | 22 | } |
23 | 23 | .examples-table td { |
24 | | - padding: 0.75rem 1rem; |
| 24 | + padding: 0.45rem 0.85rem; |
25 | 25 | vertical-align: top; |
26 | 26 | border-bottom: 1px solid var(--border, rgba(255,255,255,0.07)); |
27 | 27 | } |
@@ -73,57 +73,48 @@ <h2 class="section-title">Examples</h2> |
73 | 73 | <th>Example</th> |
74 | 74 | <th>Type</th> |
75 | 75 | <th>Description</th> |
76 | | - <th>Source</th> |
77 | 76 | </tr> |
78 | 77 | </thead> |
79 | 78 | <tbody> |
80 | 79 | <tr> |
81 | | - <td class="ex-name">Raw TX/RX GPUDirect</td> |
| 80 | + <td class="ex-name"><a class="ex-code-link" href="https://github.com/NVIDIA/daqiri/blob/main/examples/raw_gpudirect_bench.cpp" target="_blank">Raw TX/RX GPUDirect ↗</a></td> |
82 | 81 | <td><span class="ex-badge ex-badge-cpp">C++</span></td> |
83 | 82 | <td class="ex-desc-cell">The recommended starting point. Sends and receives packets with payloads landing directly in GPU memory, with no CPU in the data path. Use this to validate your hardware setup and measure baseline throughput.</td> |
84 | | - <td><a class="ex-code-link" href="https://github.com/NVIDIA/daqiri/blob/main/examples/raw_gpudirect_bench.cpp" target="_blank">raw_gpudirect_bench.cpp ↗</a></td> |
85 | 83 | </tr> |
86 | 84 | <tr> |
87 | | - <td class="ex-name">Header-Data Split TX/RX</td> |
| 85 | + <td class="ex-name"><a class="ex-code-link" href="https://github.com/NVIDIA/daqiri/blob/main/examples/raw_hds_bench.cpp" target="_blank">Header-Data Split TX/RX ↗</a></td> |
88 | 86 | <td><span class="ex-badge ex-badge-cpp">C++</span></td> |
89 | 87 | <td class="ex-desc-cell">Splits each incoming packet into two segments: headers land in CPU memory for inspection, while the payload goes directly to the GPU. Useful when your application needs to read per-packet metadata without touching the payload on the CPU.</td> |
90 | | - <td><a class="ex-code-link" href="https://github.com/NVIDIA/daqiri/blob/main/examples/raw_hds_bench.cpp" target="_blank">raw_hds_bench.cpp ↗</a></td> |
91 | 88 | </tr> |
92 | 89 | <tr> |
93 | | - <td class="ex-name">Sequence Reorder</td> |
| 90 | + <td class="ex-name"><a class="ex-code-link" href="https://github.com/NVIDIA/daqiri/blob/main/examples/raw_reorder_seq_bench.cpp" target="_blank">Sequence Reorder ↗</a></td> |
94 | 91 | <td><span class="ex-badge ex-badge-cuda">C++/CUDA</span></td> |
95 | 92 | <td class="ex-desc-cell">Reassembles out-of-order UDP packets into a correctly ordered GPU buffer. A CUDA kernel reads the sequence number embedded in each packet header and places the packet at the right position, so downstream compute always sees a clean, ordered stream.</td> |
96 | | - <td><a class="ex-code-link" href="https://github.com/NVIDIA/daqiri/blob/main/examples/raw_reorder_seq_bench.cpp" target="_blank">raw_reorder_seq_bench.cpp ↗</a></td> |
97 | 93 | </tr> |
98 | 94 | <tr> |
99 | | - <td class="ex-name">Sequence Reorder + Quantize</td> |
| 95 | + <td class="ex-name"><a class="ex-code-link" href="https://github.com/NVIDIA/daqiri/blob/main/examples/raw_reorder_quantize_bench.cpp" target="_blank">Sequence Reorder + Quantize ↗</a></td> |
100 | 96 | <td><span class="ex-badge ex-badge-cuda">C++/CUDA</span></td> |
101 | 97 | <td class="ex-desc-cell">Extends sequence reorder with an in-kernel type conversion step (e.g., int4 → fp32), so the GPU buffer is both reordered and in the format your compute pipeline expects — all before your application code runs.</td> |
102 | | - <td><a class="ex-code-link" href="https://github.com/NVIDIA/daqiri/blob/main/examples/raw_reorder_quantize_bench.cpp" target="_blank">raw_reorder_quantize_bench.cpp ↗</a></td> |
103 | 98 | </tr> |
104 | 99 | <tr> |
105 | | - <td class="ex-name">PCAP Writer</td> |
| 100 | + <td class="ex-name"><a class="ex-code-link" href="https://github.com/NVIDIA/daqiri/blob/main/examples/pcap_writer_example.cpp" target="_blank">PCAP Writer ↗</a></td> |
106 | 101 | <td><span class="ex-badge ex-badge-cpp">C++</span></td> |
107 | 102 | <td class="ex-desc-cell">Captures live network traffic to a standard <code>.pcap</code> file you can open in Wireshark or tcpdump. Packets are received via GPUDirect and staged through pinned host memory to disk; capture continues until you press Ctrl+C.</td> |
108 | | - <td><a class="ex-code-link" href="https://github.com/NVIDIA/daqiri/blob/main/examples/pcap_writer_example.cpp" target="_blank">pcap_writer_example.cpp ↗</a></td> |
109 | 103 | </tr> |
110 | 104 | <tr> |
111 | | - <td class="ex-name">RDMA Benchmark</td> |
| 105 | + <td class="ex-name"><a class="ex-code-link" href="https://github.com/NVIDIA/daqiri/blob/main/examples/rdma_bench.cpp" target="_blank">RDMA Benchmark ↗</a></td> |
112 | 106 | <td><span class="ex-badge ex-badge-cpp">C++</span></td> |
113 | 107 | <td class="ex-desc-cell">Measures RoCE/RDMA throughput in client/server mode. Useful for comparing DAQIRI against standard tools like <code>ib_send_bw</code>, or when one endpoint is a third-party RDMA device such as an FPGA or instrument.</td> |
114 | | - <td><a class="ex-code-link" href="https://github.com/NVIDIA/daqiri/blob/main/examples/rdma_bench.cpp" target="_blank">rdma_bench.cpp ↗</a></td> |
115 | 108 | </tr> |
116 | 109 | <tr> |
117 | | - <td class="ex-name">Socket Benchmark</td> |
| 110 | + <td class="ex-name"><a class="ex-code-link" href="https://github.com/NVIDIA/daqiri/blob/main/examples/socket_bench.cpp" target="_blank">Socket Benchmark ↗</a></td> |
118 | 111 | <td><span class="ex-badge ex-badge-cpp">C++</span></td> |
119 | 112 | <td class="ex-desc-cell">Measures TCP and UDP throughput over standard Linux sockets — no ConnectX NIC or special privileges required. A good comparison baseline before moving to kernel-bypass, or for connecting to a peer that only speaks standard sockets.</td> |
120 | | - <td><a class="ex-code-link" href="https://github.com/NVIDIA/daqiri/blob/main/examples/socket_bench.cpp" target="_blank">socket_bench.cpp ↗</a></td> |
121 | 113 | </tr> |
122 | 114 | <tr> |
123 | | - <td class="ex-name">GPUDirect Storage Write</td> |
| 115 | + <td class="ex-name"><a class="ex-code-link" href="https://github.com/NVIDIA/daqiri/blob/main/examples/gds_write_example.cpp" target="_blank">GPUDirect Storage Write ↗</a></td> |
124 | 116 | <td><span class="ex-badge ex-badge-cuda">C++/CUDA</span></td> |
125 | 117 | <td class="ex-desc-cell">Captures a burst of packets and writes them from GPU memory directly to NVMe storage via cuFile, in either raw binary or PCAP format. Supports both synchronous and asynchronous writes, demonstrating the full GPU-to-storage path without any CPU copy.</td> |
126 | | - <td><a class="ex-code-link" href="https://github.com/NVIDIA/daqiri/blob/main/examples/gds_write_example.cpp" target="_blank">gds_write_example.cpp ↗</a></td> |
127 | 118 | </tr> |
128 | 119 | </tbody> |
129 | 120 | </table> |
|