validate
--------

Run memory integrity and bandwidth tests against a board's HBM and DDR
subsystems. For each memory path, bandwidth is reported for single-direction
C2H reads, single-direction H2C writes, and simultaneous bidirectional traffic
(read, write, and total throughput). After the per-memory phases, a final
parallel phase drives HBM and DDR simultaneously. It uses ``2 * N`` buffers for
the single-direction tests and ``4 * N`` for the bidirectional tests, where
``N`` is the ``--threads`` value. This phase is skipped when ``--ddr-only`` or
``--hbm-only`` is given.

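The Python sketch below gives a rough picture of the phase structure just described. It lists the bandwidth phases of one run and the number of parallel buffers each phase uses. It is not the ``v80-smi`` implementation. The function ``validate_phases``, the phase names, and the HBM-before-DDR ordering are hypothetical.

.. code-block:: python

   def validate_phases(n, ddr_only=False, hbm_only=False):
       """Hypothetical: return (phase name, parallel buffers) for one run."""
       if not 1 <= n <= 64:
           raise ValueError("--threads must be between 1 and 64")
       if ddr_only and hbm_only:
           raise ValueError("--ddr-only and --hbm-only are mutually exclusive")
       memories = []
       if not ddr_only:
           memories.append("HBM")  # ordering HBM before DDR is an assumption
       if not hbm_only:
           memories.append("DDR")

       phases = []
       # Per-memory phases: N buffers per single-direction test, 2N bidirectional.
       for mem in memories:
           phases += [(f"{mem} C2H read", n),
                      (f"{mem} H2C write", n),
                      (f"{mem} bidirectional", 2 * n)]
       # Final parallel phase drives both memories at once; it only runs
       # when neither --ddr-only nor --hbm-only is given.
       if len(memories) == 2:
           phases += [("HBM+DDR C2H read", 2 * n),
                      ("HBM+DDR H2C write", 2 * n),
                      ("HBM+DDR bidirectional", 4 * n)]
       return phases

With the default ``-j 8``, the largest phase in this sketch is the parallel bidirectional one, at 32 buffers.
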
.. code-block:: text

   v80-smi validate -d <BDF> [-j|--threads <N>] [-R|--no-reset]
                    [--mm-channel <spec>] [--buffer-size <size>]
                    [--offset <size>] [--starting-offset <size>]
                    [--raw-transfer-test | --use-qdma-driver]
                    [--ddr-only | --hbm-only]
                    [--channel-allocation <auto|paired>]
                    [--channel-region-stride <size>]
                    [--ring-size-index <0-15>]
                    [--bandwidth-iterations <N>]
                    [--bandwidth-duration <seconds>]

Requirements by mode:

* Default mode uses VRTD buffers, requires a running VRTD daemon, and resets
  the board unless ``--no-reset`` is given.
* ``--raw-transfer-test`` bypasses VRTD for transfers, requires the SLASH QDMA
  driver device node for the board, and skips the reset.
* ``--use-qdma-driver`` bypasses both VRTD and SLASH for transfers, requires
  the stock ``qdma-pf`` driver to be bound to the board's QDMA PF, and skips
  the reset. This backend is built only when
  ``SMI_ENABLE_QDMA_DRIVER_BACKEND`` is enabled at CMake configure time.

.. option:: -d, --device <BDF>

.. option:: -j, --threads <N>

   Number of parallel buffers/threads for the validation test (1–64, default
   8). Bidirectional phases use ``2 * N`` logical positions in each enabled
   memory space.

.. option:: --buffer-size <size>

   Size of each test buffer. Values may be bare bytes or use a ``k``/``K`` or
   ``m``/``M`` suffix, and must be 4 KiB-aligned. The default and maximum are
   ``512M``.

.. option:: --offset <size>

   Distance between consecutive logical buffer positions. The default is
   ``512M``. Values may be bare bytes or use a ``k``/``K`` or ``m``/``M``
   suffix. They must be 4 KiB-aligned and at least ``--buffer-size``, so that
   buffers do not overlap.

.. option:: --starting-offset <size>

   Offset of logical position 0 from the base of each memory space. The
   default is ``0``. Values may be bare bytes or use a ``k``/``K`` or
   ``m``/``M`` suffix, and must be 4 KiB-aligned.

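The size syntax shared by these options can be sketched as a small parser. This is a hypothetical helper, not ``v80-smi`` code. It assumes the suffixes denote binary multiples (``k`` = 1024), which the text above does not state. It accepts ``g``/``G`` only when asked to, because only ``--channel-region-stride`` documents that suffix.

.. code-block:: python

   ALIGN = 4 * 1024  # every size option must be 4 KiB-aligned
   # Assumption: suffixes are binary multiples (k = 1024, m = 1024**2, ...).
   MULTIPLIERS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


   def parse_size(text, allow_g=False):
       """Parse bare bytes or a k/K, m/M (and optionally g/G) suffixed size."""
       text = text.strip()
       if not text:
           raise ValueError("empty size")
       suffix = text[-1].lower()
       if suffix in MULTIPLIERS:
           if suffix == "g" and not allow_g:
               raise ValueError(f"g/G suffix not accepted here: {text!r}")
           digits, scale = text[:-1], MULTIPLIERS[suffix]
       else:
           digits, scale = text, 1
       if not digits.isdecimal():
           raise ValueError(f"not a size: {text!r}")
       value = int(digits) * scale
       if value % ALIGN:
           raise ValueError(f"{text!r} is not 4 KiB-aligned")
       return value

Per-option limits, such as the ``512M`` maximum for ``--buffer-size``, would be checked separately after parsing.
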
Buffers are placed at ``memory_base + starting_offset + position * offset``.
Single-direction phases use positions ``0..N-1``. Bidirectional phases use
positions ``0..2N-1``, with reads on even positions and writes on odd
positions. The full range, through the end of the last buffer, must remain
inside the 64 x 512 MB DDR/HBM address space. If any placement option is
specified in default VRTD mode, ``validate`` uses raw VRTD buffers so that the
exact addresses are honored. This requires raw memory access permission.

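Below is a minimal sketch of the placement rule, including the overlap and range checks described above. It makes two assumptions: "64 x 512 MB" means 64 x 512 MiB (32 GiB), and each memory space is checked independently. ``placements`` is a hypothetical helper, not part of ``v80-smi``.

.. code-block:: python

   MIB = 1024 ** 2
   MEMORY_SPAN = 64 * 512 * MIB  # assumption: 64 x 512 MiB per memory space


   def placements(memory_base, n, buffer_size, offset, starting_offset=0,
                  bidirectional=False):
       """Return (position, address, direction) for every buffer of a phase."""
       if offset < buffer_size:
           raise ValueError("--offset must be at least --buffer-size")
       positions = 2 * n if bidirectional else n
       buffers = []
       for pos in range(positions):
           rel = starting_offset + pos * offset
           # The end of every buffer must stay inside the memory space.
           if rel + buffer_size > MEMORY_SPAN:
               raise ValueError(f"position {pos} ends past the memory space")
           if bidirectional:
               # Reads on even positions, writes on odd positions.
               direction = "read" if pos % 2 == 0 else "write"
           else:
               direction = None  # the phase itself is read-only or write-only
           buffers.append((pos, memory_base + rel, direction))
       return buffers

Under these assumptions, the defaults (``-j 8``, with ``512M`` for both buffer size and offset) end a bidirectional phase at 8 GiB. With the same sizes, ``-j 64`` needs 128 positions x 512 MiB = 64 GiB and is rejected, so ``--buffer-size`` and ``--offset`` would have to shrink.
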
The largest phase maps up to ``4 * N * buffer-size`` bytes of host buffers
when both HBM and DDR are enabled, or ``2 * N * buffer-size`` with
``--ddr-only`` or ``--hbm-only``. The command fails early if that amount
exceeds the host memory currently available.

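The host-memory check reduces to simple arithmetic. In the sketch below, ``peak_host_bytes`` applies the formula above. Treating ``MemAvailable`` from Linux ``/proc/meminfo`` as "currently available host memory" is an assumption about how such a check could be done. It does not describe how ``v80-smi`` actually performs the check. All three functions are hypothetical.

.. code-block:: python

   MIB = 1024 ** 2
   GIB = 1024 ** 3


   def peak_host_bytes(threads, buffer_size, both_memories=True):
       """Host buffer bytes mapped by the largest phase."""
       return (4 if both_memories else 2) * threads * buffer_size


   def available_host_bytes(meminfo_path="/proc/meminfo"):
       """Assumption: use Linux MemAvailable (reported in kB, i.e. KiB)."""
       with open(meminfo_path) as f:
           for line in f:
               if line.startswith("MemAvailable:"):
                   return int(line.split()[1]) * 1024
       raise RuntimeError("MemAvailable not found in " + meminfo_path)


   def check_host_memory(threads, buffer_size, both_memories=True):
       """Fail before mapping anything if the peak phase would not fit."""
       need = peak_host_bytes(threads, buffer_size, both_memories)
       have = available_host_bytes()
       if need > have:
           raise MemoryError(f"need {need / GIB:.1f} GiB of host memory, "
                             f"only {have / GIB:.1f} GiB available")
       return need

Assuming ``M`` means MiB, the defaults (``-j 8``, ``512M`` buffers) need 4 x 8 x 512 MiB = 16 GiB with both memories. With ``--ddr-only`` or ``--hbm-only`` they need 8 GiB. At ``-j 64`` the requirement grows to 128 GiB.
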
.. option:: -R, --no-reset

   Skip the device reset step before running the memory tests.

.. option:: --mm-channel <spec>

   AXI-MM / NoC channel selection for each buffer's QDMA queue pair, applied in
   every mode. ``spec`` is either a single value applied to all buffers or a
   comma-separated list with one channel per logical buffer position. A list
   must have exactly ``2 * N`` entries (twice the ``--threads`` value). It is
   not repeated or wrapped, and any other length is an error. Values:

   * ``auto`` (the default) lets the driver stripe queues across both channels
     by ``qid & 1``.
   * ``0`` / ``1`` pin the queue to that AXI-MM channel (and hence NoC channel).

   For example, with ``-j 1`` the list ``0,1`` puts buffer position 0 on
   channel 0 and position 1 on channel 1. Bidirectional phases use all entries
   (positions ``0..2N-1``). Single-direction phases use only the first ``N``.

   This option is independent of ``--channel-allocation``, which controls the
   device address. ``--mm-channel`` controls the host-side NoC ingress (NMU)
   per queue. With ``--use-qdma-driver``, the selection maps to the stock
   driver's per-queue MM-channel attribute.

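The rules for ``spec`` can be expressed as a short parser. This hypothetical sketch expands a single value to all ``2 * N`` positions and rejects lists of any other length. The text above does not say whether ``auto`` may appear inside a list, so accepting it there is an assumption. ``effective_channel`` shows how ``auto`` resolves to ``qid & 1``. Both functions are illustrative and not ``v80-smi`` code.

.. code-block:: python

   def parse_mm_channel(spec, threads):
       """Return one setting ("auto", 0 or 1) per logical buffer position."""
       positions = 2 * threads

       def parse_one(token):
           token = token.strip()
           if token == "auto":
               return "auto"
           if token in ("0", "1"):
               return int(token)
           raise ValueError(f"invalid --mm-channel value: {token!r}")

       if "," not in spec:
           # A single value applies to every buffer position.
           return [parse_one(spec)] * positions
       entries = [parse_one(tok) for tok in spec.split(",")]
       if len(entries) != positions:
           # No repeating or wrapping: the length must be exactly 2 * threads.
           raise ValueError(f"expected {positions} entries, got {len(entries)}")
       return entries


   def effective_channel(setting, qid):
       """Resolve a setting to an AXI-MM channel; auto stripes by qid & 1."""
       return qid & 1 if setting == "auto" else setting

For instance, ``parse_mm_channel("0,1", 1)`` yields ``[0, 1]``. A single-direction phase would use only the first ``threads`` entries of the result.
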
.. option:: --raw-transfer-test

   Use libslash raw QDMA transfers instead of VRTD buffers. This mode implies
   ``--no-reset`` and requires the SLASH QDMA driver device node to be present.

.. option:: --use-qdma-driver

   Run the raw transfer test over the off-the-shelf Xilinx QDMA driver
   (``/dev/qdma<idx>-MM-<qid>``) instead of SLASH. ``v80-smi`` provisions the
   queues itself. It raises the function's ``qmax`` through sysfs if needed,
   then creates and starts bidirectional AXI-MM queue pairs over generic netlink
   (the same ``xnl_pf`` interface that ``dma-ctl`` uses). Transfers then run
   over the per-queue character devices.

   By default, queue pairs are spread round-robin across the function's MM
   engine channels (``channel = qid % mm_channel_max``). The CPM5 QDMA on the
   V80 exposes two channels, so the test exercises both.

   This mode implies ``--no-reset`` and is mutually exclusive with
   ``--raw-transfer-test``. It requires the stock ``qdma-pf`` driver to be
   bound to the board's PF, and that driver cannot be bound at the same time
   as the SLASH driver. Root privileges are typically needed to raise ``qmax``
   and open the queue devices.

.. option:: --ddr-only

   Run only the DDR memory tests and skip the HBM phase. Mutually exclusive
   with ``--hbm-only``.

.. option:: --hbm-only

   Run only the HBM memory tests and skip the DDR phase. Mutually exclusive
   with ``--ddr-only``.

.. option:: --channel-allocation <auto|paired>

   Raw-transfer-only (``--raw-transfer-test`` or ``--use-qdma-driver``).
   Controls how QDMA MM/NoC channels map onto device memory. On CPM5, the
   host-side NoC ingress port (NMU) is chosen per queue by the SW-context
   mm-channel/host_id (SLASH uses ``qid & 1``). The memory-side NoC egress
   endpoint (NSU / pseudo-channel) is chosen by the device address.

   The default, ``auto``, keeps the historical behaviour: channel ``qid & 1``
   with linear addressing. Both NMUs can then converge on a single NSU, which
   caps bandwidth at that of one path.

   ``paired`` couples channel and address. Even positions land in memory
   region 0 on channel 0, and odd positions land in region 1 on channel 1,
   one ``--channel-region-stride`` apart. This gives two independent
   NMU-to-NSU paths and lets both NoC ports contribute bandwidth. It mirrors
   the ``offset_ch0``/``offset_ch1`` knobs of the off-the-shelf ``dma-perf``
   tool.

.. option:: --channel-region-stride <size>

   In ``--channel-allocation paired`` mode, the byte distance between the two
   per-channel memory regions (the NSU / pseudo-channel stride). The default is
   ``16G``, half of the per-memory address space. This matches the
   ``offset_ch1 - offset_ch0`` spacing that ``dma-perf`` uses for HBM. The
   value must be a non-zero multiple of 4 KiB. Values may be bare bytes or use
   a ``k``/``K``, ``m``/``M``, or ``g``/``G`` suffix.

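The difference between ``auto`` and ``paired`` placement can be sketched as below. The channel and region rules follow the description above. The layout inside each region is an assumption: even and odd positions are packed densely, so positions ``2k`` and ``2k + 1`` sit exactly one stride apart. Taking the queue id equal to the position index is also an assumption. Both functions are hypothetical, and addresses are relative to the memory base.

.. code-block:: python

   GIB = 1024 ** 3


   def auto_layout(positions, offset, starting_offset=0):
       """auto: channel qid & 1 (qid assumed == position), linear addresses."""
       return [(pos, pos & 1, starting_offset + pos * offset)
               for pos in range(positions)]


   def paired_layout(positions, offset, starting_offset=0,
                     region_stride=16 * GIB):
       """paired: even -> channel 0 / region 0, odd -> channel 1 / region 1.

       Packing each parity class densely inside its region is an assumption.
       """
       if region_stride <= 0 or region_stride % 4096:
           raise ValueError("stride must be a non-zero multiple of 4 KiB")
       layout = []
       for pos in range(positions):
           channel = pos % 2
           address = (channel * region_stride + starting_offset
                      + (pos // 2) * offset)
           layout.append((pos, channel, address))
       return layout

With ``auto``, adjacent positions alternate channels but share one contiguous address range, which is how both NMUs can end up at the same NSU. With ``paired``, each channel's buffers sit in their own region.
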
.. option:: --ring-size-index <0-15>

   Raw-transfer-only (``--raw-transfer-test`` or ``--use-qdma-driver``).
   Overrides the QDMA descriptor-ring size index used when creating SLASH raw
   queue pairs or starting stock-driver queues. When omitted, each backend
   keeps its existing default. Values worth A/B-testing for 4 KiB descriptor
   throughput are ``0``, ``11``, ``13``, and ``15``.

.. option:: --bandwidth-iterations <N>

   Raw-transfer-only (``--raw-transfer-test`` or ``--use-qdma-driver``).
   Repeats each whole-buffer transfer in every bandwidth phase the given number
   of times and reports bandwidth over the whole sustained loop. The default,
   ``1``, preserves the historical one-shot measurement.

.. option:: --bandwidth-duration <seconds>

   Raw-transfer-only (``--raw-transfer-test`` or ``--use-qdma-driver``)
   duration mode. When non-zero, each bandwidth phase repeats whole-buffer
   transfers until the requested wall-clock duration has elapsed, and counts
   only completed transfers. This is useful when comparing SLASH's raw path
   against long-running tools such as ``dma-perf``. A value of ``0`` uses
   ``--bandwidth-iterations`` instead.

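The two raw-transfer loop modes can be sketched as follows. Here ``transfer`` stands in for one hypothetical whole-buffer transfer. This single-threaded sketch does not reproduce how ``v80-smi`` threads its transfers. It only shows the iteration-count and wall-clock-duration modes, and how bandwidth follows from completed transfers and elapsed time.

.. code-block:: python

   import time


   def measure_bandwidth(transfer, bytes_per_transfer, iterations=1,
                         duration=0.0, clock=time.monotonic):
       """Return (bytes per second, completed transfers) for one phase.

       transfer() is a hypothetical callable that performs one whole-buffer
       transfer and returns once it has completed.
       """
       if iterations < 1:
           raise ValueError("iterations must be >= 1")
       completed = 0
       start = clock()
       if duration > 0:
           # Duration mode: keep starting transfers until the deadline passes.
           # A transfer started before the deadline finishes and is counted,
           # and the elapsed time below includes it.
           deadline = start + duration
           while clock() < deadline:
               transfer()
               completed += 1
       else:
           # Iteration mode: exactly `iterations` back-to-back transfers.
           for _ in range(iterations):
               transfer()
               completed += 1
       elapsed = clock() - start
       if elapsed <= 0:
           return float("inf"), completed
       return completed * bytes_per_transfer / elapsed, completed

Passing a fake ``clock`` makes the arithmetic easy to check without hardware. The default ``time.monotonic`` is used because wall-clock adjustments must not distort the measured interval.
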
debug
-----