Commit 2210b23

committed
Bug report srcs
1 parent bbc26c2 commit 2210b23

38 files changed

Lines changed: 45444 additions & 1 deletion

.gitattributes

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
1+
*.dcp filter=lfs diff=lfs merge=lfs -text
2+
*.bit filter=lfs diff=lfs merge=lfs -text
3+
bug-report/checkpoints/2023.2/shell_routed_23_2.dcp filter=lfs diff=lfs merge=lfs -text
4+
bug-report/checkpoints/2025.2/shell_routed_25_2.dcp filter=lfs diff=lfs merge=lfs -text

.gitignore

Lines changed: 2 additions & 1 deletion
@@ -50,4 +50,5 @@ misc/
5050
*.venv
5151
__pycache__
5252
docs/docs-sw/
53-
docs/docs-driver/
53+
docs/docs-driver/
54+
solution1/

bug-report/README.md

Lines changed: 226 additions & 0 deletions
@@ -0,0 +1,226 @@
1+
# Bug Report: `reduce_ops` HLS Kernel Fails to Produce Output in Vivado 2024.2+
2+
3+
## Table of Contents
4+
5+
1. [Overview](#overview)
6+
2. [Module Description](#module-description)
7+
3. [HLS Source](#hls-source)
8+
4. [Hardware Observation — ILA Capture](#hardware-observation--ila-capture)
9+
5. [Reproducing the Bug](#reproducing-the-bug)
10+
6. [Folder Structure](#folder-structure)
11+
12+
---
13+
14+
## Overview
15+
16+
This repository documents a regression in Vitis HLS / Vivado. The `reduce_ops` kernel — a `do-while` loop with `#pragma HLS PIPELINE II=1 style=frp` and an `ap_ctrl_none` control interface — receives valid data on both AXI-Stream inputs but **never asserts `TVALID` on its output**, effectively hanging the datapath. The root cause has not been isolated (candidate constructs are `style=frp`, the `do-while` loop termination condition, `ap_ctrl_none`, or a combination thereof). The failure is silent — synthesis, implementation and DRC all complete without errors or relevant warnings.
17+
18+
- **Working**: Vitis HLS / Vivado ≤ 2024.1
19+
- **Broken**: Vitis HLS / Vivado ≥ 2024.2 (also confirmed in 2025.2, the latest version)
20+
21+
Importantly, **C simulation (`csim`) and co-simulation (`cosim`) both pass** in all tested tool versions, making the regression invisible without hardware testing.
22+
23+
This repository contains source code, logs, ILA captures, and design checkpoints from Vivado 2023.2 (working) and Vivado 2025.2 (the latest available version, in which the kernel fails).
24+
25+
**N.B.:** This bug was first observed in ACCL: https://github.com/Xilinx/ACCL. Because building ACCL is generally more involved (a larger design, hence longer synthesis and harder timing closure; no support for Vitis 2025.x due to the migration to vitis-run; etc.), the problematic module was isolated and added as an application to Coyote, the open-source FPGA shell. This folder contains the minimal working example: the problematic HLS code, the Verilog wrapper, a testbench, and some C++ code to run the test. All the relevant details are explained in this README.
26+
27+
---
28+
29+
## Module Description
30+
31+
The kernel under test is `reduce_ops`, taken from the open-source [ACCL project](https://github.com/Xilinx/ACCL). It performs element-wise reduction (addition or max) over two 512-bit AXI-Stream operands, selecting the data type from the `TDEST` field of the incoming stream:
32+
33+
| `TDEST` | Operation |
34+
|---------|-----------|
35+
| 0 | fp32 add |
36+
| 1 | fp64 add |
37+
| 2 | int32 add |
38+
| 3 | int64 add |
39+
| 5 | fp32 max |
40+
| 6 | fp64 max |
41+
| 7 | int32 max |
42+
| 8 | int64 max |
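
As a reading aid for the table above, the following is a minimal C++ sketch of the `TDEST` decoding. The names `OpInfo`, `decode_tdest`, and `kDataWidth` are hypothetical and do not come from the ACCL sources. The lane count is simply the 512-bit data width divided by the element width. Values that the table does not list (such as 4) are returned as empty; the sketch makes no claim about how the kernel treats them.

```cpp
#include <cstdint>
#include <optional>
#include <string>

constexpr int kDataWidth = 512;  // matches DATA_WIDTH=512 stated for reduce_ops.h

// Hypothetical helper type: one row of the TDEST table.
struct OpInfo {
    std::string name;  // operation as named in the table
    int element_bits;  // width of one element in bits
    int lanes() const { return kDataWidth / element_bits; }
};

// Returns the table row for a TDEST value, or an empty optional
// for values the table does not list.
std::optional<OpInfo> decode_tdest(std::uint8_t tdest) {
    switch (tdest) {
        case 0: return OpInfo{"fp32 add", 32};
        case 1: return OpInfo{"fp64 add", 64};
        case 2: return OpInfo{"int32 add", 32};
        case 3: return OpInfo{"int64 add", 64};
        case 5: return OpInfo{"fp32 max", 32};
        case 6: return OpInfo{"fp64 max", 64};
        case 7: return OpInfo{"int32 max", 32};
        case 8: return OpInfo{"int64 max", 64};
        default: return std::nullopt;
    }
}
```

For the test setup in this report, `decode_tdest(2)` describes an int32 add over 16 lanes per beat.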
43+
44+
The kernel processes words in a `do-while` loop, terminating when `TLAST` is asserted. The loop body is pipelined with `#pragma HLS PIPELINE II=1 style=frp`:
45+
46+
```cpp
47+
void reduce_ops(STREAM<stream_word> &in0, STREAM<stream_word> &in1, STREAM<stream_word> &out) {
48+
#pragma HLS INTERFACE axis register both port=in0
49+
#pragma HLS INTERFACE axis register both port=in1
50+
#pragma HLS INTERFACE axis register both port=out
51+
#pragma HLS INTERFACE ap_ctrl_none port=return
52+
stream_word op0, op1, wword;
53+
ap_uint<DATA_WIDTH> res;
54+
55+
do {
56+
#pragma HLS PIPELINE II=1 style=frp
57+
op0 = STREAM_READ(in0);
58+
op1 = STREAM_READ(in1);
59+
60+
if (op0.dest == 0) res = stream_add<DATA_WIDTH, DEST_WIDTH, float> (op0.data, op1.data);
61+
else if (op0.dest == 1) res = stream_add<DATA_WIDTH, DEST_WIDTH, double> (op0.data, op1.data);
62+
// ... (further cases omitted for brevity)
63+
else res = stream_add<DATA_WIDTH, DEST_WIDTH, float> (op0.data, op1.data);
64+
65+
wword.data = res;
66+
wword.last = op0.last;
67+
wword.keep = op0.keep;
68+
wword.dest = 0;
69+
STREAM_WRITE(out, wword);
70+
71+
} while(op0.last != 1);
72+
}
73+
```
74+
75+
The notable constructs in this kernel are the combination of `ap_ctrl_none`, a `do-while` loop whose exit condition reads a stream-derived signal (`op0.last`), and `style=frp` on the pipeline. The root cause of the regression has not been isolated — it is unclear which of these (or their combination) changed behaviour between tool versions.
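
For reference, the behaviour expected from the loop can be written as a plain software model. This is only a sketch under stated assumptions: `Beat` and `reduce_ops_model` are hypothetical names, `std::queue` stands in for the HLS streams, and only the int32 add case used in the test setup is modelled. It is not the HLS code and says nothing about what the synthesised RTL does. The point it illustrates is that every pair of input beats should yield exactly one output beat, and that the loop exits after the beat carrying `TLAST`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>

// Hypothetical software stand-in for one AXI-Stream beat (not ACCL's stream_word).
struct Beat {
    std::array<std::int32_t, 16> data{};  // 16 x 32-bit lanes = 512 bits
    std::uint64_t keep = ~0ULL;           // one keep bit per byte
    bool last = false;                    // TLAST
    std::uint8_t dest = 0;                // TDEST
};

// Software model of the do-while loop for the int32 add case.
// Returns the number of beats written to `out`.
std::size_t reduce_ops_model(std::queue<Beat>& in0, std::queue<Beat>& in1,
                             std::queue<Beat>& out) {
    std::size_t written = 0;
    Beat op0, op1;
    do {
        // A hardware stream read would block; the model requires data to be present.
        assert(!in0.empty() && !in1.empty());
        op0 = in0.front(); in0.pop();
        op1 = in1.front(); in1.pop();

        Beat w;
        for (std::size_t i = 0; i < w.data.size(); ++i) {
            // Add in unsigned arithmetic so overflow wraps instead of being undefined.
            w.data[i] = static_cast<std::int32_t>(
                static_cast<std::uint32_t>(op0.data[i]) +
                static_cast<std::uint32_t>(op1.data[i]));
        }
        w.last = op0.last;
        w.keep = op0.keep;
        w.dest = 0;
        out.push(w);
        ++written;
    } while (!op0.last);  // exit condition derived from the stream, as in the kernel
    return written;
}
```

With the single-beat test input (one beat per operand, `last` set), the model writes exactly one output beat, which is the behaviour seen with Vivado 2023.2 and missing with 2025.2.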
76+
77+
In the test setup, `TDEST` is hardwired to `2` (int32 add) in the vFPGA top, and the host sends one 512-bit beat (16 × 32-bit integers) per operand stream.
78+
79+
---
80+
81+
## HLS Source
82+
83+
The full HLS source, testbench, and standalone simulation script are in `source/hw/src/hls/reduce_ops/`:
84+
85+
| File | Description |
86+
|------|-------------|
87+
| `reduce_ops.h` | Types, constants (`DATA_WIDTH=512`, `DEST_WIDTH=8`), stream macros |
88+
| `reduce_ops.cpp` | Full kernel — template helpers `stream_add`/`stream_max` + `reduce_ops` top |
89+
| `reduce_ops_tb.cpp` | HLS C testbench: one beat of 16 int32 values, checks output |
90+
| `run_tb.tcl` | Standalone Vitis HLS script: runs csim → csynth → cosim |
91+
92+
To run the standalone HLS simulation (reproduces the csim/cosim pass):
93+
94+
```bash
95+
cd source/hw/src/hls/reduce_ops
96+
vitis_hls -f run_tb.tcl # Vitis HLS 2022.x – 2024.x
97+
vitis-run --tcl run_tb.tcl --mode hls # Vitis HLS 2025.x+
98+
```
99+
100+
The complete Coyote hardware and software projects are in `source/hw/` and `source/sw/` respectively.
101+
102+
---
103+
104+
## Hardware Observation — ILA Capture
105+
106+
The ILA probes all three AXI-Stream interfaces at the vFPGA boundary (`axis_host_recv[0]`, `axis_host_recv[1]`, `axis_host_send[0]`).
107+
108+
### Vivado 2023.2 — Working
109+
110+
Both operand streams arrive, and the output stream fires correctly on the same transaction:
111+
112+
![ILA waveform — Vivado 2023.2 (working)](images/waveform_23_2.png)
113+
114+
- `axis_host_recv[0].tvalid` and `axis_host_recv[1].tvalid` pulse high as data is transferred.
115+
- `axis_host_send[0].tvalid` asserts in the same window, delivering the result.
116+
- Sample values confirm correct int32 addition: `recv[0]` carries odd integers (1 – 31), `recv[1]` carries even integers (2 – 32), and `send[0]` carries their element-wise sums (3 – 63).
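
The sample values above can be reproduced with a few lines of C++. This is an illustrative sketch rather than the host code from `main.cpp`: `TestVectors` and `make_test_vectors` are hypothetical names, and the order of the lanes within the 512-bit word is an assumption.

```cpp
#include <array>
#include <cstdint>

constexpr int kLanes = 16;  // one 512-bit beat holds 16 x 32-bit integers

// Hypothetical container for the operands and the expected result of one beat.
struct TestVectors {
    std::array<std::int32_t, kLanes> recv0;  // odd integers 1..31
    std::array<std::int32_t, kLanes> recv1;  // even integers 2..32
    std::array<std::int32_t, kLanes> send0;  // element-wise sums 3..63
};

TestVectors make_test_vectors() {
    TestVectors v{};
    for (int i = 0; i < kLanes; ++i) {
        v.recv0[i] = 2 * i + 1;
        v.recv1[i] = 2 * i + 2;
        v.send0[i] = v.recv0[i] + v.recv1[i];  // equals 4 * i + 3
    }
    return v;
}
```

The expected sums therefore step by 4 from 3 to 63, which gives a quick check against the values seen on `send[0]` in the ILA capture.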
117+
118+
### Vivado 2025.2 — Broken
119+
120+
The operand streams arrive identically, but **the output stream never asserts `TVALID`**:
121+
122+
![ILA waveform — Vivado 2025.2 (broken)](images/waveform_25_2.png)
123+
124+
- `axis_host_recv[0].tvalid` and `axis_host_recv[1].tvalid` pulse high as before — the kernel receives data correctly.
125+
- `axis_host_send[0].tvalid` **remains 0** throughout; `tdata` is all zeros.
126+
- The kernel consumes its inputs and stalls without producing any output.
127+
128+
There are no AXI protocol violations, handshake errors, or DRC failures in either build. The regression is entirely in the kernel's output behaviour.
129+
130+
---
131+
132+
## Reproducing the Bug
133+
134+
### Prerequisites
135+
136+
- Alveo U55C (or U250 / U280)
137+
- Vivado + Vitis HLS (test with ≥ 2024.2 to observe the bug; ≤ 2024.1 for the working reference)
138+
- [Coyote](https://github.com/fpgasystems/Coyote) — with current branch
139+
140+
Pre-built bitstreams, ILA probe files, routed checkpoints, and all build logs for both 2023.2 and 2025.2 are included in this repository so the bug can be observed without re-synthesising.
141+
142+
### Using the pre-built bitstreams
143+
144+
1. **Program the FPGA** using Vivado Hardware Manager, loading the bitstream and probe file from `bitstreams/<version>/`.
145+
2. **Rescan PCIe** or perform a warm reboot to re-enumerate the device.
146+
3. **Insert the Coyote driver**:
147+
```bash
148+
sudo insmod coyote_driver.ko
149+
```
150+
4. **Build and run the software test**:
151+
```bash
152+
cd source/sw && mkdir build && cd build
153+
cmake .. -DFDEV_NAME=u55c
154+
make && sudo ./test
155+
```
156+
On a working build the test prints the expected sums and exits with "Validation passed!". On a broken build the `checkCompleted` poll never returns, as no write completion is signalled.
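
Because the completion poll never returns on a broken build, it can be convenient to bound the wait when scripting the comparison. The helper below is a generic sketch, not part of the Coyote API: `poll_until` is a hypothetical name, and the `done` callback stands for whatever completion check the host code performs (the exact Coyote call and its arguments are not shown in this README, so they are not reproduced here).

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Calls `done` repeatedly until it returns true or `timeout` elapses.
// Returns true on completion and false on timeout.
bool poll_until(const std::function<bool()>& done,
                std::chrono::milliseconds timeout,
                std::chrono::milliseconds interval = std::chrono::milliseconds(1)) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!done()) {
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;  // no completion observed within the time budget
        }
        std::this_thread::sleep_for(interval);
    }
    return true;
}
```

A host test wrapped this way could report a timeout on a broken build instead of hanging, which makes the two tool versions easier to compare in an automated run.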
157+
158+
### Re-synthesising from source
159+
160+
```bash
161+
# Hardware synth
162+
cd source/hw && mkdir build && cd build
163+
cmake .. -DFDEV_NAME=u55c
164+
make project && make bitgen
165+
166+
# Software compilation (run from the top of this folder)
167+
cd source/sw && mkdir build && cd build
168+
cmake ..
169+
make
170+
```
171+
Then follow the steps above to program the FPGA, insert the driver, and run the test.
172+
173+
---
174+
175+
## Folder Structure
176+
177+
Artifacts are split by `<vivado-version>` (`2023.2` or `2025.2`) for direct side-by-side comparison.
178+
179+
```
180+
.
181+
├── source/
182+
│ ├── hw/ # Coyote hardware project
183+
│ │ ├── CMakeLists.txt
184+
│ │ └── src/
185+
│ │ ├── vfpga_top.svh # vFPGA top: instantiates reduce_ops_hls_ip + ILA
186+
│ │ ├── init_ip.tcl # Creates ila_reduce Vivado IP
187+
│ │ └── hls/reduce_ops/
188+
│ │ ├── reduce_ops.h # Types, constants, stream macros
189+
│ │ ├── reduce_ops.cpp # Kernel source (stream_add, stream_max, reduce_ops)
190+
│ │ ├── tb.cpp # C testbench
191+
│ │ └── tb_hls.tcl # Standalone HLS sim script (csim/csynth/cosim)
192+
│ └── sw/ # Coyote software project
193+
│ ├── CMakeLists.txt
194+
│ └── src/main.cpp # Host software: sends 16 int32 operands to the FPGA, waits for completion, checks result
195+
196+
├── images/
197+
│ ├── waveform_23_2.png # ILA capture — Vivado 2023.2 (working)
198+
│ └── waveform_25_2.png # ILA capture — Vivado 2025.2 (broken)
199+
200+
├── bitstreams/
201+
│ └── <vivado-version>/
202+
│ ├── cyt_top_<ver>.bit # Loadable bitstream
203+
│ └── cyt_top_<ver>.ltx # ILA probe file
204+
205+
├── checkpoints/
206+
│ └── <vivado-version>/
207+
│ └── shell_routed_<ver>.dcp # Post-PnR routed checkpoint
208+
209+
├── ila-data/
210+
│ └── <vivado-version>/
211+
│ ├── ila_capture_<ver>.ila # Native Vivado ILA capture
212+
│ └── ila_csv_<ver>.csv # Exported ILA capture (CSV)
213+
214+
├── logs/
215+
│ └── <vivado-version>/
216+
│ ├── vitis_hls_reduce_ops_<ver>.log # Vitis HLS synthesis log
217+
│ ├── vivado_synth_reduce_ops_<ver>.log # Vivado OOC synth log (reduce_ops IP)
218+
│ ├── vivado_synth_top_<ver>.log # Vivado top-level synthesis log (containing the reduce_ops)
219+
│ └── vivado_pnr_<ver>.log # Place-and-route log
220+
221+
└── reports/
222+
└── <vivado-version>/
223+
├── route_status_<ver>.rpt # Post-PnR route status
224+
├── timing_summary_<ver>.rpt # Timing summary
225+
└── drc_bitstream_checks_<ver>.rpt # DRC checks (no critical errors)
226+
```
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
version https://git-lfs.github.com/spec/v1
2+
oid sha256:1d2fd617f01a31752662878fa1c6588b34966741fff1adaa3ff346147b311b6c
3+
size 33423030
