Skip to content

Route generated-script tool calls through a resolver [CI 2/9]#1600

Merged
auphelia merged 1 commit into
Xilinx:devfrom
merkelmarrow:2-xilinx-tool-dispatch-pr
Jun 10, 2026
Merged

Route generated-script tool calls through a resolver [CI 2/9]#1600
auphelia merged 1 commit into
Xilinx:devfrom
merkelmarrow:2-xilinx-tool-dispatch-pr

Conversation

@merkelmarrow

Copy link
Copy Markdown
Contributor

This is PR 2 of 9 of a series intended to make CI faster and more robust.

This patch is the first building block of a new testing infrastructure model. Most of FINN's tests complete relatively quickly on a single machine. The slow parts are almost exclusively the calls to xilinx tools such as Vivado, v++, and vitis-run/hls. This PR introduces a tool resolver hook in basic.py that allows a full testing suite to be centred on a single machine, while intercepting heavy tool calls and delegating to a compute cluster through user-written shims.

I chose this approach over a pure PATH shim entirely on the deploy-side because in practice it introduced obscure problems and unintended shell/subprocess interactions and is less user friendly for those wanting to set up their own testing. This model is explicit for the end user who wants to run their own infrastructure:

  1. Write shim scripts that delegate the calls for the explicitly specified tools (vivado, v++, vitis_hls, vitis-run) to your compute cluster (e.g. IBM LSF).
  2. Arrange them flat in a directory and point FINN_TOOL_DIR_OVERRIDE to the directory.
  3. FINN should now delegate heavy processes.

Behaviour is mostly identical to before if FINN_TOOL_DIR_OVERRIDE env var isn't set aside from clear fail-fast error if tool is missing.

Xelab note

Xelab is added to the resolver in a separate PR as it requires special handling.

Maintenance note

New heavy tool calls should be added to and called through the resolver. Every time a tool is added (which should be quite rare), it will fail CI tests that use the tool until a corresponding shim is written on the deployment side (assuming FINN_TOOL_DIR_OVERRIDE is set on that deployment). This is the consequence of an explicit model and preferable to a silent fallback that would eventually cause OOMs or load saturation on the single test machine. The tool calls now provide a clear fail-fast error if they are missing.

FINN's transformations write bash scripts that invoke vivado, v++,
vitis_hls, and vitis-run by bare name. Route these calls through a
resolver so a site that wraps the tools (for example, a HPC cluster
or LSF bsub dispatcher that feeds a farm) can intercept them through
a single override.

Add resolve_xilinx_tool(tool_name): with FINN_TOOL_DIR_OVERRIDE set
it resolves to <override>/<tool_name>, otherwise the bare name, and
raises FileNotFoundError if the result is not found. Route the
vivado, v++, vitis_hls, and vitis-run call sites in CreateStitchedIP,
MakeZynqProject, the Alveo build and the FINN loop RTL backend
through it. The wrapping site then needs only one shim directory
whose filenames mathc the tool names.

With the override unset the embedded command is the bare tool name,
same as before. The four vivado and v++ sites additionally gain a
fail-fast check they previously didn't have, so a missing tool
surfaces at transform time with a clear message instead of as a
later bash failure.

Signed-off-by: Marco Blackwell <mblackwe@amd.com>
@merkelmarrow merkelmarrow force-pushed the 2-xilinx-tool-dispatch-pr branch from 879aa5f to 7283e4c Compare June 8, 2026 10:44
@merkelmarrow

Copy link
Copy Markdown
Contributor Author

Changed to stop using tmp_path

@auphelia

Copy link
Copy Markdown
Collaborator

Looks good, thank you @merkelmarrow !

@auphelia auphelia merged commit cc2261b into Xilinx:dev Jun 10, 2026
3 checks passed
merkelmarrow added a commit to merkelmarrow/finn that referenced this pull request Jun 12, 2026
The XSI rtlsim compile step floats SystemVerilog packages ahead of the
modules that import them, because xelab aborts with VRFC 10-2989
("<pkg> is not declared") and writes no xsimk.so when a package is
analysed too late. It did this with a hardcoded list of package names
that has had to grow reactively: ca70a78 ("[finn_xsi] Sort list of
hdl sources to make sure configs come first") floated swg_pkg, then
240dd53 ("[finnxsi] Add mvu_pkg to be loaded first") added mvu_pkg
when a second finn-rtllib package hit the same thing. Each new package
means another red CI run and another manual edit.

Replace the name list with a stable partition that floats any
*_pkg.{sv,v} to the front by basename and otherwise preserves the
caller's order, so new finn-rtllib packages need no further changes.
The helper lives in a small srcutil module with no xsi C-estension
import so it can be unit tested without a built finn_xsi.

Route the xelab call through resolve_xilinx_tool (the resolver added
in Xilinx#1600) so it can be delegated like the other heavy tools.

Add a check flag to launch_process_helper so a non-zero xelab exit
raises immediately. Previously, a failed xelab returned quietly and
surfaced one line later as a FileNotFoundError on the missing xsimk.so,
hiding the real elaboration error. Also decode with errors="replace"
so noisy tool output cannot abort the run with a decode error.

Signed-off-by: Marco Blackwell <mblackwe@amd.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants