Multi-FPGA Integration #23

Draft
bwintermann wants to merge 133 commits into dev from feature/multifpga

bwintermann commented Jan 21, 2025

This PR merges Multi-FPGA capabilities. The goal is to enable Multi-FPGA without requiring a fixed communication technology or topology. Everything is customizable and optional. The flow should integrate seamlessly into existing FINN+ build flows. As soon as a PartitioningConfiguration is added to the build configuration, Multi-FPGA steps are automatically used.
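
The opt-in behaviour described above can be sketched as follows. This is only an illustration under assumptions: `BuildConfig`, `PartitioningConfig`, `select_steps`, and all step names are hypothetical stand-ins, not the classes or steps introduced by this PR.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PartitioningConfig:
    """Hypothetical stand-in for the PR's PartitioningConfiguration."""
    num_devices: int = 2


@dataclass
class BuildConfig:
    """Hypothetical stand-in for a build configuration."""
    steps: List[str] = field(default_factory=lambda: ["single_step_a", "single_step_b"])
    partitioning: Optional[PartitioningConfig] = None


def select_steps(cfg: BuildConfig) -> List[str]:
    """Return the build steps, adding Multi-FPGA steps only when configured."""
    steps = list(cfg.steps)
    if cfg.partitioning is not None:
        # Placeholder step names, not the names used in the PR.
        steps += ["multi_partition", "multi_build"]
    return steps


single = select_steps(BuildConfig())
multi = select_steps(BuildConfig(partitioning=PartitioningConfig(num_devices=4)))
```

With no partitioning config the step list is unchanged, so an existing single-FPGA flow would not notice the feature.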

Functionality:

  • The partitioning transformation that assigns a device ID to each node
  • Loadable partitioning configs (which in turn enable device-specific manual folding)
  • New CreateStreamingDataflowPartition function that considers device ID (+ optionally SLR?)
  • Aurora packing transformation
  • Network metadata management transformation and classes
  • The Vitis packaging transformation for the Multi-FPGA communication kernels
  • Multi-FPGA VitisBuild
  • Internal vs external IODMAs (when building XOs) (configurable)
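
To illustrate what considering the device ID during partition creation means, here is a minimal sketch that splits a linear chain of nodes into consecutive runs sharing a device ID. It is not the implementation in this PR: `group_by_device`, the node names, and the device mapping are invented for the example, and the sketch only covers graphs without branches.

```python
from itertools import groupby


def group_by_device(nodes, device_of):
    """Split a linear chain of nodes into consecutive runs with the same device ID."""
    partitions = []
    for device, run in groupby(nodes, key=lambda n: device_of[n]):
        partitions.append((device, list(run)))
    return partitions


# Hypothetical chain of node names and their assigned device IDs.
chain = ["conv0", "mvau0", "mvau1", "pool0", "mvau2"]
device_of = {"conv0": 0, "mvau0": 0, "mvau1": 1, "pool0": 1, "mvau2": 2}
partitions = group_by_device(chain, device_of)
```

Each resulting tuple holds a device ID and the nodes that would end up in that device's partition.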

Tests:

  • Ensure test coverage
  • All tests pass
    • Aurora
      • Core packaging
      • Metadata
      • Partitioner
      • Objective function value regression tests
    • Inseparable Nodes
    • end2end (depends on FIFO sizing)
    • SDP creation
    • Metadata (general)
    • VitisLinkConfig (Some doctests, most full tests implicitly done by v++/Vivado, since they will immediately catch issues in the config when starting synthesis)
    • Doctests
    • Resource estimation summation functions (It would be ideal if Index platform resource data by type #211 was included as well here)
  • Creating Multi-FPGA SDPs could theoretically fail when the device changes inside branches, since creation is currently based on the order of nodes. Check for this (e.g. a device change between two nodes requires that neither of them is a fork or join) (Should be fixed implicitly in MF 2.0) (Fixed by new clustering algorithm)
  • Test single FPGA workflow
  • Test separate_iodmas setting for both single and Multi-FPGA
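
The fork/join condition mentioned in the SDP item above could be checked along these lines. This is a sketch under assumptions, not the check used in the PR: the graph is given as a plain edge list, and `boundary_violations` is a hypothetical helper name.

```python
from collections import Counter


def boundary_violations(edges, device_of):
    """Return edges that cross devices while touching a fork or join node."""
    out_deg = Counter(src for src, _ in edges)
    in_deg = Counter(dst for _, dst in edges)

    def is_fork_or_join(node):
        # A fork has several successors, a join has several predecessors.
        return out_deg[node] > 1 or in_deg[node] > 1

    return [
        (src, dst)
        for src, dst in edges
        if device_of[src] != device_of[dst]
        and (is_fork_or_join(src) or is_fork_or_join(dst))
    ]


# Diamond graph: "a" forks into "b" and "c", which join again in "d".
diamond = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
bad = boundary_violations(diamond, {"a": 0, "b": 0, "c": 1, "d": 1})

# Linear chain with one device change and no forks or joins.
chain = [("a", "b"), ("b", "c")]
ok = boundary_violations(chain, {"a": 0, "b": 0, "c": 1})
```

An empty result means every device boundary sits between two plain pass-through nodes.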

Documentation:

  • Document changes in the Wiki (can be created before the merge)
  • Partitioner docs
  • Source directory README

Integration / Code Organization:

  • Integrate frontend rework
  • Present partitioning results in a rich table
  • Adding the communication kernels as a dependency
  • Reworked Vitis Build Backend
    • Jinja templating of Vitis link configs
    • Parallel synthesis
  • Partitioning Config should have its own verbosity flag, since Multi-FPGA can print out a lot of information
  • Configurable Aurora widths (solved by communication_kernel_arguments in the partitioning config)
  • Rework steps-integration
  • Reorganize Multifpga directory? (Separate files per communication kernel?)
  • Move every non-Multi-FPGA utility function to finn/utils/util.py or fpgadataflow.py
  • Organize partitioner classes (sometimes the partitioner contains the mapping or status, sometimes the transformation, etc. - this can be streamlined)
    • Right now, the partitioner itself contains status fields, while the transformation contains the result mapping and partitioner type. This can be adapted in the future, but for now I'd argue that this separation makes sense.
  • Label and move bitstreams and synthesis reports to the output directory
  • Update step_deployment_package
  • Update pinned AuroraFlow commit to latest (with emulation)
    • Fix potential errors caused by the AuroraFlow update
  • Specialize transformation constructors: If possible don't pass the whole DataflowBuildConfig since that obscures the purpose of passing the config
  • Integration with C++ driver (depends on Fixing some Multi-FPGA defaults finn-cpp-driver#25)
  • Consider resources required by the shell
  • Replace every topology-based call with direct predecessor and successor lookups
    • Notably creation of Multi-FPGA SDPs
  • Unify all error types, especially in the partitioner
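
Why direct predecessor and successor lookups are safer than relying on topological order can be seen in a small sketch. The edge-list representation and the helper `build_adjacency` are assumptions made for illustration only.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build direct predecessor and successor maps from an edge list."""
    preds, succs = defaultdict(list), defaultdict(list)
    for src, dst in edges:
        succs[src].append(dst)
        preds[dst].append(src)
    return preds, succs


# Diamond graph in one valid topological order.
order = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
preds, succs = build_adjacency(edges)

# The order-based "next node" after "b" is "c", although no edge b -> c exists.
next_in_order = order[order.index("b") + 1]
true_successors = succs["b"]
```

In a graph with branches, the neighbour in the sorted node list and the actual successor differ, which is the failure mode an order-based approach runs into.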

Final re-check before merging:

  • Don't pass the entire DataflowBuildConfig if avoidable (check all classes)
  • Pin correct AuroraFlow commit
  • Clean up code
  • Clean up logging statements
  • Update CHANGELOG.md (after having merged from dev one last time) (add changelog for the changed finn test command)
  • Update source directory README.md

Future:

  • Constrain partitioner model further for faster solutions (requires regression testing partitioner objective function values)
  • CI based regression tests for Multi-FPGA
  • platforms.py should be updated:
    • String-based lookup, instead of index based, Index platform resource data by type #211
    • QSFP Port count
    • QSFP Port SLR (done)
    • Important resources per device type (LUT, FF, BRAM, DSP...) (-> "considered resources")
  • Test Aurora's unidirectional cores (to discard vitis_dummy_kernel) (Depends on WIP: Enable unidirectional links pc2/AuroraFlow#47) (Maybe future PR)
  • (Floorplanning) SLR assignment transformation that assigns the SLR of a given node based on some constraints
  • Make detection of considered_resources automatic depending on the board (A u280 always has these resource types: ...)
  • Add DDR / HBM to considered_resources (This depends on having both analysis estimates and available resources for the platforms and could be integrated together with Index platform resource data by type #211. For now, this is moved to a future PR.)
  • Mux/Demux
  • Nicer graph manipulation (see Adding graph utility functions #26) (Postponed for now, see ONNX Script)
  • Enable usage of existing Floorplanning transformation instead of CreateSDP
  • Option to choose between our and finn-experimental's partitioner?
  • Automatic calculation of the required number of FPGAs
  • AuroraFlow as a CustomOp
  • Integrate new AuroraFlow sw_emu and hw_emu modes as verification methods
  • Template mux/demux hls/rtl with jinja
  • Currently an SDP graph does not have branches. This may be beneficial to change, but requires updates to several places in the Single- and Multi-FPGA code
  • Replace own networkx conversion with the one from onnx-passes: iksnagreb/onnx-passes@989d670 (as soon as it's merged into dev) (Unlikely, since the mentioned function also considers non-synthesizable IOs of the model as nodes.)
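
The platforms.py items above (string-based lookup, automatic considered_resources) could look roughly like the following sketch. The board name, the resource numbers, and all function names are invented for illustration and do not reflect real device data or the current platforms.py.

```python
# Illustrative numbers only, not real device data.
PLATFORM_RESOURCES = {
    "example_board": {"LUT": 1000, "FF": 2000, "BRAM": 100, "DSP": 50},
}


def considered_resources(board):
    """Derive the considered resource types from the board entry itself."""
    return sorted(PLATFORM_RESOURCES[board])


def sum_estimates(estimates, resource_types):
    """Sum per-node resource estimates, keyed by resource type name."""
    total = {r: 0 for r in resource_types}
    for est in estimates:
        for r in resource_types:
            total[r] += est.get(r, 0)
    return total


def fits(board, estimates):
    """Check whether the summed estimates stay within the board's resources."""
    types = considered_resources(board)
    total = sum_estimates(estimates, types)
    return all(total[r] <= PLATFORM_RESOURCES[board][r] for r in types)


nodes = [{"LUT": 400, "DSP": 20}, {"LUT": 500, "BRAM": 60}]
total = sum_estimates(nodes, considered_resources("example_board"))
```

Keying by resource name rather than by list index keeps the lookup readable and lets a missing resource type default to zero instead of shifting positions.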

bwintermann commented Jan 29, 2025

TODOs moved.

…ct/finn-plus into feature/vitis_build_improved
bwintermann and others added 20 commits February 25, 2025 14:20
…ultiFPGA. Improved typing and class organization. End2End tests. Updated existing tests
…me + Import fixes. Aurora Single Package Doctest. Resource estimation test.

Labels

enhancement New feature or request

Projects

Status: In Progress

2 participants