finn.transformation
This page contains the complete API reference for all modules in the finn.transformation package.
- finn.transformation.fpgadataflow.annotate_cycles
- finn.transformation.fpgadataflow.annotate_resources
- finn.transformation.fpgadataflow.attention
- finn.transformation.fpgadataflow.attention_heads
- finn.transformation.fpgadataflow.cleanup
- finn.transformation.fpgadataflow.compile_cppsim
- finn.transformation.fpgadataflow.convert_to_hw_layers
- finn.transformation.fpgadataflow.create_dataflow_partition
- finn.transformation.fpgadataflow.create_stitched_ip
- finn.transformation.fpgadataflow.derive_characteristic
- finn.transformation.fpgadataflow.externalize_params
- finn.transformation.fpgadataflow.floorplan
- finn.transformation.fpgadataflow.hlssynth_ip
- finn.transformation.fpgadataflow.infer_pixel_padding_deconv
- finn.transformation.fpgadataflow.insert_dwc
- finn.transformation.fpgadataflow.insert_fifo
- finn.transformation.fpgadataflow.insert_hook
- finn.transformation.fpgadataflow.insert_iodma
- finn.transformation.fpgadataflow.insert_tlastmarker
- finn.transformation.fpgadataflow.instrumentation
- finn.transformation.fpgadataflow.loop_rolling
- finn.transformation.fpgadataflow.make_driver
- finn.transformation.fpgadataflow.make_zynq_proj
- finn.transformation.fpgadataflow.minimize_accumulator_width
- finn.transformation.fpgadataflow.minimize_weight_bit_width
- finn.transformation.fpgadataflow.prepare_cppsim
- finn.transformation.fpgadataflow.prepare_ip
- finn.transformation.fpgadataflow.prepare_rtlsim
- finn.transformation.fpgadataflow.raise_scalar_to_rank1
- finn.transformation.fpgadataflow.replace_verilog_relpaths
- finn.transformation.fpgadataflow.set_exec_mode
- finn.transformation.fpgadataflow.set_fifo_depths
- finn.transformation.fpgadataflow.set_folding
- finn.transformation.fpgadataflow.set_loop_boundary
- finn.transformation.fpgadataflow.specialize_layers
- finn.transformation.fpgadataflow.synth_ooc
- finn.transformation.fpgadataflow.templates
- finn.transformation.fpgadataflow.transpose_decomposition
- finn.transformation.fpgadataflow.vitis_build
- finn.transformation.fpgadataflow.vivado_power_estimation
- finn.transformation.general
- finn.transformation.move_reshape
- finn.transformation.qonnx.convert_qonnx_to_finn
- finn.transformation.qonnx.fold_quant_weights
- finn.transformation.qonnx.infer_quant_avg_pool_2d
- finn.transformation.qonnx.qonnx_activation_handlers
- finn.transformation.qonnx.quant_act_to_multithreshold
- finn.transformation.squeeze
- finn.transformation.streamline
- finn.transformation.streamline.absorb
- finn.transformation.streamline.collapse_repeated
- finn.transformation.streamline.extract_norm_scale_bias
- finn.transformation.streamline.reorder
- finn.transformation.streamline.round_thresholds
- finn.transformation.streamline.sign_to_thres
- finn.transformation.util
Annotate the estimate of clock cycles per sample taken by each fpgadataflow node as an attribute on the node.
class AnnotateCycles(Transformation)Annotate the estimate of clock cycles per sample taken by each fpgadataflow node as an attribute on the node.
def __init__() -> NoneConstructs the AnnotateCycles transformation.
def apply(model: ModelWrapper) -> tuple[ModelWrapper, Literal[False]]Annotate the estimate of clock cycles per sample taken for each layer.
class AnnotateResources(Transformation)Annotate the amount of FPGA resources taken by each fpgadataflow node as an attribute on the node, depending on the mode parameter:
- 'estimate' -- use the analytical estimation model
- 'hls' -- use results from the HLS synthesis report
- 'synth' -- use post-synthesis (Vivado or Vitis) report
No annotations can be provided unless the relevant transformation for the chosen mode (e.g. HLSSynthIP for hls) was previously run.
Transformations for converting scaled dot-product attention patterns to FINN hardware layers.
This module provides transformations to detect and convert multi-node ONNX patterns representing scaled dot-product attention into single FINN custom hardware operations, enabling efficient FPGA implementation of transformer attention mechanisms.
class InferScaledDotProductAttention(Transformation)Convert the operator pattern corresponding to scaled dot-product attention to the hardware custom operator node.
def apply(model: ModelWrapper)Apply the transform to a whole model graph.
class AbsorbMultiThresholdIntoScaledDotProductAttention(Transformation)Absorb a MultiThreshold into ScaledDotProductAttention if there is not already an activation included.
def apply(model: ModelWrapper)Apply the transform to a whole model graph.
Transformations for multi-head attention patterns in FPGA dataflow.
class InferMultiHeads(Transformation)Infer multi-head attention patterns and convert to custom operators.
Converts Reshape and Transpose patterns to SplitMultiHeads and MergeMultiHeads hardware custom operators.
def apply(model: ModelWrapper)Apply the transformation to infer multi-head patterns in the model graph.
class MoveSplitMultiHeadsPastMultiThreshold(Transformation)Move SplitMultiHeads operation past MultiThreshold operation.
Required as a precondition for unrolling attention heads.
def apply(model: ModelWrapper)Apply the transformation to move SplitMultiHeads past MultiThreshold.
class MoveMergeMultiHeadsPastMultiThreshold(Transformation)Move MergeMultiHeads operation past MultiThreshold operation.
Avoids merging excessively large streams and potentially allows absorbing thresholds into the attention operator.
def apply(model: ModelWrapper) -> tuple[ModelWrapper, bool]Apply the transformation to move MergeMultiHeads past MultiThreshold.
def is_multi_head_attention(node: NodeProto, model: ModelWrapper) -> boolDetect if a node is part of a multi-head attention pattern.
Returns True if the node is a ScaledDotProductAttention with proper SplitMultiHeads and MergeMultiHeads operations.
class UnrollMultiHeadAttention(Transformation)Unroll multiple attention heads for parallel implementation.
Transforms the ONNX graph to implement attention heads in parallel.
def apply(model: ModelWrapper) -> tuple[ModelWrapper, bool]Apply the transformation to unroll multi-head attention.
class CleanUp(Transformation)Remove any generated files for fpgadataflow nodes.
class CompileCppSim(NodeLocalTransformation)For every node: compile C++ code in node attribute "code_gen_dir_cppsim" and save path to executables in node attribute "executable_path". All nodes in the graph must have the fpgadataflow backend attribute.
To use these executables, exec_mode must be set to "cppsim" (using transformation SetExecMode) and the model has to be executed using execute_onnx() from finn.core.onnx_exec
- num_workers (int or None) number of parallel workers, see documentation in NodeLocalTransformation for more details.
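The cppsim workflow described above (prepare, compile, set the execution mode, then execute) could be chained roughly as follows. This is a minimal sketch, not a definitive recipe: it assumes FINN is installed, that the classes live in the modules listed at the top of this page, that PrepareCppSim and SetExecMode accept the arguments shown, and that execute_onnx takes the model and an input dictionary. run_cppsim is a hypothetical helper name.

```python
def run_cppsim(model, input_dict, num_workers=None):
    """Prepare, compile and execute a model in cppsim mode (sketch)."""
    # Imports are local so this helper can be defined without FINN present.
    # Module paths are taken from the module list of this page.
    from finn.core.onnx_exec import execute_onnx
    from finn.transformation.fpgadataflow.compile_cppsim import CompileCppSim
    from finn.transformation.fpgadataflow.prepare_cppsim import PrepareCppSim
    from finn.transformation.fpgadataflow.set_exec_mode import SetExecMode

    # Generate C++ sources; fills the code_gen_dir_cppsim node attribute.
    model = model.transform(PrepareCppSim())
    # Compile the sources; fills the executable_path node attribute.
    model = model.transform(CompileCppSim(num_workers=num_workers))
    # Select cppsim so that execution uses the compiled executables.
    model = model.transform(SetExecMode("cppsim"))
    return execute_onnx(model, input_dict)
```

The order matters: CompileCppSim reads the folder written by PrepareCppSim, and the executables are only used once exec_mode is "cppsim".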
Transformations to map ONNX operators to FINN hardware layers.
class InferConvInpGen(Transformation)Convert Im2Col layers to ConvolutionInputGenerator layers.
def __init__()Initialize the transformation.
def apply(model)Apply the transformation to infer ConvolutionInputGenerator layers.
class InferFMPadding(Transformation)Convert Pad layers to FMPadding layers.
def apply(model: ModelWrapper)Apply the transformation to the entire model graph.
class InferThresholdingLayer(Transformation)Convert any MultiThreshold into a standalone thresholding layer.
def __init__()Initialize the transformation.
def apply(model)Apply the transformation to infer standalone thresholding layers.
class InferRequantLayer(Transformation)Convert MultiThreshold or Quant nodes to Requant.
For MultiThreshold nodes where all channels have uniform (equal-step) thresholds, the comparison-based threshold lookup can be replaced with a simpler requantization operation:
output = clip(round(input * scale + bias), min, max)
where:
scale = 1.0 / step_size
bias = 0.5 - first_threshold / step_size
For Quant nodes with scale=1 and zeropt=0 (after ExtractQuantScaleZeroPt), the operation simplifies to:
output = clip(round(input), min, max)
which is Requant with scale=1 and bias=0.
This transformation is optional and provides an alternative implementation to InferThresholdingLayer. The Requant node can then be specialized to either HLS or RTL backend.
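The equivalence between uniform thresholds and requantization can be checked numerically. The sketch below is illustrative only and makes several assumptions that are not stated on this page: the MultiThreshold output is taken to be the plain count of thresholds met (no output scale or bias), round() is taken to be round-half-up, and the clip range is 0 to the number of thresholds. The helper names and the values of step_size and first_threshold are hypothetical.

```python
import math


def multithreshold(x, thresholds):
    """Comparison-based lookup: count the thresholds that x meets or exceeds."""
    return sum(1 for t in thresholds if x >= t)


def requant(x, scale, bias, lo, hi):
    """clip(round(x * scale + bias), lo, hi), using round-half-up (assumption)."""
    rounded = math.floor(x * scale + bias + 0.5)
    return max(lo, min(hi, rounded))


# Uniform (equal-step) thresholds: -1.0, -0.5, ..., 2.0
step_size = 0.5
first_threshold = -1.0
num_thresholds = 7
thresholds = [first_threshold + k * step_size for k in range(num_thresholds)]

# Parameters derived as in the formulas above.
scale = 1.0 / step_size
bias = 0.5 - first_threshold / step_size

# Inputs on a 0.25 grid, including values exactly on the thresholds.
inputs = [i * 0.25 for i in range(-12, 17)]
reference = [multithreshold(x, thresholds) for x in inputs]
requantized = [requant(x, scale, bias, 0, num_thresholds) for x in inputs]
```

With these assumptions both lists agree for every input. Note that the match at inputs lying exactly on a threshold depends on the rounding mode; with round-half-to-even the two computations can differ there.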
class InferUpsample(Transformation)Convert Upsample and Resize nodes to UpsampleNearestNeighbour nodes.
def apply(model)Apply the transformation to infer UpsampleNearestNeighbour nodes.
class InferAddStreamsLayer(Transformation)DEPRECATED: This transformation is deprecated and now redirects to InferElementwiseBinaryOperation.
AddStreams functionality is now covered by ElementwiseAdd operations (with both inputs as streaming). This wrapper is kept for backward compatibility.
The ElementwiseBinary operations provide the same functionality with additional features like broadcasting support and more operation types.
class InferDuplicateStreamsLayer(Transformation)Insert a DuplicateStreams HW layer for any tensor with fanout >= 2.
def apply(model)Apply the transformation to insert DuplicateStreams HW layers where needed.
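The fanout criterion used by InferDuplicateStreamsLayer can be illustrated on a toy graph. This is a simplified sketch, not the transformation itself: nodes are plain tuples rather than ONNX NodeProto objects, initializers are ignored, and the function and node names are hypothetical.

```python
from collections import Counter


def tensors_with_fanout(nodes, min_fanout=2):
    """Return the sorted names of tensors consumed by at least min_fanout inputs."""
    consumers = Counter(t for _, inputs, _ in nodes for t in inputs)
    return sorted(t for t, n in consumers.items() if n >= min_fanout)


# Hypothetical residual block: "act" feeds a MatMul and a skip-connection Add.
# Each entry is (node name, input tensors, output tensors).
nodes = [
    ("Thresholding_0", ["inp"], ["act"]),
    ("MatMul_0", ["act"], ["mm"]),
    ("Add_0", ["mm", "act"], ["out"]),
]

needs_duplication = tensors_with_fanout(nodes)
```

Here only "act" has two consumers, so it is the only tensor that would need a DuplicateStreams layer in this toy graph.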
class InferChannelwiseLinearLayer(Transformation)DEPRECATED: This transformation is deprecated and now redirects to InferElementwiseBinaryOperation.
ChannelwiseOp functionality is now covered by ElementwiseBinary operations (Add/Mul with const mode). This wrapper is kept for backward compatibility.
The ElementwiseBinary operations provide the same functionality with additional features like broadcasting support and more operation types.
class InferLabelSelectLayer(Transformation)Convert any TopK into a LabelSelect HW layer.
def apply(model)Apply transformation to convert TopK nodes to LabelSelect hardware layers.
This transformation identifies TopK operations and converts them to FINN's custom LabelSelect nodes for hardware acceleration.
class InferGlobalAccPoolLayer(Transformation)Convert any GlobalAveragePool into a GlobalAccPool HW layer and a scalar Mul.
def apply(model)Apply transformation to infer GlobalAccPool hardware layers.
class InferPool(Transformation)If kernel_shape > strides, replace Pool layer with Im2col + pool combination.
When kernel_shape > strides, replaces Pool layer with Im2col followed by pool (with kernel_shape == strides), plus Transpose layers to keep the original data layout.
def apply(model)Apply transformation to convert Pool operations with kernel_shape > strides.
class InferPoolFromReduce(Transformation)Infer pooling hardware from lowered pooling, i.e., Im2Col+Reduce.
def apply(model: ModelWrapper)Apply transformation to convert lowered pooling to hardware.
class InferLookupLayer(Transformation)Convert Gather nodes with constant op0 into Lookup HW layers.
def apply(model)Apply transformation to convert Gather operations to Lookup hardware layers.
This transformation identifies Gather operations with constant first operand and converts them to FINN's custom Lookup nodes for hardware acceleration.
class InferConcatLayer(Transformation)Convert suitable Concat nodes (operating on last/-1 axis) into StreamingConcat HW layers.
def apply(model)Apply transformation to convert Concat operations to StreamingConcat hardware layers.
This transformation identifies Concat operations operating on the last axis and converts them to FINN's custom StreamingConcat nodes.
class InferSplitLayer(Transformation)Convert suitable Split nodes (operating on last/-1 axis) into StreamingSplit HW layers.
def apply(model)Apply transformation to convert Split operations to StreamingSplit hardware layers.
This transformation identifies Split operations operating on the last axis and converts them to FINN's custom StreamingSplit nodes.
class InferStreamingEltwise(Transformation)DEPRECATED: This transformation is deprecated and now redirects to InferElementwiseBinaryOperation.
StreamingEltwise functionality is now covered by ElementwiseSub and ElementwiseAbsDiff operations (with both inputs as streaming). This wrapper is kept for backward compatibility.
The ElementwiseBinary operations provide the same functionality with additional features like broadcasting support and more operation types.
class InferBinaryMatrixVectorActivation(Transformation)Convert XnorPopcountMatMul layers to MatrixVectorActivation layers.
Any immediately following MultiThreshold layers will also be absorbed into the MVAU.
def __init__()Initialize the transformation.
def apply(model)Apply transformation to convert XnorPopcountMatMul to MVAU nodes.
This transformation identifies XnorPopcountMatMul operations and converts them to FINN's custom MVAU (Matrix Vector Activation Unit) nodes, potentially absorbing following MultiThreshold layers.
class InferQuantizedMatrixVectorActivation(Transformation)Convert MatMul layers with quantized inputs and weights to MatrixVectorActivation layers.
def apply(model)Apply transformation to convert MatMul to MVAU nodes.
class InferVectorVectorActivation(Transformation)Convert MatMul layers to VectorVectorActivation layers for depthwise convolutions.
Converts MatMul layers with quantized inputs and weights to VectorVectorActivation layers, if the sparsity annotation of the weight matrix indicates that the MatMul layer belongs to a depthwise convolution. Any immediately following MultiThreshold layers will also be absorbed into the VVAU.
def __init__()Initialize the transformation.
def apply(model)Apply transformation to convert MatMul to VVAU nodes for depthwise convolutions.
class InferHWSoftmax(Transformation)Infers a regular softmax node without merging the multithreshold and setting the softmax to perform the quantisation.
def __init__()Infers a regular softmax node without merging the multithreshold and setting the softmax to perform the quantisation.
def apply(model)Apply the transformation.
def skip_first_node_transpose(model, node)Default filter for InferShuffle: skip Transpose if it's the first node in the graph. This is useful for image classification networks where the first transpose converts NCHW to NHWC layout for data preprocessing.
class InferShuffle(Transformation)Find Transpose layers with (optionally) Reshape layers around them and convert them into a Shuffle operator.
def apply(model)Apply the transformation.
def lift_to_rank1(name: str, model: ModelWrapper)Lift scalar to rank-1 tensor.
Converts scalar tensors (shape []) to rank-1 tensors with a single element (shape [1]).
class InferElementwiseBinaryOperation(Transformation)Convert supported elementwise binary operations to their FINN custom operation.
@staticmethod
def reject_output_dequant(model: ModelWrapper, node: NodeProto)Filter function to filter out the last elementwise Mul operation.
Typically filters output de-quantization operations which should happen off-chip.
def __init__(_filter=None)Initialize the transformation method with an optional filter function.
def apply(model: ModelWrapper)Apply the transform to convert elementwise binary operations to FINN custom ops.
class InferReLUAsElementwiseMax(Transformation)Converts ReLU into ElementwiseMaximum(in, 0).
@staticmethod
def reject_unsupported_dtypes(model: ModelWrapper, node: NodeProto)Filter function to filter out any operation involving any floating-point tensor.
def __init__(_filter=reject_unsupported_dtypes)Initializes the transformation method with an optional filter function.
def apply(model: ModelWrapper)Apply the transformation.
class InferLayerNorm(Transformation)Convert LayerNorm into HW, only norming over channel dim. This transform is adapted from Brainsmith InferLayerNorm.
def apply(model)Apply the transformation.
def elements_are_consecutive(indices)Are elements consecutive (max diff. 1 between all adjacent elements)?
class InferCrop(Transformation)Find Gather layers that can be converted into a Crop layer and replace them with a Crop layer.
def __init__()Find Gather layers that can be converted into a Crop layer and replace them with a Crop layer.
def apply(model)Apply the transformation.
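To see why consecutive indices matter for InferCrop, consider a Gather whose indices form one contiguous run: it selects a slice and can therefore be described by how much is cropped at each end. The sketch below is an assumption-laden illustration, not the library code: it reads "consecutive" as ascending with a difference of exactly 1, and both helper names are hypothetical.

```python
def is_consecutive_run(indices):
    """True if each index is exactly one larger than its predecessor."""
    return all(b - a == 1 for a, b in zip(indices, indices[1:]))


def crop_margins(indices, axis_length):
    """Elements dropped before and after the kept slice, or None if not a crop."""
    if not indices or not is_consecutive_run(indices):
        return None
    return indices[0], axis_length - 1 - indices[-1]
```

For example, gathering indices [2, 3, 4, 5] from an axis of length 8 drops two elements at each end, while [2, 4, 5] is not a contiguous slice and cannot be expressed as a crop.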
class InferSqueeze(Transformation)Converts the Squeeze operation to the corresponding FINN custom operation.
def apply(model: ModelWrapper)Apply the transform to convert Squeeze operations to FINN custom ops.
class InferUnsqueeze(Transformation)Convert the Unsqueeze operation to the corresponding FINN custom operation.
def apply(model: ModelWrapper)Apply the transform to convert Unsqueeze operations to FINN custom ops.
class InferReshape(Transformation)Converts ONNX Reshape operator to the corresponding HWCustomOp.
def apply(model: ModelWrapper) -> tuple[ModelWrapper, bool]Apply the transform to convert Reshape operations to hardware.
class CreateDataflowPartition(Transformation)Split a graph into two graphs; one which contains non-FINN-dataflow nodes and a StreamingDataflowPartition node, and another which only contains FINN dataflow nodes. The StreamingDataflowPartition has a model attribute that indicates the filename for the second graph that only contains dataflow nodes. No action is taken if there are no dataflow nodes.
Transformation to create stitched IP from dataflow graph components.
def is_external_input(model, node, i)Determine whether input i of node should be made external.
True only if input is unconnected and has no initializer. Only exception is second input of FC layers when mem_mode is external.
def is_external_output(model, node, i)Determine whether output i of node should be made external.
class CreateStitchedIP(Transformation)Create a Vivado IP Block Design project from all the generated IPs of a graph. All nodes in the graph must have the fpgadataflow backend attribute, and the PrepareIP transformation must have been previously run on the graph. The resulting block design is also packaged as IP. The transformation gets the fpgapart as a string.
Outcome if successful: sets the vivado_stitch_proj attribute in the ONNX ModelProto's metadata_props field, with the created project dir as the value. A make_project.tcl script is also placed under the same folder, which is called to instantiate the per-layer IPs and stitch them together. The packaged block design IP can be found under the ip subdirectory.
def __init__(fpgapart,
clk_ns,
ip_name="finn_design",
vitis=False,
signature=[])Initialize CreateStitchedIP transformation with FPGA part and clock settings.
def is_double_pumped(node)Check if node uses double pumped computation.
def connect_clk_rst(node)Connect clock and reset signals for the node.
def connect_axi(node, model)Connect AXI interfaces for the node.
def connect_m_axis_external(node, idx=None)Connect master AXI stream interfaces as external ports.
def connect_s_axis_external(node, idx=None)Connect slave AXI stream interfaces as external ports.
def connect_ap_none_external(node)Connect ap_none interfaces as external ports.
def insert_signature(checksum_count)Insert signature block for design identification.
def apply(model)Apply the CreateStitchedIP transformation to the model.
class DeriveCharacteristic(NodeLocalTransformation)For each node in the graph, run rtlsim to obtain the i/o characteristic function for FIFO sizing and set the attribute. It is assumed that the PrepareRTLSim transformation was already called on the graph.
This transformation performs rtlsim for each node, so it will run for some time (minutes to hours depending on configuration).
- period (int) desired period over which the characteristic function will be derived.
- num_workers (int or None) number of parallel workers, see documentation in NodeLocalTransformation for more details.
class DeriveFIFOSizes(NodeLocalTransformation)Prerequisite: DeriveCharacteristic already called on graph. For each node in the graph, use the accumulated I/O characteristic function to perform FIFO sizing, setting the in/outFIFODepths attributes of HLSCustomOp nodes.
- num_workers (int or None) number of parallel workers, see documentation in NodeLocalTransformation for more details.
class ExternalizeParams(Transformation)Create top-level graph inputs for IODMAs serving layers where weights are marked as external using mem_mode="external".
class Floorplan(Transformation)Perform floorplanning of the dataflow design.
floorplan: path to a JSON file containing a dictionary with SLR assignments for each node in the ONNX graph. Must be parseable by the ApplyConfig transform.
The transform applies the properties in the supplied JSON, then:
- separates DMAs into their own partition IDs
- if not explicitly assigned, assigns DWCs to SLRs to minimize the SLLs required
- if not explicitly assigned, assigns FIFOs to the SLR of the upstream node
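The rule that unassigned FIFOs inherit the SLR of their upstream node can be sketched on a linear chain of nodes. Everything in this example is hypothetical: the node names, the JSON layout (a "Defaults" entry plus one dictionary per node with an "slr" key), and the use of -1 for "unassigned" are assumptions made for illustration, and the real transform operates on an ONNX graph rather than a list.

```python
import json

# Hypothetical floorplan file content with explicit SLR assignments.
floorplan_json = json.dumps({
    "Defaults": {},
    "MVAU_0": {"slr": 0},
    "MVAU_1": {"slr": 1},
})

# Linear chain in dataflow order: (node name, op type).
chain = [
    ("MVAU_0", "MVAU"),
    ("StreamingFIFO_0", "StreamingFIFO"),
    ("MVAU_1", "MVAU"),
    ("StreamingFIFO_1", "StreamingFIFO"),
]


def assign_fifo_slrs(chain, config):
    """Give each FIFO without an explicit SLR the SLR of its upstream node."""
    slr_of = {}
    upstream_slr = -1  # -1 marks "unassigned" in this sketch
    for name, op_type in chain:
        explicit = config.get(name, {}).get("slr")
        if explicit is not None:
            slr_of[name] = explicit
        elif op_type == "StreamingFIFO":
            slr_of[name] = upstream_slr
        else:
            slr_of[name] = -1
        upstream_slr = slr_of[name]
    return slr_of


slrs = assign_fifo_slrs(chain, json.loads(floorplan_json))
```

In this chain StreamingFIFO_0 ends up on SLR 0 and StreamingFIFO_1 on SLR 1, following the nodes that feed them. An explicit "slr" entry for a FIFO would take precedence.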
class HLSSynthIP(NodeLocalTransformation)For each HLS node: generate IP block from code in folder that is referenced in node attribute "code_gen_dir_ipgen" and save path of generated project in node attribute "ipgen_path". All nodes in the graph must have the fpgadataflow backend attribute. Any nodes that already have a ipgen_path attribute pointing to a valid path will be skipped.
This transformation calls Vitis HLS for synthesis, so it will run for some time (minutes to hours depending on configuration).
- num_workers (int or None) number of parallel workers, see documentation in NodeLocalTransformation for more details.
class InferPixelPaddingDeconv(Transformation)Lower and convert ConvTranspose (NCHW) nodes to FMPadding_Pixel + Im2Col + MatMul (NHWC), surrounded by Transpose nodes.
Note: this transformation produces a mix of HW layers and non-HW layers. To implement the result on an FPGA, the Im2Col and MatMul nodes need to be converted to HW layers after applying this transformation, and the resulting Transpose nodes need to be streamlined. See the deconv test case under tests/fpgadataflow for an example.
class InsertDWC(Transformation)Add data width converters between layers where necessary.
class InsertFIFO(Transformation)Insert FIFOs at the beginning and end of the graph as well as between fpgadataflow nodes.
Takes the setting for the depth from the surrounding nodes by extracting node attribute 'outFIFODepths' of the previous and node attribute 'inFIFODepths' of the subsequent node. max() of these two values sets the FIFO depth.
Constructor arguments:
- max_qsrl_depth: FIFOs deeper than this will use Vivado IP instead of Verilog FIFOs (Q_srl.v)
- vivado_ram_style: the StreamingFIFO.ram_style attribute to be used for large FIFOs implemented by Vivado
- create_shallow_fifos: Normally, shallow-depth (<=2) FIFOs won't be created since HLS streaming interfaces already have a degree of buffering. Override with this parameter.
The other node attributes necessary to create a FIFO node are taken from the node the FIFO node is inserted after: 'folded_shape' and 'dtype'
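The depth selection described for InsertFIFO can be summarized in a few lines. This is a sketch under stated assumptions, not the transformation's code: plan_fifo is a hypothetical helper, the implementation labels "rtl" and "vivado" are illustrative names for the Verilog (Q_srl.v) and Vivado IP variants, and max_qsrl_depth is passed explicitly because its default is not given on this page.

```python
def plan_fifo(out_fifo_depth, in_fifo_depth, max_qsrl_depth,
              create_shallow_fifos=False):
    """Decide whether a FIFO is inserted between two nodes and how deep it is."""
    # The deeper of the producer's outFIFODepths entry and the consumer's
    # inFIFODepths entry determines the FIFO depth.
    depth = max(out_fifo_depth, in_fifo_depth)
    # Shallow FIFOs (depth <= 2) are skipped unless explicitly requested.
    if depth <= 2 and not create_shallow_fifos:
        return None
    # FIFOs deeper than max_qsrl_depth use the Vivado IP variant.
    impl_style = "vivado" if depth > max_qsrl_depth else "rtl"
    return {"depth": depth, "impl_style": impl_style}
```

For instance, with a producer depth of 32, a consumer depth of 128 and a limit of 256, the sketch yields a depth-128 FIFO in the Verilog variant, while two depth-2 neighbours produce no FIFO unless create_shallow_fifos is set.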
class InsertHook(Transformation)Insert a hook layer after each layer that has the node attribute 'output_hook' specified.
class InsertIODMA(Transformation)Insert DMA nodes on inputs and outputs, or as specified by filters in the constructor.
def get_mem_init(weights, pe, simd)Returns matrix ready for pack_innermost_dim_as_hex_string with reverse=False (finn.util.data_packing) to return the memory init file little endian packed. That is, get_mem_init returns:
elem(pe,simd)
addr = 0: [(pe-1,simd-1),(pe-1,simd-2),...(0,1),(0,0)]
addr = 1: [(pe-1,simd*2-1),.......(0,simd+1),(0,simd)]
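The element ordering described for get_mem_init can be made concrete by enumerating the (pe, simd) index pairs per address. The sketch below only reproduces the index pattern as it is written above; it does not touch actual weight values or call the real function, and mem_init_index_layout is a hypothetical name.

```python
def mem_init_index_layout(pe, simd, num_addrs):
    """List the (pe index, simd index) pairs stored at each address."""
    layout = []
    for addr in range(num_addrs):
        # Within one address, PE indices run from high to low, and for each PE
        # the simd indices of this address's group also run from high to low.
        word = [
            (p, addr * simd + s)
            for p in reversed(range(pe))
            for s in reversed(range(simd))
        ]
        layout.append(word)
    return layout


layout = mem_init_index_layout(pe=2, simd=2, num_addrs=2)
```

For pe=2 and simd=2, address 0 holds (1,1), (1,0), (0,1), (0,0) and address 1 holds (1,3), (1,2), (0,3), (0,2), which matches the pattern given above with element (0,0) in the last position.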
class InsertTLastMarker(Transformation)Ensure that the graph is started/terminated with a TLastMarker_hls node, inserting one if necessary. Use constructor args to determine type of TLastMarker to be inserted. More information available on the TLastMarker documentation.
Transformations for generating and simulating instrumentation IP.
def collect_ip_dirs(model, ipstitch_path)Collect list of all IP directories required by the design.
class GenerateInstrumentationIP(Transformation)Generate instrumentation IP for performance monitoring.
def __init__(fpga_part, clk_period_ns, avg_n=64, format="ip")Initialize instrumentation IP generation with FPGA part and clock settings.
def apply(model)Generate instrumentation IP core.
class PrepareInstrumentationSim(Transformation)Prepare simulation environment for instrumentation.
def __init__(fpga_part)Initialize instrumentation simulation preparation.
def apply(model)Prepare scripts for simulating instrumentation IP.
class RunInstrumentationSim(Transformation)Run instrumentation simulation to collect performance data.
def __init__()Initialize instrumentation simulation runner.
def apply(model)Run instrumentation simulation script.
def get_constant_from_value(value)Get the constant value of a tensor.
def same_values(inputs)Check if all inputs have the same constant value.
class LoopRolling(Transformation)Boilerplate Transformation for loop rolling in fpgadataflow.
Create C++ and PYNQ drivers for FINN-generated accelerators.
def update_bitfile_path_after_copy(bitfile_path: str, json_path: str) -> NoneUpdate the xclbinPath in the JSON configuration to point to the new bitfile location.
Arguments:
- json_path (str): Path to the JSON configuration file
- bitfile_path (str): New path to the bitfile (.xclbin)
class MakeCPPDriver(Transformation)Create C++ code to correctly interface the generated accelerator, including data packing/unpacking. Should be called after conversion to HLS layers, folding and the creation of dataflow partitions for correct operation.
platform: has to be "alveo", otherwise an error is thrown.
Outcome if successful: sets the cpp_driver_dir attribute in the ONNX ModelProto's metadata_props field, with the created driver dir as the value. Runtime-writable weights are not yet supported.
def resolve_dt_name(s: str) -> strResolve datatype name for C++ driver code generation.
Arguments:
- s: Datatype string to resolve
Returns:
Resolved C++ datatype name
Raises:
- FINNInternalError: If datatype is unknown
def __init__(platform: str, version: str, host_mem: str)Initialize MakeCPPDriver transformation.
Arguments:
- platform: Target platform (must be "alveo")
- version: Version of finn-cpp-driver to use ("latest" or commit hash)
- host_mem: Memory type (FpgaMemoryType.HOST_MEM or FpgaMemoryType.DEVICE_MEM)
Raises:
- FINNUserError: If platform is not "alveo"
def apply(model: ModelWrapper) -> Tuple[ModelWrapper, bool]Apply the MakeCPPDriver transformation to generate C++ driver code.
Arguments:
- model: ONNX model wrapper
Returns:
Tuple of (modified model, transformation success flag)
class MakePYNQDriver(Transformation)Create PYNQ Python code to correctly interface the generated accelerator, including data packing/unpacking. Should be called after conversion to HLS layers, folding and the creation of dataflow partitions for correct operation.
platform: one of ["zynq-iodma", "alveo"]
Outcome if successful: sets the pynq_driver_dir attribute in the ONNX ModelProto's metadata_props field, with the created driver dir as the value. If any layers use runtime-writable parameters, those will be gathered under the runtime_weights/ subfolder of the pynq_driver_dir.
def __init__(platform,
driver_type,
clk_period_ns=None,
validation_datset=None,
experiment_info=None,
board=None)Initialize PYNQ driver generation.
Arguments:
- platform: Target platform, one of ["zynq-iodma", "alveo"].
- driver_type: Type/name of the driver to generate (e.g. "FINNDMAOverlay", "FINNDMAInstrumentationOverlay").
- clk_period_ns: Clock period in nanoseconds used for performance calculations.
- validation_datset: Validation dataset path or identifier.
- experiment_info: Path to a JSON file containing experiment metadata.
def apply(model)Apply the MakePYNQDriver transformation.
Creates a PYNQ Python driver package for interfacing with the generated accelerator, including data packing/unpacking and runtime weight handling.
Arguments:
- model: The ONNX model to generate a driver for.
Returns:
Tuple of (modified model, False) indicating transformation applied.
Transformation to create Zynq Vivado projects for FINN dataflow designs.
def collect_ip_dirs(model, ipstitch_path)Collect list of all IP directories required by the design.
class MakeZYNQProject(Transformation)Create a Vivado overlay project (including the shell infrastructure) from the already-stitched IP block for this graph. All nodes in the graph must have the fpgadataflow backend attribute, and the CreateStitchedIP transformation must have been previously run on the graph. This is functionally equivalent to MakePYNQProject but does not use PYNQ infrastructure and instead creates a fully custom block design. However, this transform requires DMAs in the accelerator design.
Outcome if successful: sets the vivado_pynq_proj attribute in the ONNX ModelProto's metadata_props field, with the created project dir as the value.
def __init__(platform,
period_ns,
enable_debug=False,
enable_finn_switch=False,
live_fifo_sizing=False)Initialize MakeZYNQProject with platform settings.
def apply(model)Apply the transformation to create a Zynq project.
class ZynqBuild(Transformation)Best-effort attempt at building the accelerator for Zynq. It assumes the model has only fpgadataflow nodes.
def __init__(platform,
period_ns,
enable_debug=False,
enable_instrumentation=False,
instrumentation_no_dma=False,
instrumentation_avg_n=64,
live_fifo_sizing=False,
partition_model_dir=None)Initialize ZynqBuild with platform and build settings.
def apply(model)Apply the ZynqBuild transformation to create a complete Zynq accelerator.
class MinimizeAccumulatorWidth(Transformation)For relevant nodes, call the accumulator width minimization functions to save on resources. May alter tensor DataType for certain nodes if they produce an accumulator as result.
class MinimizeWeightBitWidth(Transformation)For relevant nodes, call the weight bit width minimization functions to save on resources. May alter tensor weightDataType if the node does not have runtime writeable weights.
class PrepareCppSim(Transformation)Call custom implementation to generate code for single custom node and create folder that contains all the generated files. All nodes in the graph must have the fpgadataflow backend attribute.
Outcome if successful: Node attribute "code_gen_dir_cppsim" contains path to folder that contains generated C++ code that can be used to simulate node using cppsim. The subsequent transformation is CompileCppSim.
class PrepareIP(Transformation)Call custom implementation to generate code for single custom node and create folder that contains all the generated files. All nodes in the graph must have the fpgadataflow backend attribute. The transformation takes additional arguments:
- fpgapart (string)
- clk in ns (int)
Any nodes that already have a code_gen_dir_ipgen attribute pointing to a valid path will be skipped.
Outcome if successful: Node attribute "code_gen_dir_ipgen" contains path to folder that contains:
- For HLS layers: generated C++ code that can be used to generate a Vivado IP block. The necessary subsequent transformation is HLSSynthIP.
- For RTL layers: filled template Verilog files that can be instantiated as modules during IP stitching.
class PrepareRTLSim(NodeLocalTransformation)For a graph with generated RTL sources (after HLSSynthIP), create an emulation library for each node to prepare for rtlsim execution and set the rtlsim_so property to the path to the generated emulation library.
To use these libraries, exec_mode must be set to "rtlsim" (using SetExecMode) and the model has to be executed using execute_onnx() from finn.core.onnx_exec
- num_workers (int or None) number of parallel workers, see documentation in NodeLocalTransformation for more details.
class RaiseScalarToRank1(Transformation)Lift all scalar tensors in the model to rank-1 tensors.
Scalars in ONNX are represented with an empty shape. Downstream FINN
transformations often expect tensors to have at least rank 1. This
transformation scans all tensors that have shape information attached and
ensures scalars are reshaped to have shape [1] while keeping any
initializer data consistent.
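The effect of raising scalars to rank 1 can be shown with plain Python containers. This is only an illustration of the shape rule described above: the real transformation works on a ModelWrapper, whereas this sketch uses hypothetical dictionaries that map tensor names to shapes and to initializer values.

```python
def raise_scalars_to_rank1(shapes, initializers):
    """Turn shape [] into [1] and wrap matching initializer values in a list."""
    new_shapes = {}
    new_inits = dict(initializers)
    for name, shape in shapes.items():
        if len(shape) == 0:
            # A scalar: give it a single-element rank-1 shape.
            new_shapes[name] = [1]
            if name in new_inits:
                # Keep the initializer consistent with the new shape.
                new_inits[name] = [new_inits[name]]
        else:
            new_shapes[name] = list(shape)
    return new_shapes, new_inits


shapes, inits = raise_scalars_to_rank1(
    {"scale": [], "x": [1, 64]},
    {"scale": 0.125},
)
```

Only the scalar "scale" is changed, to shape [1] with value [0.125]; the tensor "x" keeps its shape.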
class ReplaceVerilogRelPaths(Transformation)Convert ./ relative file paths to absolute ones for generated Verilog.
class SetExecMode(Transformation)Set attribute exec_mode in all fpgadataflow nodes to specify which kind of execution should be used ("cppsim" or "rtlsim"). Note that RTL components do not support cppsim. When cppsim is selected for an RTL component, the execution of its HW op parent is used by default.
Transformations for inserting and sizing FIFOs in FINN dataflow graphs.
def reset_implementation(node: "HWCustomOp") -> NoneReset IP generation attributes of a node to trigger re-synthesis.
def set_signal(sim: _SimProtocol, keyw: str, value: int) -> NoneSet the first simulation input signal whose name contains keyw to value.
def get_signal(sim: _SimProtocol, keyw: str) -> int | NoneReturn the value of the first simulation output signal whose name contains keyw.
def optimize_depth(depth: int) -> intRound depth to avoid resource-inefficient FIFO sizes.
class RemoveShallowFIFOs(Transformation)Remove zero-depth FIFOs. The threshold used to be 2 instead of 0, but with the increasing number of FINN RTL components, 2-depth FIFOs are still important for decoupling.
def __init__(shallow_threshold: int = 0) -> NoneInitialize RemoveShallowFIFOs with the given depth threshold.
def apply(model: ModelWrapper) -> tuple[ModelWrapper, Literal[False]]Remove FIFOs at or below the shallow threshold depth.
class CapConvolutionFIFODepths(Transformation)Make the size of FIFOs for convolution layers smaller where possible.
Will be automatically called from InsertAndSetFIFODepths if the appropriate constructor flag is set.
Constructor arguments:
- max_qsrl_depth: FIFOs deeper than this will use Vivado IP instead of Verilog FIFOs (Q_srl.v)
Assumed input graph properties:
- all nodes are fpgadataflow nodes
- FIFOs inserted with InsertAndSetFIFODepths
Output:
- graph with smaller-depth FIFOs for convolutions
Background: The simulation-based rtlsim_exec tends to overestimate the required depth of FIFOs between the ConvolutionInputGenerator (here called SWG) and the MatrixVectorActivation (here called MVAU). As the SWG has an internal buffer of 1 image row, we use this as a rule of thumb to set FIFO depth to be no larger than 1 row.
def __init__(max_qsrl_depth: int = 256) -> NoneInitialize CapConvolutionFIFODepths with the given maximum SRL FIFO depth.
def apply(model: ModelWrapper) -> tuple[ModelWrapper, Literal[False]]Cap FIFO depths between ConvolutionInputGenerator and MVAU nodes.
def xsi_fifosim(model: ModelWrapper,
n_inferences: int,
max_iters: float | None = None,
throttle_cycles: int = 0) -> dict[str, int]Create a XSI model of stitched IP and use a simple C++ driver to drive the input stream. Useful for FIFO sizing, latency and throughput measurement. If max_iters is None, use the default liveness threshold instead. throttle_cycles can be used for throttling the input stream every time a frame is finished.
class InsertAndSetFIFODepths(Transformation)Insert appropriate-depth StreamingFIFOs through RTLSim that preserve
throughput in the created accelerator.
Constructor arguments:
- clk_ns: clock period (used for IP preparation)
- max_qsrl_depth: FIFOs deeper than this will use Vivado IP instead of Verilog FIFOs (Q_srl.v)
- max_depth: how deep the "max"-sized FIFOs initially inserted will be. If set to None, use the tensor size as the depth
- swg_exception: call CapConvolutionFIFODepths to make convolution FIFOs smaller where appropriate
- vivado_ram_style: the StreamingFIFO.ram_style attribute to be used for large FIFOs implemented by Vivado afterwards
- fifosim_input_throttle: use input throttling based on dataflow analysis while doing simulation-based FIFO sizing
Assumed input graph properties:
- all nodes are fpgadataflow nodes
- no FIFOs inserted,
- (inFIFODepths/outFIFODepths attrs will be ignored)
Output:
- graph with appropriate-depth FIFOs inserted
Background: Even with all FINN HLS fpgadataflow layers appropriately parallelized, it is necessary to insert FIFOs between them to prevent stalls due to bursty behavior. The sizes of those FIFOs are hard to predict analytically, so we do the following:
- insert deep (=tensor size) FIFOs between all fpgadataflow nodes
- create stitched design
- run through rtlsim with stream of multiple random input images (to fill pipeline)
- keep track of observed maximum occupancy for each FIFO during rtlsim
- when sim finished, update each FIFO depth to maximum observed occupancy and set inFIFODepths/outFIFODepths attrs to that depth as well
def __init__(fpgapart: str,
clk_ns: float = 10.0,
max_qsrl_depth: int = 256,
max_depth: int | None = None,
swg_exception: bool = False,
vivado_ram_style: str = "auto",
fifosim_input_throttle: bool = True,
cfg_n_inferences: int = 2) -> NoneInitialize InsertAndSetFIFODepths with synthesis and simulation parameters.
def apply(model: ModelWrapper) -> tuple[ModelWrapper, Literal[False]]Insert and size StreamingFIFOs using RTL simulation.
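The sizing idea behind InsertAndSetFIFODepths (start with a deep FIFO, simulate, then shrink to the maximum observed occupancy) can be sketched with a toy cycle-level model. This is not the RTL simulation FINN runs; the function `max_fifo_occupancy` and the push/pop patterns are hypothetical and only illustrate why a bursty producer needs buffering even when average rates match.

```python
def max_fifo_occupancy(produce_pattern, consume_pattern):
    """Simulate one FIFO cycle by cycle and return its peak occupancy.

    produce_pattern[i] is the number of elements pushed in cycle i,
    consume_pattern[i] is the number of elements the consumer can pop.
    """
    occupancy = 0
    peak = 0
    for pushed, ready in zip(produce_pattern, consume_pattern):
        occupancy += pushed
        peak = max(peak, occupancy)
        # The consumer cannot pop more than what is currently stored.
        occupancy -= min(ready, occupancy)
    return peak


# Bursty producer: 4 elements in one cycle, then idle for three cycles.
producer = [4, 0, 0, 0] * 3
# Steady consumer: one element every cycle (same average rate).
consumer = [1] * len(producer)

depth = max_fifo_occupancy(producer, consumer)
print(depth)
```

In this toy setup the observed peak would become the new FIFO depth, mirroring the last step of the procedure above.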
def get_fifo_split_configs(
depth: int,
max_qsrl_depth: int = 256,
max_vivado_depth: int = 32768) -> list[tuple[int, str]]Break non-power-of-2 sized FIFO depths into several ones.
class SplitLargeFIFOs(Transformation)Split large FIFOs before implementation, for two reasons.
- impl_style="vivado" supports a max depth of 32k. Any larger FIFOs must be implemented as a sequence of smaller FIFOs.
- impl_style="vivado" requires power-of-two depths, which is normally handled by rounding up to the nearest power-of-two. So a FIFO of size 8196 normally gets rounded-up to a depth of 16384 and wastes a lot of resources. Here, instead, we split this up into two FIFOs of depth 8192 + 4.
def __init__(max_qsrl_depth: int = 256, max_vivado_depth: int = 32768) -> NoneInitialize SplitLargeFIFOs with maximum FIFO depth constraints.
def apply(model: ModelWrapper) -> tuple[ModelWrapper, Literal[False]]Split large FIFOs into chains of smaller power-of-two FIFOs.
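The 8196 = 8192 + 4 example from the SplitLargeFIFOs description follows from a greedy power-of-two decomposition. The sketch below shows only that arithmetic; the helper `split_into_pow2_depths` is hypothetical, assumes `max_depth` is itself a power of two, and ignores the impl_style selection and max_qsrl_depth handling that get_fifo_split_configs performs.

```python
def split_into_pow2_depths(depth, max_depth=32768):
    """Greedily break depth into power-of-two chunks no larger than max_depth."""
    chunks = []
    remaining = depth
    while remaining > 0:
        # Largest power of two that fits into the remainder, capped at max_depth.
        chunk = min(1 << (remaining.bit_length() - 1), max_depth)
        chunks.append(chunk)
        remaining -= chunk
    return chunks


print(split_into_pow2_depths(8196))   # the example from the text
print(split_into_pow2_depths(70000))  # exceeds the 32k limit
```

The chunk depths always sum to the requested depth, so no capacity is lost, and no depth is rounded up to the next power of two.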
Automatically sets folding, i.e., parallelism attributes for all FINN operators.
def divisors(num: int) -> Generator[int, Any, None]Yield divisors of num.
def common_divisors(numbers: list[int]) -> np.ndarrayReturn common divisors of the list of numbers.
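As an illustration of what these two helpers compute, here is a plain-Python sketch. It is an assumption about the behavior based on the one-line descriptions above, not the FINN source; in particular, this version returns a sorted list where the documented signature returns an np.ndarray.

```python
def divisors(num):
    """Yield every positive divisor of num in ascending order."""
    for candidate in range(1, num + 1):
        if num % candidate == 0:
            yield candidate


def common_divisors(numbers):
    """Return the sorted divisors shared by all entries of numbers."""
    shared = set(divisors(numbers[0]))
    for n in numbers[1:]:
        shared &= set(divisors(n))
    return sorted(shared)


print(list(divisors(12)))
print(common_divisors([12, 18]))
```

Divisors matter for folding because a parallelism attribute such as PE or SIMD typically has to divide the corresponding layer dimension evenly.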
class SetFolding(Transformation)Attempt to set parallelism attributes in all nodes to meet a specific target expressed as cycles per frame target_cycles_per_frame. For each HLSCustomOp node type, the attribute may vary but is typically one of {PE, SIMD}, and has a certain allowed-maximum value and divisibility constraints, which SetFolding will take into account. Note that the algorithm implemented by SetFolding is very simple and it is often possible to hand-tune the returned parallelism configuration for better results.
In the returned model, each node's cycles_estimate attribute will be set to its estimated number of cycles.
If two_pass_relaxation is enabled, SetFolding will internally run a second time if the target cycles from the first pass could not be achieved, instead using the achievable target (which may be constrained by a single node) to obtain a balanced pipeline.
Notable exceptions and special behavior:
When folding dense convolution/FC compute engines ("MVAU"/MatrixVectorActivation), which have two attributes (PE and SIMD):
- first increases SIMD while weight stream width per PE is <= mvau_wwidth_max (configurable in the SetFolding initializer, defaults to 36)
- then increases PE until the target is met or max PE reached
When folding depthwise convolutions ("VVAU"/VectorVectorActivation) or spatial reduction ops (Pool_Batch):
- the producer of the node is expected to be a ConvolutionInputGenerator with depthwise=1, whose SIMD value will be set equal to the PE value of its consumer node
- the VVAU also supports SIMD ("input window") parallelism next to PE ("channels"), but current ConvInpGen limitations require PE to be fully unfolded before SIMD is increased
def __init__(target_cycles_per_frame: int = 1000,
mvau_wwidth_max: int = 36,
two_pass_relaxation: bool = True) -> NoneInitialize the folding target and constraints.
def optimize_attribute_val(node_inst: HWCustomOp, max_val: int,
attr_name: str) -> NoneOptimize the folding attribute until the target cycles are met.
def apply(model: "ModelWrapper") -> tuple[ModelWrapper, Literal[False]]Apply SetFolding to all supported nodes in the model.
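The target-driven search and the two-pass relaxation described for SetFolding can be sketched on a toy cost model. Everything here is hypothetical: the layer table, the assumption that cycles equal work divided by parallelism, and the helper names. The real transformation queries each node's own cycle estimate and honors per-operator constraints such as mvau_wwidth_max.

```python
def pick_folding(work, max_val, target_cycles):
    """Return the smallest divisor of max_val that meets target_cycles, else max_val."""
    for val in range(1, max_val + 1):
        if max_val % val != 0:
            continue  # only divisors are legal parallelism values
        if work // val <= target_cycles:
            return val
    return max_val


# Hypothetical layers: name -> (cycles at parallelism 1, maximum parallelism)
layers = {"fc0": (4096, 64), "fc1": (1024, 16), "fc2": (64, 8)}


def fold_all(target):
    cfg = {name: pick_folding(w, m, target) for name, (w, m) in layers.items()}
    cycles = {name: layers[name][0] // cfg[name] for name in layers}
    return cfg, cycles


target = 32
cfg, cycles = fold_all(target)
achieved = max(cycles.values())
if achieved > target:
    # Second pass: relax to the achievable target so no layer is over-provisioned.
    cfg, cycles = fold_all(achieved)

print(cfg, cycles)
```

In this example fc0 and fc1 cannot go below 64 cycles, so the second pass lowers the parallelism of fc2 to match them, which gives a balanced pipeline.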
class SetLoopBoundary(Transformation)Sets metadata attributes to nodes between defined node or tensor ranges in an ONNX model.
Arguments:
- node_metadata: Dictionary containing metadata attributes to set on the nodes.
- node_range: Tuple containing start and end node names (start_node, end_node).
- tensor_range: Tuple containing start and end tensor names (start_tensor, end_tensor).
Transformations for specializing FINN layers to HLS or RTL implementations.
This module provides functionality to automatically select and specialize FINN dataflow layers to their optimal hardware implementation variants (HLS or RTL) based on FPGA target, layer constraints, and user preferences.
class SpecializeLayers(Transformation)Specialize all layers to either HLS or RTL variants
def __init__(fpgapart)Initialize the SpecializeLayers transformation.
Arguments:
- fpgapart: Target FPGA part string for implementation selection
def apply(model)Apply layer specialization transformation to model.
Converts all dataflow layers to their optimal HLS or RTL implementation variants based on target FPGA and layer constraints.
Transformation for out-of-context Vivado synthesis on stitched IP designs.
def is_hls_float_op(node: NodeProto, model: ModelWrapper) -> boolCheck if a node is an HLS operator with floating-point inputs.
class SynthOutOfContext(Transformation)Run out-of-context Vivado synthesis on a stitched IP design.
def __init__(part: str,
clk_period_ns: float,
clk_name: str = "ap_clk") -> NoneInitialize the SynthOutOfContext transformation.
Arguments:
- part: Target FPGA part for synthesis
- clk_period_ns: Clock period in nanoseconds
- clk_name: Clock signal name (default: "ap_clk")
def apply(model: ModelWrapper) -> tuple[ModelWrapper, bool]Apply out-of-context synthesis transformation to the model.
Template strings for FPGA dataflow build scripts.
def shuffle_perfect_loopnest_coeffs(shape: tuple[int],
perm: tuple[int]) -> tuple[int]Given an input shape and permutation matrix calculate the coefficients for the perfect loop nest for HLS generation.
def apply_inner_shuffle_operation(perm: List[int],
shape: List[int] = None,
simd: int = 1) -> List[int]Apply inner_shuffle operation: swap the last two positions (..., a, b) -> (..., b, a)
def apply_outer_shuffle_operation(perm: List[int],
i: int,
j: int,
shape: List[int] = None,
simd: int = 1) -> Optional[List[int]]Apply outer_shuffle operation: swap positions i and j. Constraint: cannot move the very last dimension.
def get_all_possible_moves(
perm: List[int],
shape: List[int] = None,
simd: int = 1
) -> List[Tuple[List[int], str, Optional[Tuple[int, int]]]]Get all possible moves from current permutation. Returns list of (new_permutation, operation_type, operation_params) tuples.
Each outer_shuffle move represents a single pairwise swap that doesn't involve the last dimension. Complex permutations are built by chaining multiple such operations.
operation_type is either 'inner_shuffle' or 'outer_shuffle'. operation_params is None for inner_shuffle and (i, j) for outer_shuffle.
def is_valid_hardware_permutation(perm_array: List[int]) -> boolCheck if a permutation array represents a valid hardware operation. Valid operations are:
- inner_shuffle: swap last two elements
- outer_shuffle: any permutation that doesn't move the last element
def find_minimal_operation_sequence(
start_perm: List[int],
target_perm: List[int],
shape: List[int] = None,
simd: int = 1
) -> Optional[List[Tuple[str, Optional[Tuple[int, int]]]]]Find minimal sequence of operations to transform start_perm into target_perm. Uses BFS to find shortest path, ensuring all intermediate permutations are hardware-valid. Returns list of (operation_type, operation_params) tuples.
TODO: We want this to be cost based and include a buffer size cost model.
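The breadth-first search mentioned above can be sketched as follows. This is a simplified illustration under stated assumptions: moves are applied directly to the permutation tuple, the `shape` and `simd` arguments are ignored, and the names `moves` and `shortest_sequence` are hypothetical rather than the FINN functions.

```python
from collections import deque
from itertools import combinations


def moves(perm):
    """Yield (next_perm, op_name, params) for every single legal move."""
    n = len(perm)
    if n >= 2:
        # inner_shuffle: swap the last two positions.
        yield perm[:-2] + (perm[-1], perm[-2]), "inner_shuffle", None
    # outer_shuffle: pairwise swap that never touches the last position.
    for i, j in combinations(range(n - 1), 2):
        nxt = list(perm)
        nxt[i], nxt[j] = nxt[j], nxt[i]
        yield tuple(nxt), "outer_shuffle", (i, j)


def shortest_sequence(start, target):
    """Breadth-first search for the fewest moves from start to target."""
    start, target = tuple(start), tuple(target)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        perm, ops = queue.popleft()
        if perm == target:
            return ops
        for nxt, name, params in moves(perm):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, ops + [(name, params)]))
    return None


print(shortest_sequence([0, 1, 2], [1, 0, 2]))
print(shortest_sequence([0, 1, 2], [2, 1, 0]))
```

Because every move has unit cost here, BFS returns a shortest sequence; a cost-based variant as suggested by the TODO would need a weighted search instead.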
def convert_operations_to_permutations(start_perm: List[int],
operations: List[Tuple[
str, Optional[Tuple[int, int]]]],
shape: List[int] = None,
simd: int = 1) -> List[List[int]]Convert a sequence of operations to a list of permutation arrays. Each permutation represents the transformation for that step.
def can_be_single_operation(
target_perm: List[int],
shape: List[int] = None,
simd: int = 1) -> Optional[Tuple[str, Optional[Tuple[int, int]]]]Check if the target permutation can be achieved with a single operation. i.e. no decomposition is required. Returns (operation_type, operation_params) or None if not possible.
def decompose_transpose_with_constraints(
target_perm: List[int],
shape: List[int] = None,
simd: int = 1) -> Tuple[List[List[int]], List[str]]Decompose a target permutation into a sequence of hardware-constrained operations.
- inner_shuffle: swaps the last two dimensions
- outer_shuffle: can implement any permutation that doesn't move the last dimension (may require multiple steps)
Returns (permutations, operation_types).
- permutations: list of permutation arrays for each step
- operation_types: list of operation types ('inner_shuffle' or 'outer_shuffle') for each step
class ShuffleDecomposition(Transformation)Transformation that decomposes Shuffle nodes into a chain of Shuffle ops that can map to InnerShuffle and OuterShuffle nodes.
class InferInnerOuterShuffles(Transformation)Infers Inner and Outer Shuffles from Shuffle operators. This should run after the ShuffleDecomposition transformation.
Transformation to build FINN dataflow designs for Alveo using Vitis.
class CreateVitisXO(Transformation)Create a Vitis object file from a stitched FINN ip.
Outcome if successful: sets the vitis_xo attribute in the ONNX ModelProto's metadata_props field with the name of the object file as value. The object file can be found under the ip subdirectory.
def __init__(ip_name="finn_design")Initialize CreateVitisXO transformation.
def apply(model)Apply CreateVitisXO transformation to create Vitis object file.
class VitisLink(Transformation)Create an XCLBIN with Vitis.
Outcome if successful: sets the bitfile attribute in the ONNX ModelProto's metadata_props field with the XCLBIN full path as value.
def __init__(platform,
f_mhz=200,
strategy=VitisOptStrategy.PERFORMANCE,
enable_debug=False,
fpga_memory_type="default")Initialize VitisLink transformation with platform and build settings.
def apply(model)Apply VitisLink transformation to create XCLBIN.
class VitisBuild(Transformation)Best-effort attempt at building the accelerator with Vitis.
It assumes the model has only fpgadataflow nodes.
Arguments:
- fpga_part: string identifying the target FPGA
- period_ns: target clock period
- platform: target Alveo platform, one of ["U50", "U200", "U250", "U280"]
- strategy: Vitis optimization strategy
- enable_debug: add Chipscope to all AXI interfaces
- floorplan_file: path to a JSON containing a dictionary with SLR assignments for each node in the ONNX graph. Must be parse-able by the ApplyConfig transform.
- enable_link: enable linking kernels (.xo files), otherwise just synthesize them independently.
- fpga_memory_type: Specify whether Host or FPGA memory such as DDR/HBM should be used
def __init__(fpga_part,
period_ns,
platform,
strategy=VitisOptStrategy.PERFORMANCE,
enable_debug=False,
floorplan_file=None,
enable_link=True,
partition_model_dir=None,
fpga_memory_type=FpgaMemoryType.DEFAULT)Initialize VitisBuild transformation with FPGA and build settings.
def apply(model)Apply VitisBuild transformation to create complete Vitis accelerator.
class VivadoPowerEstimation(Transformation)Run Vivado power estimation on the stitched IP after OOC synthesis.
Arguments:
- simulate_switching_activity: if False, use a fixed set of toggle rates and static probabilities. If True, additionally simulate the switching activity of the design for power estimation.
Generally applicable transformations.
class ApplyConfig(Transformation)Applies node properties (attributes) from either a config dict or its JSON representation given as a filename. The JSON file can specify default values for particular op_types, as well as values for nodes with particular names. Example dict:
{
# set kernel_size = 3 for all nodes with op_type=Im2Col
"Defaults" : {"kernel_size" : [3, ["Im2Col"]]},
# set kernel_size = 7 for the particular node with name Im2Col_0
"Im2Col_0" : {"kernel_size" : 7}
}
def __init__(
config: Path | str | dict,
node_filter: Callable[[NodeProto], bool] = lambda _: True) -> NoneApply a JSON config file to the model.
def configure_network(target: GraphProto | ModelWrapper, model_config: dict,
subgraph_hier: str | None) -> NoneConfigure network - target can be a GraphProto or ModelWrapper. If it's a ModelWrapper, get the graph.
def apply(model: ModelWrapper) -> tuple[ModelWrapper, bool]Apply the config to the model.
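To make the "Defaults" convention from the ApplyConfig example concrete, the sketch below resolves the attributes for a single node from such a dict. It is an illustration of the documented layout only; the helper `resolve_node_attrs` is hypothetical, and it assumes that per-node entries take precedence over op_type defaults, as the comments in the example dict suggest.

```python
def resolve_node_attrs(config, node_name, op_type):
    """Merge op_type defaults with per-node overrides for one node."""
    attrs = {}
    # Each default is stored as [value, [list of op_types it applies to]].
    for attr, (value, op_types) in config.get("Defaults", {}).items():
        if op_type in op_types:
            attrs[attr] = value
    # Entries keyed by node name override the defaults.
    attrs.update(config.get(node_name, {}))
    return attrs


config = {
    "Defaults": {"kernel_size": [3, ["Im2Col"]]},
    "Im2Col_0": {"kernel_size": 7},
}

print(resolve_node_attrs(config, "Im2Col_0", "Im2Col"))
print(resolve_node_attrs(config, "Im2Col_1", "Im2Col"))
```

Here Im2Col_0 receives its explicit value, while the hypothetical node Im2Col_1 falls back to the Im2Col default.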
class RemoveCNVtoFCFlatten(Transformation)Removes a flatten node if it is between two fpgadataflow nodes. For an NHWC-Conv to FC transition, the preceding transpose is absorbed. The flatten operation can also be implemented by a reshape node.
class ConvertQONNXtoFINN(Transformation)Converts QONNX dialect to FINN ONNX dialect.
First the weights are converted using the FoldQuantWeights transformation, then the ConvertQuantActToMultiThreshold transformation is used to convert the activations. If incompatibilities are found a ValueError or RuntimeError is raised.
The optional keyword argument filter_function
presents a way to control which Quant and BipolarQuant nodes in the activation path
are converted to MultiThreshold nodes. A warning will be emitted when a Quant node
is not converted to a MultiThreshold node.
Arguments:
- filter_function: Each candidate Quant and BipolarQuant node is first evaluated by this function. If the function returns False, the node is not converted to a MultiThreshold node. The function is given the model and the candidate node as parameters. By default, a filter function is used that disables the conversion of Quant nodes with a bit width larger than 8. Defaults to: default_filter_function_generator(max_multithreshold_bit_width=8)
class FoldQuantWeights(Transformation)Merges Quant nodes, which are used as weights into the initializer of the weight tensor.
class AvgPoolAndTruncToQuantAvgPool(Transformation)Convert a section of nodes matching the pattern AveragePool -> Mul (scalar) -> Trunc to the FINN op QuantAvgPool2d.
class AvgPoolAndTruncv1ToQuantAvgPool(Transformation)Convert a section of nodes matching the pattern AveragePool -> Mul (scalar) -> Trunc (v1) to the FINN ops Div -> QuantAvgPool2d -> Mul.
class AvgPoolAndTruncv2ToQuantAvgPool(Transformation)Convert a section of nodes matching the pattern AveragePool -> Trunc (v2) to the FINN ops Div -> QuantAvgPool2d -> Mul.
class QuantActBaseHandler(ABC)Base class for converting quantized activation expressed in the QONNX dialect
to the FINN ONNX dialect.
Arguments:
- model (qonnx.core.modelwrapper.ModelWrapper): The model on which this handler should operate.
- quant_node: The Quant node which a given handler should replace.
- quant_node_index (int): The index of the Quant node in the given model.
def __init__(model: ModelWrapper, quant_node, quant_node_index: int)Base class constructor
@classmethod
def valid_predecessor_op_types()Defines which op types the preceding node is allowed to have for this type of activation.
def calculate_node_parameters()Calculate all parameters required for replacing the QONNX style activation with a FINN style one.
def replace_quant_node()Replace the given QONNX style activation with a FINN style one.
class QuantReluHandler(QuantActBaseHandler)Class for converting a quantized relu operation expressed in the QONNX dialect to the FINN ONNX dialect.
class QuantIdentityHandler(QuantActBaseHandler)Class for converting a quantized identity operation expressed in the QONNX dialect to the FINN ONNX dialect. This handler also takes care of quantized HardTanh activations, because these are equivalent to quantized identity activations.
def default_filter_function_generator(max_multithreshold_bit_width=8)This function generates the default filter function for the ConvertQuantActToMultiThreshold transformation. Per default the returned function disables the conversion of Quant nodes which have a bit width above 8 bit.
This function generator can be used as a template to write custom filter functions.
class ConvertQuantActToMultiThreshold(Transformation)Converts Quant nodes in the activation path to MultiThreshold nodes.
The optional keyword argument filter_function
presents a way to control which Quant and BipolarQuant nodes in the activation path
are converted to MultiThreshold nodes. A warning will be emitted when a Quant node
is not converted to a MultiThreshold node.
Arguments:
- filter_function: Each candidate Quant and BipolarQuant node is first evaluated by this function. If the function returns False, the node is not converted to a MultiThreshold node. The function is given the model and the candidate node as parameters. By default, a filter function is used that disables the conversion of Quant nodes with a bit width larger than 8. Defaults to: default_filter_function_generator(max_multithreshold_bit_width=8)
Transformation to squeeze tensors by removing dimensions of size 1.
class Squeeze(Transformation)Squeezes, i.e., removes, dimensions of size 1. Note: Use this transformation with great care. It currently serves only the purpose of turning the not well-supported 3d data layouts encountered in transformer models with a batch dimension of size 1 into 2d data layouts where the sequence dimension is treated as a batch dimension. Everything else is not tested; it might break the model or simply lack support for certain node op-types.
def apply(model: ModelWrapper)Apply squeeze transformation to remove size-1 dimensions.
Collection of default streamlining transformations.
class Streamline(Transformation)Apply the streamlining transform, see arXiv:1709.04060.
def apply(model)Collects and applies the default list of streamlining transformations.
class AbsorbSignBiasIntoMultiThreshold(Transformation)Absorb scalar bias originating from signed int export back into MultiThreshold and re-evaluate the output datatype.
class AbsorbAddIntoMultiThreshold(Transformation)Absorb preceding Add ops into MultiThreshold by updating the threshold values. Only scalar/1D add vectors can be absorbed.
class AbsorbMulIntoMultiThreshold(Transformation)Absorb preceding Mul ops into MultiThreshold by updating the threshold values. Only positive scalar/1D mul vectors can be absorbed.
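Why a preceding Add or positive Mul can be folded into the thresholds is easiest to see on a scalar example. The sketch below assumes a multi-threshold activation that counts how many thresholds the input meets or exceeds; the function `multithreshold` here is a simplified stand-in, not the QONNX operator, and the values are chosen to be exactly representable in binary floating point.

```python
def multithreshold(x, thresholds):
    """Count how many thresholds x meets or exceeds."""
    return sum(1 for t in thresholds if x >= t)


thresholds = [-1.5, 0.5, 2.5]
bias, scale = 0.75, 2.0  # scale must be positive for the Mul case
inputs = [-3.0, -1.0, 0.0, 0.25, 1.0, 4.0]

# Add: (x + bias) >= t  is the same as  x >= (t - bias)
add_absorbed = [t - bias for t in thresholds]
add_ok = all(
    multithreshold(x + bias, thresholds) == multithreshold(x, add_absorbed)
    for x in inputs
)

# Mul: (x * scale) >= t  is the same as  x >= (t / scale)  when scale > 0
mul_absorbed = [t / scale for t in thresholds]
mul_ok = all(
    multithreshold(x * scale, thresholds) == multithreshold(x, mul_absorbed)
    for x in inputs
)

print(add_ok, mul_ok)
```

The positivity requirement for Mul comes from the inequality: dividing by a negative scale would flip the comparison.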
class FactorOutMulSignMagnitude(Transformation)Split multiply-by-constant nodes into two multiply-by-constant nodes, where the first node is a bipolar vector (of signs) and the second is a vector of magnitudes.
class Absorb1BitMulIntoMatMul(Transformation)Absorb bipolar or binary multiplications into the preceding matrix multiply.
class Absorb1BitMulIntoConv(Transformation)Absorb bipolar or binary multiplications into the preceding convolution.
class AbsorbTransposeIntoMultiThreshold(Transformation)For (NCHWTranspose -> MultiThreshold) move Transpose past MultiThreshold and set its data_layout mode to NHWC.
class AbsorbTransposeIntoFlatten(Transformation)Absorb transpose node into succeeding flatten node, if H=W=1 and the first dimension stays the same. Can also be applied if flatten is implemented implicitly by a reshape node with shape [1, -1] and the first input dimension is 1
class AbsorbScalarMulAddIntoTopK(Transformation)Remove mul/add node prior to topk node if the op is scalar. Note that the TopK output probabilities will change, but the indices won't.
class AbsorbConsecutiveTransposes(Transformation)Remove (Transpose -> Transpose) patterns when the input and output of the pattern have the same layout.
class AbsorbTransposeIntoResize(Transformation)For (NCHWTranspose -> Resize) move Transpose past Resize and change the Resize node's attributes accordingly.
class CollapseRepeatedOp(Transformation)Collapse repeated consecutive operations with constant parameters into a single operation. make_collapsed_param_fxn must take two tensors and return a tensor which gives the equivalent result using a single op.
class CollapseRepeatedAdd(CollapseRepeatedOp)Collapse repeated adder node into a single operation.
class CollapseRepeatedMul(CollapseRepeatedOp)Collapse repeated multiplier node into a single operation.
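The role of make_collapsed_param_fxn can be shown with scalars: it combines the constants of two consecutive ops into one constant that gives the same result. The wrapper `collapse_params` below is a hypothetical stand-in used only for this illustration.

```python
def collapse_params(make_collapsed_param_fxn, first, second):
    """Combine the constant parameters of two consecutive ops into one."""
    return make_collapsed_param_fxn(first, second)


x = 5
a, b = 3, 4

# Two consecutive Adds collapse by summing their constants.
add_param = collapse_params(lambda p, q: p + q, a, b)
# Two consecutive Muls collapse by multiplying their constants.
mul_param = collapse_params(lambda p, q: p * q, a, b)

print((x + a) + b == x + add_param)
print((x * a) * b == x * mul_param)
```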
class ExtractNormScaleBias(Transformation)Extract LayerNormalization scale and bias into separate nodes and set initializers to 1 or 0 respectively.
class MoveAddPastMul(Transformation)Move add operations past multiply operations on linear segments of the graph. The aim is to have them next to each other such that they can be collapsed into a single add.
class MoveScalarMulPastMatMul(Transformation)Move scalar mul operations past matmul operations. We want to have muls next to each other such that they can be collapsed into a single mul.
class MoveScalarAddPastMatMul(Transformation)Move scalar add operations past matmul operations. We want to have adds next to each other such that they can be collapsed into a single add.
class MoveAddPastConv(Transformation)Move scalar and channelwise add operations past conv operations. We want to have adds next to each other such that they can be collapsed into a single add.
class MoveScalarMulPastConv(Transformation)Move scalar mul operations past conv operations. We want to have muls next to each other such that they can be collapsed into a single mul.
class MoveScalarMulPastConvTranspose(Transformation)Move scalar mul operations past ConvTranspose operations. We want to have muls next to each other such that they can be collapsed into a single mul.
class MoveMulPastDWConv(Transformation)Move channelwise mul operations past depthwise conv operations. We want to have muls next to each other such that they can be collapsed into a single mul.
class MoveMulPastMaxPool(Transformation)Move non-negative scalar or channelwise mul operations past max pool operations. We want to have muls next to each other such that they can be collapsed into a single mul.
class MoveLinearPastEltwiseAdd(Transformation)Move linear operations (mul, add) past elementwise add operations where possible. Specifically, matches and transforms the following patterns: (x*C) + (y*C) -> (x + y) * C and (x+A) + (y+B) -> (x + y) + (A + B), where x and y are dynamic inputs and A, B, C are constant tensors (in general).
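The two rewrite patterns of MoveLinearPastEltwiseAdd, (x*C) + (y*C) -> (x + y) * C and (x+A) + (y+B) -> (x + y) + (A + B), are plain algebraic identities. A quick elementwise check on integer vectors, with arbitrarily chosen example values and hypothetical helper names, looks like this:

```python
def add(p, q):
    """Elementwise addition of two equally long lists."""
    return [a + b for a, b in zip(p, q)]


def mul(p, q):
    """Elementwise multiplication of two equally long lists."""
    return [a * b for a, b in zip(p, q)]


x = [1, -2, 3]      # dynamic input
y = [4, 5, -6]      # dynamic input
A = [10, 20, 30]    # constant
B = [7, 8, 9]       # constant
C = [2, 3, 4]       # constant shared by both branches

mul_before = add(mul(x, C), mul(y, C))
mul_after = mul(add(x, y), C)

add_before = add(add(x, A), add(y, B))
add_after = add(add(x, y), add(A, B))

print(mul_before == mul_after, add_before == add_after)
```

Note that the Mul pattern only applies when both branches use the same constant C.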
class MoveScalarLinearPastInvariants(Transformation)Move scalar linear operations (mul, add) past functions which are invariant to them. Specifically, matches and transforms the following patterns: f(x*C) -> f(x) * C f(x+C) -> f(x) + C where x is a dynamic input, C is a constant tensor. Known f which obey this property are: Reshape, Flatten, Transpose, GlobalAveragePool
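The invariance property f(x*C) -> f(x) * C and f(x+C) -> f(x) + C can be checked on nested lists for two of the listed functions. The helpers below are simplified stand-ins for Transpose and Flatten on a 2D tensor, written only for this illustration.

```python
def transpose(matrix):
    """Swap rows and columns of a 2D nested list."""
    return [list(row) for row in zip(*matrix)]


def flatten(matrix):
    """Concatenate the rows of a 2D nested list."""
    return [v for row in matrix for v in row]


def map_nested(fn, tensor):
    """Apply fn to every scalar of an arbitrarily nested list."""
    if isinstance(tensor, list):
        return [map_nested(fn, item) for item in tensor]
    return fn(tensor)


x = [[1, 2, 3], [4, 5, 6]]
C = 3

mul_commutes = all(
    f(map_nested(lambda v: v * C, x)) == map_nested(lambda v: v * C, f(x))
    for f in (transpose, flatten)
)
add_commutes = all(
    f(map_nested(lambda v: v + C, x)) == map_nested(lambda v: v + C, f(x))
    for f in (transpose, flatten)
)

print(mul_commutes, add_commutes)
```

Both functions only rearrange elements, so a scalar operation applied before or after them yields the same values.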
class MakeMaxPoolNHWC(Transformation)Convert (MaxPool, NHWCTranspose) into (NHWCTranspose, MaxPoolNHWC) and (NCHWTranspose, MaxPool) into (MaxPoolNHWC, NCHWTranspose).
class MakeScaleResizeNHWC(Transformation)Converts the inputs and outputs for all scales Resize and Upsample nodes from NCHW to NHWC.
class MoveOpPastFork(Transformation)Move node operations past graph forks. Used when a node before a fork can be merged with nodes in the branches
class MoveScalarLinearPastSplit(Transformation)Move scalar Mul and Add nodes past channel split operation.
class MoveMaxPoolPastMultiThreshold(Transformation)Move MaxPool nodes past MultiThreshold nodes on linear segments of the graph.
class MoveFlattenPastTopK(Transformation)Move flatten node past a succeeding topk node, if the "axis" attribute in topk is set to -1 and the data layout before the flatten is NHWC with H=W=1
class MoveFlattenPastAffine(Transformation)Moves a node that implements a (1, -1) reshape past a MatMul, Mul or Add node.
class MoveTransposePastScalarMul(Transformation)Moves a Transpose node past a scalar Mul node
class MoveIdenticalOpPastJoinOp(Transformation)Move multiple identical operations on different branches past the common join node. It assumes the shape to be preserved by the join op in the default move_node() method
def move_node(model, n, producers)Should be overwritten for some operations
Returns:
- bool: whether moving the node was successful
def are_producers_identical(model, producers)Checks only op_types. Should be overwritten for additional checks.
class MoveAddPastJoinAdd(MoveIdenticalOpPastJoinOp)
def move_node(model, n, producers)We use the base move_node method to move the first producer past the join node (and delete the rest).
class MoveAffinePastJoinConcat(MoveIdenticalOpPastJoinOp)Applies to scalar linear or channelwise affine ops with the same parameter value
Rounding and clipping of thresholds to integer representations.
class RoundAndClipThresholds(Transformation)For MultiThreshold, Thresholding, MVAU, and VVAU nodes operating on integer inp/accumulators, round up (ceil) threshold values to the nearest integer and clip to valid range. Type-casts thresholds (back) to the float32 container type (this is separate from the quantization annotation). Runs InferDataTypes() afterward to propagate any changes to the quantization data types.
def apply(model: ModelWrapper)Apply the rounding and clipping to all thresholds in the model.
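Rounding thresholds up is safe for integer inputs because, for an integer x, x >= t holds exactly when x >= ceil(t). The sketch below checks this on a signed 4-bit input range. It is an illustration only: the helper names are hypothetical, the activation is modeled as a count of thresholds met, and clipping to [lo, hi + 1] is an assumption of this sketch chosen so that out-of-range thresholds keep their always-on or never-on behavior.

```python
import math


def round_and_clip(thresholds, lo, hi):
    """Ceil each threshold, clip to [lo, hi + 1], and keep a float container."""
    return [float(min(max(math.ceil(t), lo), hi + 1)) for t in thresholds]


def multithreshold(x, thresholds):
    """Count how many thresholds x meets or exceeds."""
    return sum(1 for t in thresholds if x >= t)


lo, hi = -8, 7  # value range of a signed 4-bit input
original = [-20.3, -2.5, 0.0, 3.2, 11.9]
rounded = round_and_clip(original, lo, hi)

unchanged = all(
    multithreshold(x, original) == multithreshold(x, rounded)
    for x in range(lo, hi + 1)
)

print(rounded, unchanged)
```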
class ConvertSignToThres(Transformation)Convert Sign node instances to MultiThreshold with threshold at 0.
Utility functions for graph transformations and node type checking.
def is_threshold(node: NodeProto)Check if node is a MultiThreshold operator.
def is_attention(node: NodeProto)Check if node is an Attention operator.
def is_join_matmul(node: NodeProto, model: ModelWrapper)Check if node is a join (two input) matrix multiplication.
def is_matmul(node: NodeProto)Check if node is a MatMul operator.
def is_softmax(node: NodeProto)Check if node is a Softmax operator.
def is_mul(node: NodeProto)Check if node is an element-wise Mul.
def is_add(node: NodeProto)Check if node is an element-wise Add.
def is_end(node: NodeProto, model: ModelWrapper)Check if node is an end node.
def is_scalar(tensor)Check whether tensor is a scalar, i.e., whether all dimensions are 1.
def all_upstream_to_matmul(node: NodeProto, model: ModelWrapper)Get all upstream nodes to matrix multiplication.
def op_types(nodes: list[NodeProto]) -> list[str]Projects a list of ONNX graph nodes to the string representation of the operator types
def is_reshape(node: NodeProto)Check if node is a reshape operator.
def is_transpose(node: NodeProto)Check if node is a transpose operator.
def is_reshape_transpose(node: NodeProto, model: ModelWrapper)Check if node is reshape followed by transpose.
def is_transpose_reshape(node: NodeProto, model: ModelWrapper)Check if node is transpose followed by reshape.
def group_inputs_by_category(node: NodeProto, model: ModelWrapper)Group inputs by categories, i.e., groups dynamic inputs first, followed by initializers. Keep order of inputs in each category.
This page was generated automatically from source code documentation.