finn.custom_op
This page contains the complete API reference for all modules in the finn.custom_op package.
- finn.custom_op.fpgadataflow.attention_heads
- finn.custom_op.fpgadataflow.concat
- finn.custom_op.fpgadataflow.convolutioninputgenerator
- finn.custom_op.fpgadataflow.crop
- finn.custom_op.fpgadataflow.duplicatestreams
- finn.custom_op.fpgadataflow.elementwise_binary
- finn.custom_op.fpgadataflow.fmpadding
- finn.custom_op.fpgadataflow.fmpadding_pixel
- finn.custom_op.fpgadataflow.globalaccpool
- finn.custom_op.fpgadataflow.hls.attention_hls
- finn.custom_op.fpgadataflow.hls.checksum_hls
- finn.custom_op.fpgadataflow.hls.concat_hls
- finn.custom_op.fpgadataflow.hls.duplicatestreams_hls
- finn.custom_op.fpgadataflow.hls.elementwise_binary_hls
- finn.custom_op.fpgadataflow.hls.globalaccpool_hls
- finn.custom_op.fpgadataflow.hls.hwsoftmax_hls
- finn.custom_op.fpgadataflow.hls.iodma_hls
- finn.custom_op.fpgadataflow.hls.labelselect_hls
- finn.custom_op.fpgadataflow.hls.layernorm_hls
- finn.custom_op.fpgadataflow.hls.lookup_hls
- finn.custom_op.fpgadataflow.hls.matrixvectoractivation_hls
- finn.custom_op.fpgadataflow.hls.outer_shuffle_hls
- finn.custom_op.fpgadataflow.hls.pool_hls
- finn.custom_op.fpgadataflow.hls.requant_hls
- finn.custom_op.fpgadataflow.hls.split_hls
- finn.custom_op.fpgadataflow.hls.squeeze_hls
- finn.custom_op.fpgadataflow.hls.streamingdatawidthconverter_hls
- finn.custom_op.fpgadataflow.hls.streamingfifo_hls
- finn.custom_op.fpgadataflow.hls.thresholding_hls
- finn.custom_op.fpgadataflow.hls.tlastmarker_hls
- finn.custom_op.fpgadataflow.hls.unsqueeze_hls
- finn.custom_op.fpgadataflow.hls.upsampler_hls
- finn.custom_op.fpgadataflow.hls.vectorvectoractivation_hls
- finn.custom_op.fpgadataflow.hlsbackend
- finn.custom_op.fpgadataflow.hwcustomop
- finn.custom_op.fpgadataflow.hwsoftmax
- finn.custom_op.fpgadataflow.inner_shuffle
- finn.custom_op.fpgadataflow.labelselect
- finn.custom_op.fpgadataflow.layernorm
- finn.custom_op.fpgadataflow.lookup
- finn.custom_op.fpgadataflow.matrixvectoractivation
- finn.custom_op.fpgadataflow.outer_shuffle
- finn.custom_op.fpgadataflow.pool
- finn.custom_op.fpgadataflow.requant
- finn.custom_op.fpgadataflow.reshape
- finn.custom_op.fpgadataflow.rtl
- finn.custom_op.fpgadataflow.rtl.convolutioninputgenerator_rtl
- finn.custom_op.fpgadataflow.rtl.elementwise_binary_rtl
- finn.custom_op.fpgadataflow.rtl.finn_loop
- finn.custom_op.fpgadataflow.rtl.fmpadding_rtl
- finn.custom_op.fpgadataflow.rtl.inner_shuffle_rtl
- finn.custom_op.fpgadataflow.rtl.layernorm_rtl
- finn.custom_op.fpgadataflow.rtl.matrixvectoractivation_rtl
- finn.custom_op.fpgadataflow.rtl.requant_rtl
- finn.custom_op.fpgadataflow.rtl.reshape_rtl
- finn.custom_op.fpgadataflow.rtl.streamingdatawidthconverter_rtl
- finn.custom_op.fpgadataflow.rtl.streamingfifo_rtl
- finn.custom_op.fpgadataflow.rtl.thresholding_rtl
- finn.custom_op.fpgadataflow.rtl.vectorvectoractivation_rtl
- finn.custom_op.fpgadataflow.rtlbackend
- finn.custom_op.fpgadataflow.shuffle
- finn.custom_op.fpgadataflow.split
- finn.custom_op.fpgadataflow.squeeze
- finn.custom_op.fpgadataflow.streamingdataflowpartition
- finn.custom_op.fpgadataflow.streamingdatawidthconverter
- finn.custom_op.fpgadataflow.streamingfifo
- finn.custom_op.fpgadataflow.thresholding
- finn.custom_op.fpgadataflow.unsqueeze
- finn.custom_op.fpgadataflow.upsampler
- finn.custom_op.fpgadataflow.vectorvectoractivation
Multi-head attention split and merge operators.
class SplitMultiHeads(HWCustomOp)Split input tensor into multiple attention heads.
This operator splits the input tensor after input projections to create separate attention heads for multi-head attention mechanisms. The output can be either packed as a single tensor or split into multiple output tensors.
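As a rough illustration of the splitting described above, the following NumPy sketch divides the last (embedding) axis of a tensor into equal parts. The function name `split_heads_sketch`, the choice of the last axis, and stacking along a new leading axis in packed mode are assumptions made for illustration, not the operator's confirmed data layout.

```python
import numpy as np


def split_heads_sketch(x, heads, packed=False):
    """Split the last axis of x into `heads` equal parts (illustrative only)."""
    if x.shape[-1] % heads != 0:
        raise ValueError("last axis must be divisible by the number of heads")
    # One array per head, each holding a contiguous slice of the last axis
    parts = np.split(x, heads, axis=-1)
    # Assumption: "packed" means a single tensor with a new leading head axis
    return np.stack(parts) if packed else parts


x = np.arange(24).reshape(3, 8)
unpacked = split_heads_sketch(x, heads=2)
packed = split_heads_sketch(x, heads=2, packed=True)
print(len(unpacked), unpacked[0].shape, packed.shape)
```

Concatenating the unpacked parts along the last axis recovers the original tensor, which is the inverse step a merge operator would perform under the same layout assumption.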
def __init__(onnx_node, **kwargs)Initialize the SplitMultiHeads operator.
def get_nodeattr_types()Get node attribute types for the SplitMultiHeads operator.
Defines the attributes that must be present on this node, including the number of attention heads, packing mode, data type, and other configuration parameters inherited from the parent HWCustomOp class.
Returns:
- dict: Dictionary mapping attribute names to their type specifications
@property
def heads()Get number of attention heads.
@property
def packed()Get packed attribute.
@property
def dtype()Get data type attribute.
@property
def num_elems()Get number of elements attribute.
@property
def num_inputs()Get number of inputs attribute.
def make_shape_compatible_op(model: ModelWrapper)Make an operation compatible with the output shape for shape inference. Note: Propagates shape forward, i.e., never asks for the shape of the output, even if it seems easier.
def infer_node_datatype(model: ModelWrapper)Infer the datatype of the node output.
def execute_node(context, graph)Execute the node.
def verify_node()Verify node attribute/input/output correctness.
def get_input_datatype(ind=0)Get input data type.
def get_output_datatype(ind=0)Get output data type.
def get_normal_input_shape(ind=0)Get normal input shape.
def get_normal_output_shape(ind=0)Get normal output shape.
def get_folded_input_shape(ind=0)Get folded input shape.
def get_folded_output_shape(ind=0)Get folded output shape.
def get_instream_width(ind=0)Get input stream width.
def get_outstream_width(ind=0)Get output stream width.
def get_number_output_values()Get the number of expected output values, i.e. how many times read() could/should be called on any output stream of this operator.
def get_exp_cycles()Derive the expected cycles of the operator given the folding configuration.
class MergeMultiHeads(HWCustomOp)Merging of attention heads (before output projections) custom operator.
def __init__(onnx_node, **kwargs)Initialize the operator.
def get_nodeattr_types()Define attributes which must be present on this node
@property
def heads()Get number of attention heads.
@property
def packed()Get packed attribute.
@property
def dtype()Get data type.
@property
def num_elems()Get number of elements.
@property
def num_inputs()Get number of inputs.
@property
def squeezed()Get squeezed attribute.
def make_shape_compatible_op(model: ModelWrapper)Makes an operation compatible with the output shape for shape inference. Note: Propagates shape forward, i.e., never asks for the shape of the output, even if it seems easier.
def infer_node_datatype(model: ModelWrapper)Infer the datatype of the node output.
def execute_node(context, graph)Executes multi-head slicing in simulation (either Python, C++ or RTL simulation).
def verify_node()Verify node attribute/input/output correctness.
def get_input_datatype(ind=0)Get input data type.
def get_output_datatype(ind=0)Get output data type.
def get_normal_input_shape(ind=0)Get normal input shape.
def get_normal_output_shape(ind=0)Get normal output shape.
def get_folded_input_shape(ind=0)Get folded input shape.
def get_folded_output_shape(ind=0)Get folded output shape.
def get_instream_width(ind=0)Get input stream width.
def get_outstream_width(ind=0)Get output stream width.
def get_number_output_values()Gets the number of expected output values, i.e. how many times read() could/should be called on any output stream of this operator.
def get_exp_cycles()Derive the expected cycles of the operator given the folding configuration.
class StreamingConcat(HWCustomOp)Abstraction layer for HW implementation of Concat. Only supports concatenating along the last (channel) axis.
class ConvolutionInputGenerator(HWCustomOp)Abstraction layer for HW implementation of ConvolutionInputGenerator
def get_input_datatype(ind=0)Returns FINN DataType of input.
def get_output_datatype(ind=0)Returns FINN DataType of output.
def get_instream_width(ind=0)Returns stream width, input and output stream width are equal for the sliding window function
class Crop(HWCustomOp)Abstraction layer for Crop layers.
class DuplicateStreams(HWCustomOp)Abstraction layer for HW implementation of DuplicateStreams
def get_input_datatype(ind=0)Returns FINN DataType of input.
def get_output_datatype(ind=0)Returns FINN DataType of output.
def get_instream_width(ind=0)Returns input stream width.
def get_outstream_width(ind=0)Returns output stream width.
class ElementwiseBinaryOperation(HWCustomOp)
def calc_wmem()Calculates and returns WMEM.
class FMPadding(HWCustomOp)Abstraction layer for HW implementation of FMPadding. Pads input image by given amount.
def get_padded_odim()Return the padded spatial size of the output.
def get_input_datatype(ind=0)Returns FINN DataType of input.
def get_output_datatype(ind=0)Returns FINN DataType of output. (Same as input datatype)
class FMPadding_Pixel(HWCustomOp)
def get_padded_odim()Return the padded spatial size of the output.
def get_input_datatype(ind=0)Returns FINN DataType of input.
def get_output_datatype(ind=0)Returns FINN DataType of output. (Same as input datatype)
class GlobalAccPool(HWCustomOp)Abstraction layer for HW implementation of GlobalAccPool
def get_input_datatype(ind=0)Returns FINN DataType of input.
def get_output_datatype(ind=0)Returns FINN DataType of output.
def get_instream_width(ind=0)Returns input stream width.
def get_outstream_width(ind=0)Returns output stream width.
HLS backend implementation of scaled dot-product attention operator.
class ScaledDotProductAttention_hls(ScaledDotProductAttention, HLSBackend)HLS Backend specialization of the Scaled Dot-product Attention Operator.
def get_nodeattr_types()Return node attributes matching the HLS operator.
def get_ap_int_max_w()Return the maximum width of any ap_int used in this operator.
Calculates the maximum bit width required across all inputs, outputs, and optional mask elements for use in determining the widest ap_int type needed for HLS synthesis.
def global_includes()Generate list of C++ includes to be placed at the top of the generated code.
Adds necessary header files for the attention operator HLS implementation, including FINN HLSLIB activation functions and the attention-specific HLS implementation header.
def generate_params(model: ModelWrapper, path)Generate C++ parameters file including activation function thresholds.
Creates parameter files including activation function thresholds and other configuration parameters needed for HLS synthesis. The code generation directory is specified as an argument to work for both RTL and C++ simulation modes.
def defines(var)Generate C++ code of type alias, global constant and macro definitions.
def docompute()Generate C++ code for calling the computation part of the operator.
Creates the main HLS computation call with proper RAM style directives for thresholds and mask storage, along with necessary pragmas for threshold arrays and storage binding.
def blackboxfunction()Generate the head of the C++ function from which the IP block will be generated.
Creates the function signature describing the top level interface of the attention operator for HLS synthesis (ipgen).
def pragmas()Generate C++ pragmas to be inserted into the main function.
Creates HLS interface directives specifying how to create RTL ports for the top-level function arguments in both C++ simulation and ipgen-blackboxfunction.
def get_verilog_top_module_intf_names()Return the names of input and output interfaces grouped by protocol.
Collects interface names in a dictionary organized by protocol type (clock, reset, AXI stream, etc.) for Verilog module generation.
class CheckSum_hls(HWCustomOp, HLSBackend)Class that corresponds to custom_hls checksum function.
def get_input_datatype(ind=0)Returns FINN DataType of input.
def get_output_datatype(ind=0)Returns FINN DataType of output.
class StreamingConcat_hls(StreamingConcat, HLSBackend)Streaming concatenation node with dynamically generated HLS. Only supports concatenating along the last axis.
class DuplicateStreams_hls(DuplicateStreams, HLSBackend)Class that corresponds to finn-hlslib function of the same name.
HLS backend implementation for elementwise binary operations.
This module provides HLS (High-Level Synthesis) implementations of elementwise binary operations with support for various memory modes, broadcasting, and parallel execution.
class ElementwiseBinaryOperation_hls(ElementwiseBinaryOperation, HLSBackend)HLS backend implementation of elementwise binary operations.
Supports various binary operations (add, subtract, multiply, etc.) with configurable memory modes, broadcasting, and parallel execution units (PEs).
def get_nodeattr_types()Get node attribute types for this operator.
dict Dictionary of node attribute names and their types.
def get_ap_int_max_w()Get maximum ap_int width used in this operator.
int Maximum bit width of any ap_int used in the operator.
def adapt_for_loop_body(input_types)Adapt elementwise binary operator for loop body execution.
When an elementwise operator is placed inside a loop, parameters that are indexed per iteration (PARAMETER type) need to be received as streaming inputs rather than embedded constants. This method changes the lhs_style/rhs_style attributes from "const" to "input" as needed.
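The attribute change described above can be pictured with a small stand-alone sketch. Everything here is hypothetical: the helper name `adapt_styles_sketch`, the plain dict standing in for node attributes, and the representation of `input_types` as a list of strings (with `"INPUT"` as a placeholder label) are assumptions, since the reference does not specify them.

```python
def adapt_styles_sketch(attrs, input_types):
    """Return a copy of attrs with const styles switched to input where needed.

    attrs: dict with "lhs_style" and "rhs_style" entries (stand-in for node attributes).
    input_types: one label per operand; "PARAMETER" marks per-iteration parameters.
    """
    adapted = dict(attrs)
    for key, kind in zip(("lhs_style", "rhs_style"), input_types):
        # Only operands indexed per iteration must become streaming inputs
        if kind == "PARAMETER" and adapted.get(key) == "const":
            adapted[key] = "input"
    return adapted


attrs = {"lhs_style": "input", "rhs_style": "const"}
adapted = adapt_styles_sketch(attrs, ["INPUT", "PARAMETER"])
print(adapted)
```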
def code_generation_ipgen(model, fpgapart, clk) -> NoneGenerate C++ code and TCL script for IP generation.
def global_includes() -> NoneGenerate list of C++ includes for the top of generated code.
def generate_params(model: ModelWrapper, path: str) -> NoneGenerate C++ parameters file for constant initializer inputs.
model : ModelWrapper The ONNX model wrapper. path : str Path to the code generation directory.
def defines(var) -> NoneGenerate C++ type aliases, global constants and macro definitions.
var : str Variable name (currently unused).
def read_npy_data() -> NoneGenerate C++ code for reading data from .npy files for C++ simulation testing.
def strm_decl() -> NoneGenerate C++ code for declaring all streams involved in C++ simulation testing.
def docompute() -> NoneGenerate C++ code for the computation part of the operator.
def dataoutstrm() -> NoneGenerate C++ code for reading output stream and converting to numpy format.
def save_as_npy() -> NoneGenerate C++ code for saving simulation output to numpy format.
Notes:
This is currently empty in all HLSBackends. Functionality is now integrated into dataoutstrm().
def blackboxfunction() -> NoneGenerate C++ function head for the IP block (used during synthesis).
def pragmas() -> NoneGenerate C++ HLS pragmas for simulation and synthesis.
def get_verilog_top_module_intf_names() -> dict[str, list[str]]Get the names of input and output interfaces grouped by protocol.
dict Dictionary mapping protocol types to interface names.
def code_generation_ipi()Generate IPI (IP Integrator) code for Vivado block design integration.
list List of TCL commands for IP integration.
def execute_node(context, graph) -> NoneExecute this node in the given context.
context : dict Execution context mapping tensor names to numpy arrays. graph : onnx.GraphProto The ONNX graph containing this node.
@register_custom_op
class ElementwiseAdd_hls(ElementwiseBinaryOperation_hls, elementwise_binary.ElementwiseAdd)HLS implementation of elementwise addition operation.
@register_custom_op
class ElementwiseSub_hls(ElementwiseBinaryOperation_hls, elementwise_binary.ElementwiseSub)HLS implementation of elementwise subtraction operation.
@register_custom_op
class ElementwiseMul_hls(ElementwiseBinaryOperation_hls, elementwise_binary.ElementwiseMul)HLS implementation of elementwise multiplication operation.
@register_custom_op
class ElementwiseDiv_hls(ElementwiseBinaryOperation_hls, elementwise_binary.ElementwiseDiv)HLS implementation of elementwise division operation.
@register_custom_op
class ElementwiseAnd_hls(ElementwiseBinaryOperation_hls, elementwise_binary.ElementwiseAnd)HLS implementation of elementwise logical AND operation.
@register_custom_op
class ElementwiseOr_hls(ElementwiseBinaryOperation_hls, elementwise_binary.ElementwiseOr)HLS implementation of elementwise logical OR operation.
@register_custom_op
class ElementwiseXor_hls(ElementwiseBinaryOperation_hls, elementwise_binary.ElementwiseXor)HLS implementation of elementwise logical XOR operation.
@register_custom_op
class ElementwiseEqual_hls(ElementwiseBinaryOperation_hls, elementwise_binary.ElementwiseEqual)HLS implementation of elementwise equality comparison operation.
@register_custom_op
class ElementwiseLess_hls(ElementwiseBinaryOperation_hls, elementwise_binary.ElementwiseLess)HLS implementation of elementwise less-than comparison operation.
@register_custom_op
class ElementwiseLessOrEqual_hls(ElementwiseBinaryOperation_hls, elementwise_binary.ElementwiseLessOrEqual)HLS implementation of elementwise less-than-or-equal comparison operation.
@register_custom_op
class ElementwiseGreater_hls(ElementwiseBinaryOperation_hls, elementwise_binary.ElementwiseGreater)HLS implementation of elementwise greater-than comparison operation.
@register_custom_op
class ElementwiseGreaterOrEqual_hls(ElementwiseBinaryOperation_hls, elementwise_binary.ElementwiseGreaterOrEqual)HLS implementation of elementwise greater-than-or-equal comparison operation.
@register_custom_op
class ElementwiseBitwiseAnd_hls(ElementwiseBinaryOperation_hls, elementwise_binary.ElementwiseBitwiseAnd)HLS implementation of elementwise bitwise AND operation.
@register_custom_op
class ElementwiseBitwiseOr_hls(ElementwiseBinaryOperation_hls, elementwise_binary.ElementwiseBitwiseOr)HLS implementation of elementwise bitwise OR operation.
@register_custom_op
class ElementwiseBitwiseXor_hls(ElementwiseBinaryOperation_hls, elementwise_binary.ElementwiseBitwiseXor)HLS implementation of elementwise bitwise XOR operation.
@register_custom_op
class ElementwiseBitShift_hls(ElementwiseBinaryOperation_hls, elementwise_binary.ElementwiseBitShift)HLS implementation of elementwise bit shift operation.
Supports both left and right bit shift operations based on the 'direction' attribute.
def get_nodeattr_types() -> dictGet node attribute types for bit shift operation.
dict Dictionary of node attribute names and their types, including the direction attribute for selecting shift direction.
@register_custom_op
class ElementwiseMax_hls(ElementwiseBinaryOperation_hls, elementwise_binary.ElementwiseMax)HLS implementation of the elementwise max operation.
class GlobalAccPool_hls(GlobalAccPool, HLSBackend)Class that corresponds to finn-hlslib AccPool_Batch function.
class HWSoftmax_hls(HWSoftmax, HLSBackend)
def timeout_value()Set timeout value for HLS functions defined for one clock cycle.
class IODMA_hls(HWCustomOp, HLSBackend)Class that corresponds to finn-hlslib DMA function(s).
def get_input_datatype(ind=0)Returns FINN DataType of input.
def get_output_datatype(ind=0)Returns FINN DataType of output. (Same as input datatype)
def get_ap_int_max_w()Return the maximum width of any ap_int used in this module.
class LabelSelect_hls(LabelSelect, HLSBackend)Class that corresponds to finn-hlslib LabelSelect_Batch function.
class LayerNorm_hls(LayerNorm, HLSBackend)
def timeout_value()Set timeout value for HLS functions defined for one clock cycle.
class Lookup_hls(Lookup, HLSBackend)Streaming elementwise HLS lookup, mapping indices to values.
class MVAU_hls(MVAU, HLSBackend)Corresponds to finn-hlslib MatrixVectorActivation_Batch function.
def lut_estimation()Calculates resource estimations for LUTs based on:
- FINN-R: An End-to-End Deep-Learning Framework for Fast Exploration of Quantized Neural Networks
- M. Blott, T. B. Preusser, N. J. Fraser, G. Gambardella, K. O'Brien, Y. Umuroglu, M. Leeser and K. Vissers
- Sep 2018
def code_generation_ipgen(model, fpgapart, clk)Generates C++ code and TCL script for IP generation.
def get_template_param_values()Returns the template parameter values according to input, output and weight data types.
def minimize_weight_bit_width(model)Minimize weight and threshold datatypes, with HLS-specific adjustments.
The HLS implementation uses the threshold datatype for comparisons. When the threshold datatype is narrower than the accumulator datatype, accumulator values get truncated, which can cause incorrect results. To prevent this, ensure threshold datatype is at least as wide as accumulator datatype.
def auto_size_simd(I_dim: int, SIMD: int) -> Optional[int]Return the smallest divisor d of I_dim such that d > SIMD. If no such divisor exists, return None.
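The documented behaviour can be reproduced with a short linear search. This is a minimal sketch written from the description only; the name `auto_size_simd_sketch` is hypothetical and the library's own implementation may differ.

```python
from typing import Optional


def auto_size_simd_sketch(I_dim: int, SIMD: int) -> Optional[int]:
    """Smallest divisor d of I_dim with d > SIMD, or None if there is none."""
    # Candidates are tried in increasing order, so the first hit is the smallest
    for d in range(SIMD + 1, I_dim + 1):
        if I_dim % d == 0:
            return d
    return None


next_simd = auto_size_simd_sketch(12, 4)
no_simd = auto_size_simd_sketch(12, 12)
print(next_simd, no_simd)
```

For `I_dim = 12` and `SIMD = 4` the divisors of 12 greater than 4 are 6 and 12, so the sketch returns 6. Once `SIMD` already equals `I_dim`, no larger divisor exists and the result is `None`.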
class OuterShuffle_hls(OuterShuffle, HLSBackend)
def timeout_value()Set timeout value for HLS functions defined for one clock cycle.
HLSBackend specialization for generic pooling operators: MaxPool, AvgPool, AccPool and QuantAvgPool.
class Pool_hls(Pool, HLSBackend)Class that corresponds to finn-hlslib Pool_batch function. Requires ConvolutionInputGenerator(depthwise == 1) to format its input.
Input shape: (BatchSize,OutImgDim,OutImgDim,TotalKernelSize*Channels)
Output shape: (BatchSize,OutImgDim,OutImgDim,Channels)
Notes:
- The input shape was chosen to be compatible with im2col (only true when there is no folding).
- The actual data layout produced by the hlslib kernels is different for depthwise ops.
- depthwise SWG: (1, OFMDim, OFMDim, IFMChannels/PE, K, K, PE)
Channels can be folded using PE (SIMD from the input perspective).
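To make the shape relationship above concrete, the following NumPy sketch reduces an input of shape (BatchSize, OutImgDim, OutImgDim, TotalKernelSize*Channels) to (BatchSize, OutImgDim, OutImgDim, Channels) with a max reduction. The helper name `max_pool_reference` is hypothetical, and the kernel-major ordering of the last axis is an assumption for the unfolded, im2col-compatible case only; as noted above, the layout produced by the hlslib kernels differs for depthwise ops.

```python
import numpy as np


def max_pool_reference(x, channels):
    """Max-pool an im2col-style tensor over its kernel positions (illustrative)."""
    batch, odim_h, odim_w, last = x.shape
    if last % channels != 0:
        raise ValueError("last axis must be a multiple of the channel count")
    total_kernel_size = last // channels
    # Assumed ordering of the last axis: kernel position first, channel second
    windows = x.reshape(batch, odim_h, odim_w, total_kernel_size, channels)
    return windows.max(axis=3)


rng = np.random.default_rng(0)
x = rng.integers(0, 16, size=(1, 2, 2, 4 * 3))  # TotalKernelSize=4, Channels=3
y = max_pool_reference(x, channels=3)
print(x.shape, "->", y.shape)
```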
def get_nodeattr_types()Get dictionary of custom node attributes with their types and default values.
def global_includes()List include directives for generated HLS code.
def defines(var)Constant and type definitions for generated HLS code.
def docompute()Generates the computational part of the HLS C++ code.
def pragmas()Generate HLS pragmas to apply to the HLS C++ code.
def blackboxfunction()Blackbox function interface from which the IP will be generated.
def execute_node(context, graph)Execute the node in HLS C++ simulation.
class Requant_hls(Requant, HLSBackend)HLS backend for Requant operation.
Computes: clip(round(x * scale + bias), min, max)
Scale and bias are embedded as constants in the generated HLS code. Per-channel scale and bias are supported. When scale=1 and bias=0, the generated code skips the unnecessary multiply/add operations.
Note: This backend is primarily for FLOAT32 inputs. For integer inputs, prefer Requant_rtl which is more efficient.
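The formula above can be checked against a NumPy reference. This is a sketch under stated assumptions: the function name `requant_reference` is hypothetical, per-channel scale and bias are assumed to broadcast along the last axis, and `np.round` rounds halves to even, which may not match the rounding mode of the generated HLS code.

```python
import numpy as np


def requant_reference(x, scale, bias, out_min, out_max):
    """Reference for clip(round(x * scale + bias), min, max) in floating point."""
    # scale and bias broadcast against the last (channel) axis of x
    return np.clip(np.round(x * scale + bias), out_min, out_max)


x = np.array([[0.4, 1.6], [-3.0, 300.0]], dtype=np.float32)
scale = np.array([2.0, 0.5], dtype=np.float32)  # one scale per channel
bias = np.array([0.0, 1.0], dtype=np.float32)   # one bias per channel
y = requant_reference(x, scale, bias, out_min=-128, out_max=127)
print(y)
```

The last element shows the clipping step: 300 * 0.5 + 1 = 151 exceeds the upper bound and is clamped to 127.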
def get_exp_cycles()Returns expected number of cycles for execution.
Adds a constant offset to account for HLS pipeline initialization overhead.
def generate_params(model, path)Generate scale and bias parameter arrays as HLS header file.
def read_npy_data()Generate code for reading input data from .npy file.
def strm_decl()Generate stream declarations for C++ simulation.
def dataoutstrm()Generate code for writing output data to .npy file.
def save_as_npy()Empty - output saving is handled in dataoutstrm.
def execute_node(context, graph)Execute the node using cppsim or rtlsim.
Custom implementation that only passes input 0 (data), since scale and bias are embedded as parameters in the generated code.
class StreamingSplit_hls(StreamingSplit, HLSBackend)Streaming split node with dynamically generated HLS. Only supports splitting along the last axis.
HLS backend implementation of the Squeeze operator.
@register_custom_op
class Squeeze_hls(Squeeze, HLSBackend)HLS backend implementation of the Squeeze operator.
Removes single-dimension entries from the shape of a tensor using HLS synthesis.
def get_nodeattr_types()Return the dictionary of node attributes for the HLS Squeeze operator.
def global_includes() -> NoneGenerate list of C++ includes for the top of the generated code.
def defines(var) -> NoneGenerate C++ code for type alias, global constant, and macro definitions.
def execute_node(context, graph) -> NoneExecute node via generic HLSBackend implementation (cppsim/rtlsim).
def docompute() -> NoneGenerate C++ code for the computation part of the operator.
def blackboxfunction() -> NoneGenerate the C++ function signature for the IP block generation.
class StreamingDataWidthConverter_hls(StreamingDataWidthConverter, HLSBackend)Class that corresponds to finn-hlslib StreamingDataWidthConverter_Batch function.
class StreamingFIFO_hls(StreamingFIFO, HLSBackend)HLS-based FIFO implementation. Currently only used as virtual FIFO for live FIFO-sizing.
HLS backend implementation for neural network thresholding operations.
This module provides the Thresholding_hls class which implements hardware-accelerated thresholding/activation functions using High-Level Synthesis (HLS) for FPGA deployment. Supports multiple memory modes, runtime weight loading, and various data types.
class Thresholding_hls(Thresholding, HLSBackend)Class that corresponds to finn-hlslib Thresholding_Batch function.
def __init__(onnx_node: NodeProto, **kwargs: Any) -> NoneInitialize the Thresholding_hls layer.
onnx_node : NodeProto ONNX node representing this operation **kwargs : dict Additional arguments passed to parent classes
def get_nodeattr_types() -> dict[str, tuple[str, bool, str, set[str]]
| tuple[str, bool, int, set[int]]]Get the types and default values for node attributes.
dict Dictionary mapping attribute names to their type specifications
def bram_estimation() -> intCalculate BRAM cost if resource set to BRAM.
int Number of BRAM blocks required.
def lut_estimation() -> intCalculate LUT cost, taking memory resource type into account.
int Number of LUTs required for comparators and optional LUTRAM.
def get_ap_int_max_w() -> intGet the maximum ap_int width used in this layer.
int Maximum bitwidth of any ap_int used in the operator.
def code_generation_ipgen(model: ModelWrapper, fpgapart: str,
clk: float) -> NoneGenerate C++ code and tcl script for IP generation.
model : ModelWrapper The ONNX model wrapper. fpgapart : str Target FPGA part name. clk : float Clock period in nanoseconds.
def get_template_param_values() -> dict[str, str]Return the template parameter values according to input, output and weight data types.
dict[str, str] Dictionary of template parameter names to their values.
def make_weight_file(weights: np.ndarray, weight_file_mode: str,
weight_file_name: str) -> NoneProduce a file containing given weights (thresholds) in appropriate format for this layer. This file can be used for either synthesis or run-time reconfig of weights.
weights : np.ndarray Numpy array with weights to be put into the file. weight_file_mode : str One of {hls_header, decoupled_verilog_dat, decoupled_runtime, decoupled_npy}. weight_file_name : str Filename for the weight file to be generated.
def generate_params(model: ModelWrapper, path: str) -> NoneGenerate parameter files for thresholds.
model : ModelWrapper The ONNX model wrapper containing initializers. path : str Code generation directory path.
def execute_node(context: dict, graph: object) -> NoneExecute this node in the given context.
context : dict Execution context containing input/output tensors. graph : onnx.GraphProto The ONNX graph (unused but required by interface).
def global_includes() -> NoneGenerate list of global C++ includes.
def defines(var: object) -> NoneGenerate C++ defines for template parameters.
var : object Unused parameter for compatibility with base class.
def read_npy_data() -> NoneGenerate C++ code for reading NPY data for simulation.
def strm_decl() -> NoneGenerate C++ stream declarations.
def docompute() -> NoneGenerate C++ code for the main computation.
def dataoutstrm() -> NoneGenerate C++ code for writing output data stream.
def blackboxfunction() -> NoneGenerate C++ black box function signature.
def pragmas() -> NoneGenerate HLS pragmas for synthesis.
def code_generation_ipi() -> list[str]Generate Vivado IPI tcl commands for block design integration.
list[str] List of tcl commands for IPI block design generation.
def get_op_and_param_counts() -> dict[str, int]Get operation and parameter counts for this layer.
dict[str, int] Dictionary mapping parameter type to count.
def ipgen_extra_directives() -> list[str]Return a list of extra tcl directives for HLS synthesis.
list[str] List of tcl directives for HLS IP generation.
def derive_characteristic_fxns(period: int,
override_rtlsim_dict: dict | None = None
) -> NoneDerive characteristic functions for performance estimation.
period : int Clock period in nanoseconds override_rtlsim_dict : dict | None Optional dictionary to override RTL simulation parameters.
None
def minimize_weight_bit_width(model)Minimize threshold datatype, with HLS-specific adjustments.
The HLS implementation uses the threshold datatype for comparisons. When the threshold datatype is narrower than the input datatype, input values get truncated, which can cause incorrect results. To prevent this, ensure threshold datatype is at least as wide as input datatype.
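The truncation problem described above can be illustrated in plain Python. This sketch assumes that narrowing an integer keeps only its low bits and reinterprets them as two's complement; the helper `wrap_signed` and the example values are hypothetical and are not taken from the FINN sources.

```python
def wrap_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= 1 << (bits - 1) else value


input_value = 200  # needs at least 9 signed bits
threshold = 100    # fits in 8 signed bits

# Comparison at full input width gives the intended result
exact = input_value >= threshold
# Comparison after narrowing the input to an 8-bit threshold type
truncated_input = wrap_signed(input_value, 8)
truncated = truncated_input >= threshold
print(exact, truncated_input, truncated)
```

With an 8-bit threshold type the input 200 wraps to -56 and the comparison flips, which is why the threshold datatype is widened to at least the input width.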
class TLastMarker_hls(HWCustomOp, HLSBackend)Node that adds/removes AXI stream TLAST signals where needed. Its behavior is transparent in node-by-node execution, only visible in IP-stitched rtlsim or actual hardware. This node may be needed at the end of the network to signal a DMA write (needed by the FINN PYNQ shell) or at the beginning to remove the end-of-burst from DMA read.
HLS backend implementation of the Unsqueeze operator.
@register_custom_op
class Unsqueeze_hls(Unsqueeze, HLSBackend)HLS backend implementation of the Unsqueeze operator.
Inserts single-dimension entries into the shape of a tensor using HLS synthesis.
def get_nodeattr_types()Return the dictionary of node attributes for the HLS Unsqueeze operator.
def global_includes() -> NoneGenerate list of C++ includes for the top of the generated code.
def defines(var) -> NoneGenerate C++ code for type alias, global constant, and macro definitions.
def execute_node(context, graph) -> NoneExecute node via generic HLSBackend implementation (cppsim/rtlsim).
def docompute() -> NoneGenerate C++ code for the computation part of the operator.
def blackboxfunction() -> NoneGenerate the C++ function signature for the IP block generation.
class UpsampleNearestNeighbour_hls(UpsampleNearestNeighbour, HLSBackend)Corresponds to finn-hlslib UpsampleNearestNeighbour function. Upsampling is done with the Nearest Neighbour algorithm. The layer expects square feature maps for the in and output.
class VVAU_hls(VVAU, HLSBackend)Corresponds to finn-hlslib Vector_Vector_Activate_Batch function
def lut_estimation()Calculates resource estimations for LUTs based on:
- FINN-R: An End-to-End Deep-Learning Framework for Fast Exploration of Quantized Neural Networks
- M. Blott, T. B. Preusser, N. J. Fraser, G. Gambardella, K. O'Brien, Y. Umuroglu, M. Leeser and K. Vissers
- Sep 2018
def code_generation_ipgen(model, fpgapart, clk)Generates C++ code and TCL script for IP generation.
def get_template_param_values()Returns the template parameter values according to input, output and weight data types.
def minimize_weight_bit_width(model)Minimize weight and threshold datatypes, with HLS-specific adjustments.
The HLS implementation uses the threshold datatype for comparisons. When the threshold datatype is narrower than the accumulator datatype, accumulator values get truncated, which can cause incorrect results. To prevent this, ensure threshold datatype is at least as wide as accumulator datatype.
HLS backend implementation for FINN custom operations.
class HLSBackend(ABC)Base class whose functionality is used by all custom ops that correspond to a finn-hlslib function. Contains the functions every HLS custom node should have. Some are abstract methods, which have to be implemented when writing a new HLS custom op node.
def get_nodeattr_types()Return dictionary of node attribute types and properties.
def get_all_verilog_paths()Return list of all folders containing Verilog code for this node.
def get_all_verilog_filenames(abspath=False)Return list of all Verilog files used for this node.
def prepare_rtlsim(behav=False)Creates a xsi emulation library for the RTL code generated for this node, sets the rtlsim_so attribute to its path.
def code_generation_ipgen(model, fpgapart, clk)Generate C++ code and TCL script for IP generation.
def ipgen_default_directives()Return list of default HLS synthesis directives.
def ipgen_extra_directives()Return a list of extra TCL directives for HLS synthesis.
def ipgen_singlenode_code(fpgapart=None)Build the bash script for IP generation using the CallHLS utility.
def code_generation_cppsim(model)Generate C++ code for simulation (cppsim).
def code_generation_ipi()Construct and return the TCL for node instantiation in Vivado IPI.
def compile_singlenode_code()Build bash script for compilation using CppBuilder and execute to produce executable.
def npy_to_dynamic_output(context)Read output.npy file generated from cppsim and place into context dictionary.
def exec_precompiled_singlenode_model()Execute precompiled executable.
def hls_sname()Get the naming convention used by Vitis HLS for stream signals. Example: the TDATA for a stream called "out" would be out_V_TDATA.
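As a rough illustration of the naming convention described above, the sketch below builds AXI-Stream signal names from a stream name. The helper `axis_signal_name` and its `sname` default are assumptions made for this example and are not part of the FINN API; only the `out_V_TDATA` result is taken from the description.

```python
def axis_signal_name(stream: str, signal: str, sname: str = "_V") -> str:
    """Join a stream name, the HLS stream suffix and an AXI-Stream signal name.

    Hypothetical helper for illustration only.
    """
    return f"{stream}{sname}_{signal}"


# Signal names for a stream called "out"
names = [axis_signal_name("out", s) for s in ("TDATA", "TVALID", "TREADY")]
print(names)
```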
def execute_node(context, graph)Execute node in specified mode (cppsim or rtlsim).
@abstractmethod
def global_includes()Set the global includes for the C++ code that is generated for cppsim or rtlsim. Member function of the HLSBackend class that must be implemented by every node.
@abstractmethod
def defines(var)Set the define commands for the C++ code that is generated for cppsim or rtlsim. Member function of the HLSBackend class that must be implemented by every node.
var: makes it possible to reuse the function for different C++ code generation. E.g. if set to "ipgen" in MatrixVectorActivation, additional PRAGMA defines are added.
def read_npy_data()Generate commands for reading data from .npy file in C++. Might need to be overwritten depending on CustomOp.
def strm_decl()Generate commands for stream declaration in C++. Might need to be overwritten depending on CustomOp.
@abstractmethod
def docompute()Generate the commands for the computational part of the C++ code. Member function of the HLSBackend class that must be implemented by every node.
def dataoutstrm()Generate commands for reading out data from C++ and converting to npy format. Might need to be overwritten depending on CustomOp.
def save_as_npy()Generate commands for saving data in .npy file in C++.
@abstractmethod
def blackboxfunction()Generate a blackbox function in C++ from which an IP block will be generated. Member function of the HLSBackend class that must be implemented by every node.
def pragmas()Generate pragma commands in C++. Might need to be overwritten depending on CustomOp.
def get_ap_int_max_w()Return the maximum width of any ap_int used in this module. Used to set the AP_INT_MAX_W definition for HLS.
def timeout_value()Set timeout value for HLS functions defined for one clock cycle.
def timeout_condition()Set timeout condition for HLS functions defined for one clock cycle.
def timeout_read_stream()Set reading output stream procedure for HLS functions defined for one clock cycle.
Base class for hardware custom operations in FINN dataflow architecture.
This module provides the HWCustomOp base class for custom operations that can be implemented using HLS or RTL backends in FPGA dataflow architectures.
class HWCustomOp(CustomOp)Base class for all custom ops that can be implemented with either an HLS or an RTL backend. Contains the functions every fpgadataflow custom node should have. Some are abstract methods, which must be implemented when writing a new fpgadataflow custom op node.
def __init__(onnx_node: NodeProto, **kwargs: Any) -> NoneInitialize HWCustomOp with an ONNX node.
Arguments:
- onnx_node - The ONNX node to wrap.
- **kwargs - Additional keyword arguments passed to parent class.
def get_nodeattr_types() -> dict[
str,
tuple[str, bool, int | float | str | bool | npt.NDArray | list]
| tuple[str, bool, int | float | str | bool | npt.NDArray | list, set
| None],
]Return node attribute types for HWCustomOp.
Returns:
Dictionary mapping attribute names to their type specifications.
def make_shape_compatible_op(model: "ModelWrapper") -> NodeProtoMake a shape compatible operation.
Arguments:
- model - The model wrapper containing this node.
Returns:
The ONNX node for the shape compatible operation.
def get_verilog_top_module_name() -> strReturn the Verilog top module name for this node.
def get_verilog_top_module_intf_names(
) -> dict[str, list[tuple[str, int]] | list[str]]Return a dict of names of input and output interfaces. The keys reflect the protocols each interface implements: 'clk', 'rst', 'm_axis', 's_axis', 'aximm', 'axilite'. Values are lists of tuples (axis, aximm) or names (axilite): 'axis' tuples correspond to the list of node inputs in order, each tuple is (interface_name, interface_width_bits). axilite always assumed to be 32 bits and is not tuple (name only). Each block must have at most one aximm and one axilite.
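The dictionary layout described above can be sketched as follows. All interface names and widths here are hypothetical placeholders, and `check_intf_names` is an illustrative helper rather than a FINN function; it only encodes the constraints stated in the description (tuples for stream interfaces, plain names for axilite, at most one aximm and one axilite).

```python
# Hypothetical example of the returned structure (names and widths are made up).
intf_names = {
    "clk": ["ap_clk"],
    "rst": ["ap_rst_n"],
    "s_axis": [("in0_V", 8)],    # (interface_name, interface_width_bits)
    "m_axis": [("out0_V", 16)],
    "aximm": [],
    "axilite": [],               # names only, always assumed 32 bits wide
}


def check_intf_names(intf: dict) -> bool:
    """Check the constraints stated in the description; raise on violation."""
    for key in ("s_axis", "m_axis", "aximm"):
        for entry in intf.get(key, []):
            name, width = entry  # must unpack as a 2-tuple
            if not isinstance(name, str) or not isinstance(width, int):
                raise TypeError(f"{key} entries must be (str, int) tuples")
    for name in intf.get("axilite", []):
        if not isinstance(name, str):
            raise TypeError("axilite entries must be plain names")
    if len(intf.get("aximm", [])) > 1 or len(intf.get("axilite", [])) > 1:
        raise ValueError("at most one aximm and one axilite interface allowed")
    return True


print(check_intf_names(intf_names))
```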
def get_rtlsim() -> SimEngineReturn a xsi wrapper for the emulation library for this node.
def close_rtlsim(sim: SimEngine) -> NoneClose and free up resources for rtlsim.
Arguments:
- sim - The RTL simulation object to close.
def node_res_estimation(fpgapart: str) -> dict[str, int | float]Return summarized resource estimation of BRAMs and LUTs of the node as a dictionary.
def bram_efficiency_estimation() -> floatEstimate BRAM efficiency.
Returns actual parameter storage needed divided by the allocated BRAM storage (from estimation).
def uram_efficiency_estimation() -> floatEstimate URAM efficiency.
Returns actual parameter storage needed divided by the allocated URAM storage (from estimation).
def bram_estimation() -> intEstimate BRAM resource usage.
Member function of HWCustomOp class that must be implemented by every node.
def uram_estimation() -> intEstimate UltraRAM resource usage.
Member function of HWCustomOp class that must be implemented by every node.
def lut_estimation() -> intEstimate LUT resource usage.
Member function of HWCustomOp class that must be implemented by every node.
def dsp_estimation(fpgapart: str) -> intEstimate DSP resource usage.
Member function of HWCustomOp class that must be implemented by every node.
Arguments:
- fpgapart - Target FPGA part string.
def get_exp_cycles() -> intEstimate expected cycles for set folding.
Member function of HWCustomOp class that must be implemented by every node.
def get_op_and_param_counts() -> dict[str, int]Return a dictionary with number of ops needed per inference.
Returns number of ops needed per inference for this layer as well as parameter count (weights, thresholds, etc.). Entries should be in the format: {op_<optype> : <op_count>, param_<paramtype>: <param_count>}.
def reset_rtlsim(sim: SimEngine) -> NoneSet reset input in finnxsi to zero, toggle the clock and set it back to one.
def rtlsim_multi_io(sim: SimEngine,
io_dict: dict[str, Any],
sname: str = "_V") -> NoneRun rtlsim for this node, supports multiple i/o streams.
def verify_node() -> NoneCan be implemented to verify that all attributes the node needs are there and that particular attributes are set correctly. Can also check if the number of inputs is equal to the expected number.
def generate_params(model: "ModelWrapper", path: str) -> NoneGenerate parameters (i.e. weights and thresholds).
Member function of HWCustomOp class that must be implemented by every node that needs to generate parameters.
Arguments:
- model - The model wrapper containing this node.
- path - Path where parameters should be generated.
def get_number_output_values() -> intGet the number of expected output values.
Member function of HWCustomOp class that must be implemented by every node.
@abstractmethod
def get_input_datatype(ind: int = 0) -> BaseDataTypeReturn FINN DataType of input stream ind.
@abstractmethod
def get_output_datatype(ind: int = 0) -> BaseDataTypeReturn FINN DataType of output stream ind.
@abstractmethod
def get_normal_input_shape(
ind: int = 0) -> Sequence[int] | npt.NDArray[np.int_]Return normal input shape if implemented.
@abstractmethod
def get_normal_output_shape(
ind: int = 0) -> Sequence[int] | npt.NDArray[np.int_]Return normal output shape if implemented.
@abstractmethod
def get_folded_input_shape(
ind: int = 0) -> Sequence[int] | npt.NDArray[np.int_]Return folded input shape (according to synapse folding), if implemented.
@abstractmethod
def get_folded_output_shape(
ind: int = 0) -> Sequence[int] | npt.NDArray[np.int_]Return folded output shape (according to neuron folding), if implemented.
@abstractmethod
def get_instream_width(ind: int = 0) -> intReturn input stream width, if implemented.
@abstractmethod
def get_outstream_width(ind: int = 0) -> intReturn output stream width, if implemented.
def get_instream_width_padded(ind: int = 0) -> intReturn input stream width padded to a multiple of 8.
This is required by the AXI Stream spec.
Arguments:
- ind - Input index (default: 0).
def get_outstream_width_padded(ind: int = 0) -> intReturn output stream width padded to a multiple of 8.
This is required by the AXI Stream spec.
Arguments:
- ind - Output index (default: 0).
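The padding rule used by both padded-width getters amounts to rounding a bit width up to the next multiple of 8. A minimal sketch, where `pad_to_multiple` is a hypothetical helper name and not the function FINN itself calls:

```python
def pad_to_multiple(width_bits: int, factor: int = 8) -> int:
    """Round width_bits up to the next multiple of factor (hypothetical helper)."""
    if width_bits < 0 or factor <= 0:
        raise ValueError("width_bits must be >= 0 and factor must be > 0")
    # Ceiling division using integer arithmetic only
    return -(-width_bits // factor) * factor


for w in (1, 8, 9, 24):
    print(w, "->", pad_to_multiple(w))
```

A width that is already byte-aligned is returned unchanged, so applying the function twice gives the same result as applying it once.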
def calc_tmem() -> intCalculate and return the TMEM.
def calc_wmem() -> intCalculate and return the WMEM.
def generate_hdl_memstream(fpgapart, pumped_memory=0)Helper function to generate verilog code for memstream component. Currently utilized by MVAU, VVAU and HLS Thresholding layer.
def generate_hdl_fetch_weights(fpgapart)Helper function to generate verilog code for fetch_weights component. Currently utilized by MVAU.
def generate_hdl_dynload() -> NoneGenerate HDL for dynamic load wrapper.
def derive_characteristic_fxns(period: int,
override_rtlsim_dict: dict | None = None,
pre_hook=None) -> NoneReturn the unconstrained characteristic functions for this node.
Arguments:
- period - The characterization period.
- override_rtlsim_dict - Optional dictionary to override rtlsim settings.
Raises:
- ValueError - If period is too short to characterize the node.
def adapt_for_loop_body(input_types)Called by LoopRolling transformation to allow operators to adapt their attributes when being placed inside a loop body.
This base implementation does nothing. Operators that need to modify their behavior when placed in loops should override this method.
Arguments:
- input_types - List of LoopBodyInputType values for each input, indicating whether inputs are ACTIVATION, CONSTANT, PARAMETER, etc.
Example:
If an operator has a parameter that becomes a streamed input
in a loop context (PARAMETER type), it might need to change
an attribute like rhs_style from "const" to "input".
class HWSoftmax(HWCustomOp)Abstraction layer for HW implementation of SoftMax layers.
def get_input_datatype(ind=0)Returns FINN DataType of input.
def get_output_datatype(ind=0)Returns FINN DataType of output.
class InnerShuffle(HWCustomOp)Abstraction layer for the Parallel 2D transpose.
def get_exp_cycles()Estimate cycles for the double-buffered InnerShuffle RTL.
The RTL uses two BRAM banks with page_size = I*J/SIMD. The first page must be fully written before reads can begin, adding one extra page of latency beyond the streaming throughput. Empirically verified to match cycles_rtlsim within atol=10.
class LabelSelect(HWCustomOp)Abstraction layer for HW implementation of LabelSelect
def get_input_datatype(ind=0)Returns FINN DataType of input.
def get_output_datatype(ind=0)Returns FINN DataType of output.
def get_instream_width(ind=0)Returns input stream width.
def get_outstream_width(ind=0)Returns output stream width.
class LayerNorm(HWCustomOp)Abstraction layer for HW implementation of the LayerNorm layer.
def get_input_datatype(ind=0)Returns FINN DataType of input.
def get_output_datatype(ind=0)Returns FINN DataType of output.
class Lookup(HWCustomOp)Abstraction layer for HW implementation of streaming elementwise lookup, mapping indices to values.
Matrix-Vector-Activation Unit (MVAU) hardware implementation.
This module implements the MVAU operation for FPGA deployment, which performs matrix-vector multiplication optionally followed by activation/thresholding. Supports various memory modes, parallelization strategies, and quantized datatypes.
class MVAU(HWCustomOp)Abstraction layer for HW implementation of MatrixVectorActivation layers.
def __init__(onnx_node, **kwargs)Initialize the MVAU custom operation.
Arguments:
- onnx_node (NodeProto) - ONNX node to wrap.
- **kwargs (dict) - Additional arguments passed to parent class.
def get_nodeattr_types()Get dictionary of attribute names and their types for this node.
dict Dictionary mapping attribute names to type specifications
def execute_node(context, graph)Execute this MVAU node.
Performs matrix-vector multiplication and optional activation/thresholding.
Arguments:
- context (dict) - Dictionary mapping tensor names to numpy arrays.
- graph (GraphProto) - ONNX graph containing this node.
def verify_node()Verify that this node has valid attributes and configuration.
list of str List of verification messages/warnings
def infer_node_datatype(model)Infer and set output datatype based on input datatype and node attributes.
model : ModelWrapper FINN ModelWrapper containing this node
def get_input_datatype(ind=0)Returns FINN DataType of input.
def get_accumulator_datatype()Returns FINN DataType of accumulator
def get_output_datatype(ind=0)Returns FINN DataType of output.
def get_instream_width(ind=0)Get width of input stream in bits.
ind : int Input stream index (0=activations, 1=weights, 2=thresholds)
int Bit width of the specified input stream
def get_outstream_width(ind=0)Get width of output stream in bits.
ind : int Output stream index
int Bit width of the output stream
def get_folded_input_shape(ind=0)Get shape of folded (parallelized) input tensor.
ind : int Input index (0=activations, 1=weights)
tuple of int Shape of folded input tensor
def get_folded_output_shape(ind=0)Get shape of folded (parallelized) output tensor.
ind : int Output index
tuple of int Shape of folded output tensor
def get_normal_input_shape(ind=0)Get normal (non-folded) input shape.
ind : int Input index (0=activations, 1=weights)
tuple of int Normal input shape
def get_normal_output_shape(ind=0)Get normal (non-folded) output shape.
ind : int Output index
tuple of int Normal output shape
def calc_wmem()Calculates and returns WMEM.
def calc_tmem()Calculates and returns TMEM.
def uram_estimation()Estimate UltraRAM (URAM) resource usage.
int Estimated number of URAMs needed
def bram_estimation()Calculates resource estimation for BRAM based on:
- FINN-R: An End-to-End Deep-Learning Framework for Fast Exploration of Quantized Neural Networks
- M. Blott, T. B. Preusser, N. J. Fraser, G. Gambardella, K. O'Brien, Y. Umuroglu, M. Leeser and K. Vissers
- Sep 2018
def bram_efficiency_estimation()Estimate BRAM utilization efficiency.
float Efficiency ratio (actual bits used / total BRAM capacity allocated)
def uram_efficiency_estimation()Function for URAM efficiency estimation: actual parameter storage needed divided by the allocated URAM storage (from estimation)
def get_exp_cycles()Get expected number of clock cycles for one inference.
int Number of clock cycles
def minimize_accumulator_width(model)Minimize the accumulator bit width according to the weight values, input data types, and size of dot product
def minimize_weight_bit_width(model)Minimize the bit width based on the values of the weights.
def get_hw_compatible_threshold_tensor(orig_thres_matrix)Convert the original numpy threshold matrix orig_thres_matrix into a form suitable for passing to the hlslib call:
- ensure MH % PE == 0
- for bipolar weights and inputs, ensure thresholds are positive
- interleave rows between PEs
- reshape into (PE, TMEM, n_thres_steps) and return
def get_hw_compatible_weight_tensor(orig_weight_matrix)Convert the original numpy weight matrix orig_weight_matrix into a form suitable for passing to the hlslib call:
- ensure MH % PE == 0 and MW % SIMD == 0
- for bipolar {-1,+1} weights, convert to binary {0, 1}
- interleave rows between PEs
- reshape into (1, PE, WMEM, SIMD) and return
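The weight-tensor steps listed above can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions, not the FINN implementation: `to_hw_weight_tensor` is a hypothetical name, the input is assumed to already have shape (MH, MW) with one row per output neuron, and any transposition or element reordering the real function may perform is left out.

```python
import numpy as np


def to_hw_weight_tensor(w, pe: int, simd: int, bipolar: bool = False):
    """Sketch of the listed steps; w is assumed to have shape (MH, MW)."""
    w = np.asarray(w)
    mh, mw = w.shape
    # Folding constraints from the description
    assert mh % pe == 0, "MH must be divisible by PE"
    assert mw % simd == 0, "MW must be divisible by SIMD"
    if bipolar:
        # Map bipolar {-1, +1} weights to binary {0, 1}
        w = (w + 1) // 2
    wmem = (mh * mw) // (pe * simd)
    # Interleave rows between PEs: row r is assigned to PE (r % pe)
    interleaved = w.reshape(mh // pe, pe, mw).transpose(1, 0, 2)
    return interleaved.reshape(1, pe, wmem, simd)


w = np.arange(16).reshape(4, 4)
hw = to_hw_weight_tensor(w, pe=2, simd=2)
print(hw.shape)
print(hw[0, 0])  # rows 0 and 2 of w, split into SIMD-wide chunks
```

With PE=2, rows 0 and 2 land in the first PE and rows 1 and 3 in the second, which is what "interleave rows between PEs" is taken to mean here.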
def make_weight_file(weights, weight_file_mode, weight_file_name)Produce a file containing given weights in appropriate format for this layer. This file can be used for either synthesis or run-time reconfig of weights.
Arguments:
- weights : numpy array with weights to be put into the file
- weight_file_mode : one of {hls_header, decoupled_verilog_dat, decoupled_runtime}
- weight_file_name : filename for the weight file to be generated
def generate_params(model, path)Generate parameter files (weights and thresholds) for hardware generation.
Arguments:
- model (ModelWrapper) - FINN ModelWrapper containing this node.
- path (str) - Output directory path for generated files.
def get_op_and_param_counts()Get dictionary of operations and parameter counts for this layer.
dict Dictionary with operation types and counts as key-value pairs
def derive_characteristic_fxns(period)Derive characteristic performance functions for this node.
period : float Clock period in nanoseconds
def get_verilog_top_module_intf_names()Get Verilog top module interface names for this node.
dict Dictionary mapping interface types to port names
def code_generation_ipi()Generate TCL commands for IP integrator (IPI) block design.
list of str List of TCL commands for Vivado IP integrator
class OuterShuffle(HWCustomOp)Abstraction layer for HW OuterShuffle (rearrange and transpose) layers. Only permutations that do not affect the innermost dimension are feasible.
def get_exp_cycles()Estimate cycles by simulating the input_gen HLS pipeline.
Derives all parameters from transpose_in_shape, perm, and SIMD:
- output shape: apply perm to input shape
- loop coefficients: input strides permuted by perm
- buffer size: power-of-2 >= max_rp_retract + WP_DELAY + 2
The HLS pipeline has three stall sources:
- WP_DELAY (=4): write-pointer pipeline latency before reads begin
- Read stalls: consumer waits for data (rp >= wp_delayed)
- Write stalls: producer blocked by full buffer (wp - fp >= buf_size)
When buf_size > 262144 (URAM), pipeline II=3 due to read latency.
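The parameter derivation described above (output shape, loop coefficients, buffer size) can be sketched in plain Python. The function names are hypothetical, and `max_rp_retract` is taken as an input because its derivation is not given here; only `WP_DELAY = 4` and the sizing rule come from the description.

```python
WP_DELAY = 4  # write-pointer pipeline latency, as stated in the description


def permuted_shape_and_coeffs(in_shape, perm):
    """Return (output shape, input strides permuted by perm), in elements."""
    in_shape = tuple(in_shape)
    # Row-major element strides of the input tensor
    strides = [1] * len(in_shape)
    for i in range(len(in_shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * in_shape[i + 1]
    out_shape = tuple(in_shape[p] for p in perm)
    coeffs = tuple(strides[p] for p in perm)
    return out_shape, coeffs


def buffer_size(max_rp_retract: int) -> int:
    """Smallest power of two >= max_rp_retract + WP_DELAY + 2."""
    need = max_rp_retract + WP_DELAY + 2
    size = 1
    while size < need:
        size *= 2
    return size


print(permuted_shape_and_coeffs((2, 3, 4), (1, 0, 2)))
print(buffer_size(10))
```

The example permutation swaps the two outer dimensions and leaves the innermost one in place, matching the restriction on feasible permutations.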
HWCustomOp for generic pooling operators MaxPool, AvgPool, AccPool and QuantAvgPool.
class Pool(HWCustomOp)Abstraction layer for HW implementation of Pool. Requires ConvolutionInputGenerator(depthwise == 1) to format its input.
Input shape: (BatchSize,OutImgDim,OutImgDim,TotalKernelSize*Channels)
Output shape: (BatchSize,OutImgDim,OutImgDim,Channels)
Notes:
- The input shape was chosen to be compatible with im2col (only true when there is no folding).
- The actual data layout produced by the hlslib kernels is different for depthwise ops.
- depthwise SWG: (1, OFMDim, OFMDim, IFMChannels/PE, K, K, PE)
Channels can be folded using PE (SIMD from the input perspective).
def get_nodeattr_types()Get dictionary of custom node attributes with their types and default values.
def get_input_datatype(ind=0)Returns FINN DataType of input.
def get_output_datatype(ind=0)Returns FINN DataType of output.
def get_normal_input_shape(ind=0)Return shape of the input tensor.
def get_folded_input_shape(ind=0)Return shape of the folded input tensor.
def get_normal_output_shape(ind=0)Return shape of the output tensor.
def get_folded_output_shape(ind=0)Return shape of the folded output tensor.
def get_exp_cycles()Return estimation of expected cycles for set folding.
def get_instream_width(ind=0)Width of the input stream.
def get_outstream_width(ind=0)Width of the output stream.
def infer_node_datatype(model)Infers the datatype of the output from the node attribute.
def verify_node()Verifies the node configuration attributes.
def execute_node(context, graph)Executes the node with inputs from context writing outputs to context.
class Requant(HWCustomOp)Abstraction layer for HW implementation of Requantization.
Requantization computes: clip(round(x * scale + bias), min, max)
This is an alternative to Thresholding for cases where the thresholds are uniformly spaced. Instead of comparing against N thresholds, we compute the output directly using a multiply-add operation.
Inputs:
- input[0]: Data tensor to requantize
- input[1]: Scale tensor (per-channel or scalar, stored as initializer)
- input[2]: Bias tensor (per-channel or scalar, stored as initializer)
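The requantization formula above can be written as a small NumPy reference. This is a sketch under assumptions: `requant_reference` is a hypothetical name, per-channel scale and bias are assumed to broadcast over the last axis, and `np.round` rounds halves to even, which may differ from the rounding mode of the hardware implementation.

```python
import numpy as np


def requant_reference(x, scale, bias, out_min, out_max):
    """Compute clip(round(x * scale + bias), out_min, out_max) elementwise."""
    x = np.asarray(x, dtype=np.float64)
    # Scalar or per-channel scale/bias broadcast against the last axis of x
    y = np.round(x * np.asarray(scale) + np.asarray(bias))
    return np.clip(y, out_min, out_max)


# Per-tensor (scalar) scale and bias
print(requant_reference([0.4, 1.5, 10.0], 2.0, 0.0, 0, 15))
# Per-channel scale and bias for a [N, C] input with C = 2
print(requant_reference([[1.0, 1.0]], [1.0, 3.0], [0.2, -0.2], 0, 15))
```

A single multiply-add per element replaces the N comparisons that a thresholding implementation would need, which is the trade-off the description points to for uniformly spaced thresholds.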
def get_scale(model)Get scale tensor from model initializer (input[1]).
def get_bias(model)Get bias tensor from model initializer (input[2]).
def is_per_channel(model)Check if scale/bias are per-channel (vs per-tensor).
def get_input_datatype(ind=0)Returns FINN DataType of input.
def get_output_datatype(ind=0)Returns FINN DataType of output.
def get_normal_input_shape(ind=0)Returns input shape in format [N, H, W, C] or [N, C].
def get_normal_output_shape(ind=0)Returns output shape.
def get_folded_input_shape(ind=0)Returns folded input shape.
def get_folded_output_shape(ind=0)Returns folded output shape.
def get_exp_cycles()Returns expected number of cycles for execution.
def execute_node(context, graph)Execute the requant operation.
def get_instream_width(ind=0)Returns input stream width.
def get_outstream_width(ind=0)Returns output stream width.
Hardware operator corresponding to the standard ONNX Reshape.
@register_custom_op
class Reshape(HWCustomOp)Reshape operator, essentially passthrough with different input/output shape.
def get_nodeattr_types()Custom node attributes with their types and default values.
@property
def inp_shape()Input shape attribute.
@property
def out_shape()Output shape attribute.
@property
def dtype()Datatype attribute as QONNX DataType.
@property
def pe()Parallel elements in the last dimension of the output.
def get_input_datatype(ind=0)Datatype of the input tensor, same as the output.
def get_output_datatype(ind=0)Datatype of the output tensor, same as the input.
def get_normal_input_shape(ind=0)Regular input shape as seen by the ONNX standard.
def get_normal_output_shape(ind=0)Regular output shape as seen by the ONNX standard.
def get_folded_input_shape(ind=0)Shape of the folded (PE) input tensor
def get_folded_output_shape(ind=0)Shape of the folded (PE) output tensor
def get_instream_width(ind=0)Widths of the input data stream of the input at index ind
def get_outstream_width(ind=0)Widths of the output data stream of the output at index ind
def get_number_output_values()Expected output values for the operation given the folding.
def get_exp_cycles()Expected cycles for the operation given the folding.
def infer_node_datatype(model: ModelWrapper)Infers the datatype of the node output from the model graph.
def execute_node(context, graph)Execute reshape operation (Python fallback).
RTLBackend specializations of HWCustomOps.
def register_custom_op(cls)Registers a class into the custom_op dictionary
RTL implementation of ConvolutionInputGenerator (Sliding Window Generator).
This module provides an RTL-based implementation of the ConvolutionInputGenerator, generating sliding windows for convolution operations on FPGA. Supports non-square, 1D, strided, dilated, and depthwise convolutions with configurable buffer implementations.
class ConvolutionInputGenerator_rtl(ConvolutionInputGenerator, RTLBackend)Class that corresponds to finn-rtllib swg module. Generates an RTL ConvolutionInputGenerator implementation based on (System-)Verilog templates, defined in finn-rtllib/swg.
def __init__(onnx_node, **kwargs)Initialize the RTL ConvolutionInputGenerator.
Arguments:
- onnx_node (NodeProto) - ONNX node to wrap.
- **kwargs (dict) - Additional arguments passed to parent class.
def get_nodeattr_types()Get dictionary of attribute names and their types for this node.
dict Dictionary mapping attribute names to type specifications
def get_number_input_values()Function to get the number of expected input values.
def use_parallel_window_output()Check if parallel window output mode is enabled.
bool True if parallel window output is enabled, False otherwise
def get_buffer_depth()Return total depth of the internal buffer, depending on implementation style.
def get_exp_cycles()Get expected number of clock cycles for one inference.
int Number of clock cycles required for processing
def bram_estimation()Estimate Block RAM (BRAM) resource usage.
int Estimated number of BRAMs needed
def lut_estimation()Estimate LUT resource usage.
int Estimated number of LUTs needed
def uram_estimation()Estimate UltraRAM (URAM) resource usage.
int Estimated number of URAMs needed
def execute_node(context, graph)Execute this ConvolutionInputGenerator node.
Performs sliding window generation for convolution operations.
Arguments:
- context (dict) - Dictionary mapping tensor names to numpy arrays.
- graph (GraphProto) - ONNX graph containing this node.
def prepare_codegen_default()Fill code generation dict for the default implementation style by computing the incremental addressing scheme for the circular buffer.
def prepare_codegen_parallel()Fill code generation dict for the parallel implementation style by computing the loop controller configuration and partitioning the fixed buffer into shift-registers (for parallel read access) and line buffers (for efficient LUTRAM/BRAM/URAM implementation).
def select_impl_style()Select implementation style based on folding configuration.
def generate_hdl(model, fpgapart, clk)Generate HDL code and wrapper for the IP, depending on required implementation style.
def get_rtl_file_list(abspath=False)Get list of RTL files required for this node.
abspath : bool If True, return absolute file paths; otherwise return relative paths
list of str List of RTL file paths
def code_generation_ipi()Constructs and returns the TCL for node instantiation in Vivado IPI.
def get_verilog_top_module_intf_names()Return a dict of names of input and output interfaces. The keys reflect the protocols each interface implements: 'clk', 'rst', 'm_axis', 's_axis', 'aximm', 'axilite'. Values are lists of tuples (axis, aximm) or names (axilite): 'axis' tuples correspond to the list of node inputs in order, each tuple is (interface_name, interface_width_bits). axilite always assumed to be 32 bits and is not tuple (name only). Each block must have at most one aximm and one axilite.
def get_dynamic_config(ifm_dim=None, stride=None, dilation=None)Returns a configuration dict to re-configure FM dimension during runtime. Stride and dilation can also be changed. Certain restrictions apply (e.g. component must be synthesized for largest buffer size).
class ElementwiseBinary_rtl(ElementwiseBinaryOperation, RTLBackend)Base CustomOp wrapper for the finn-rtllib eltwisef component.
def adapt_for_loop_body(input_types)Adapt elementwise binary operator for loop body execution.
When an elementwise operator is placed inside a loop, parameters that are indexed per iteration (PARAMETER type) need to be received as streaming inputs rather than embedded constants. This method changes the lhs_style/rhs_style attributes from "const" to "input" as needed.
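The attribute change described above can be sketched with plain dictionaries. Everything here is a stand-in: the enum mirrors only the value names mentioned in this reference, `adapt_styles` is a hypothetical function, and the mapping of input 0 to `lhs_style` and input 1 to `rhs_style` is an assumption.

```python
from enum import Enum, auto


class LoopBodyInputType(Enum):
    """Stand-in enum for illustration; the real one is defined in FINN."""
    ACTIVATION = auto()
    CONSTANT = auto()
    PARAMETER = auto()


def adapt_styles(attrs: dict, input_types: list) -> dict:
    """Return a copy of attrs with "const" switched to "input" for PARAMETER inputs."""
    style_keys = ("lhs_style", "rhs_style")  # assumed order: input 0, input 1
    adapted = dict(attrs)
    for key, kind in zip(style_keys, input_types):
        if kind is LoopBodyInputType.PARAMETER and adapted.get(key) == "const":
            adapted[key] = "input"
    return adapted


attrs = {"lhs_style": "input", "rhs_style": "const"}
types = [LoopBodyInputType.ACTIVATION, LoopBodyInputType.PARAMETER]
print(adapt_styles(attrs, types))
```

Inputs marked CONSTANT keep their "const" style in this sketch, since only per-iteration parameters need to be streamed.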
def get_verilog_top_module_intf_names()Return the interface names for the Verilog top module.
For RTL elementwise operations, this includes handling for MLO mode where the rhs parameter may be streamed as an input.
def code_generation_ipi()Constructs and returns the TCL for node instantiation in Vivado IPI.
class ElementwiseAdd_rtl(ElementwiseBinary_rtl,
elementwise_binary.ElementwiseAdd)RTL implementation of elementwise addition for FLOAT32.
class ElementwiseSub_rtl(ElementwiseBinary_rtl,
elementwise_binary.ElementwiseSub)RTL implementation of elementwise subtraction for FLOAT32.
class ElementwiseMul_rtl(ElementwiseBinary_rtl,
elementwise_binary.ElementwiseMul)RTL implementation of elementwise multiplication for FLOAT32.
class FINNLoop(HWCustomOp, RTLBackend)Class that corresponds to the meta/container node FINN loop which is a placeholder for a group of fpgadataflow nodes that have been separated out into a FINN-ONNX model of its own and are meant to be executed in a loop.
def get_nodeattr(name)Get a node attribute by name. Data is stored inside the ONNX node's AttributeProto container. Attribute must be part of get_nodeattr_types. Default value is returned if attribute is not set.
def set_nodeattr(name, value)Set a node attribute by name. Data is stored inside the ONNX node's AttributeProto container. Attribute must be part of get_nodeattr_types.
def get_input_datatype(ind=0)Returns FINN DataType of input.
def prepare_rtlsim(behav=False)Creates a xsi emulation library for the RTL code generated for this node, sets the rtlsim_so attribute to its path.
def generate_hdl_stream_tap()Helper function to generate verilog code for stream tap components.
RTL implementation of FMPadding for feature map padding.
This module provides an RTL-based implementation of feature map padding using the finn-rtllib fmpadding_axi component. Supports runtime reconfiguration of padding amounts and spatial feature sizes via optional AXI-Lite interface.
class FMPadding_rtl(FMPadding, RTLBackend)CustomOp wrapper for the finn-rtllib fmpadding_axi component.
Supports adjusting the padding amount and spatial feature sizes at runtime.
def __init__(onnx_node, **kwargs) -> NoneInitialize the RTL FMPadding component.
Arguments:
- onnx_node (NodeProto) - ONNX node to wrap.
- **kwargs (dict) - Additional arguments passed to parent class.
def get_nodeattr_types()Get dictionary of attribute names and their types for this node.
dict Dictionary mapping attribute names to type specifications, including dynamic_mode for runtime reconfiguration
def get_verilog_top_module_intf_names()Get Verilog top module interface names.
dict Dictionary mapping interface types to interface names, including optional AXI-Lite interface if dynamic_mode is enabled
def get_template_values(ifm_dims, pads, chans, simd, idt)Calculate template parameter values for HDL generation.
Arguments:
- ifm_dims (list) - Input feature map dimensions [H, W].
- pads (list) - Padding amounts [top, left, bottom, right].
- chans (int) - Number of channels.
- simd (int) - SIMD parallelism factor.
- idt (DataType) - Input data type.
Returns:
dict - Dictionary of template substitution values for HDL generation.
def get_dynamic_config(ifm_dims=None, pads=None)Return a configuration dict to re-configure FM dimension and padding amounts during runtime.
def generate_hdl(model, fpgapart, clk)Generate HDL code from templates for this node.
Arguments:
- model (ModelWrapper) - ONNX model wrapper.
- fpgapart (str) - Target FPGA part number.
- clk (float) - Target clock period in ns.
def get_rtl_file_list(abspath=False)Get list of RTL files required for this node.
abspath : bool If True, return absolute file paths; otherwise return relative paths
list of str List of RTL file paths (4 files: fmpadding_axi.sv, fmpadding.sv, axi2we.sv, generated .v file)
def code_generation_ipi()Construct and return the TCL for node instantiation in Vivado IPI.
def execute_node(context, graph)Execute this FMPadding node.
Performs feature map padding using C++ or RTL simulation.
Arguments:
- context (dict) - Dictionary mapping tensor names to numpy arrays.
- graph (GraphProto) - ONNX graph containing this node.
def auto_size_simd(I_dim: int, SIMD: int) -> Optional[int]Return the smallest divisor d of I_dim such that d > SIMD. If no such divisor exists, return None.
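The divisor search described above is simple enough to sketch directly. The function below is an independent illustration of the stated behaviour under the hypothetical name `smallest_divisor_above`; it is not the FINN source.

```python
from typing import Optional


def smallest_divisor_above(i_dim: int, simd: int) -> Optional[int]:
    """Return the smallest divisor d of i_dim with d > simd, or None."""
    # Start at 1 at the earliest so the modulo never divides by zero
    for d in range(max(simd + 1, 1), i_dim + 1):
        if i_dim % d == 0:
            return d
    return None


print(smallest_divisor_above(12, 3))
print(smallest_divisor_above(12, 12))
```

Because `i_dim` divides itself, the result is `None` only when `simd` is already greater than or equal to `i_dim`.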
class InnerShuffle_rtl(InnerShuffle, RTLBackend)CustomOp wrapper for the finn-rtllib inner_shuffle component.
def code_generation_ipi()Constructs and returns the TCL for node instantiation in Vivado IPI.
class LayerNorm_rtl(LayerNorm, RTLBackend)RTL backend implementation for LayerNorm kernel. Generates RTL code for hardware synthesis of LayerNorm operations.
RTL implementation of Matrix Vector Activation Unit (MVAU).
This module provides an RTL-based implementation of the Matrix Vector Activation Unit for FPGA acceleration, supporting features like double-pumped DSPs and various weight memory modes.
class MVAU_rtl(MVAU, RTLBackend)Class that corresponds to the finn-rtllib Matrix Vector Unit.
def __init__(onnx_node, **kwargs)Initialize the RTL Matrix Vector Activation Unit.
Arguments:
- onnx_node (NodeProto) - ONNX node to wrap.
- **kwargs (dict) - Additional arguments passed to parent class.
def get_nodeattr_types()Get dictionary of attribute names and their types for this node.
dict Dictionary mapping attribute names to type specifications, including pumpedCompute for double-pumped DSP operation
def execute_node(context, graph)Execute this MVAU node.
Performs matrix-vector multiplication with optional activation using C++ or RTL simulation.
context : dict Dictionary mapping tensor names to numpy arrays graph : GraphProto ONNX graph containing this node
def lut_estimation()Estimate LUT resource usage.
int Estimated number of LUTs needed (currently returns 0)
def dsp_estimation(fpgapart)Estimate DSP resource usage based on target FPGA.
fpgapart : str Target FPGA part number
int Estimated number of DSP blocks needed
def instantiate_ip(cmd)Instantiate the RTL IP in Vivado IPI.
cmd : list List of TCL commands to which instantiation commands are appended
def generate_hdl(model, fpgapart, clk)Generate HDL code from templates for this node.
model : ModelWrapper ONNX model wrapper fpgapart : str Target FPGA part number clk : float Target clock period in ns
def prepare_codegen_default(fpgapart, clk)Prepare code generation dictionary for default implementation.
fpgapart : str Target FPGA part number clk : float Target clock period in ns
tuple of (str, dict) Template file path and code generation dictionary
def get_rtl_file_list(abspath=False)Get list of RTL files required for this node.
abspath : bool If True, return absolute file paths; otherwise return relative paths
list of str List of RTL file paths
def get_verilog_paths()Get list of Verilog include paths for this node.
list of str List of directory paths containing Verilog source files
class Requant_rtl(Requant, RTLBackend)RTL backend for Requant operation using finn-rtllib/requant.
def generate_hdl(model, fpgapart, clk)Generate RTL code for the requant operation.
def get_rtl_file_list(abspath=False)Return list of RTL files needed for this node.
def execute_node(context, graph)Execute the node, using RTL simulation if exec_mode is rtlsim.
RTLBackend specialization of the Reshape operator.
@register_custom_op
class Reshape_rtl(Reshape, RTLBackend)RTLBackend specialization of the Reshape operator
def get_nodeattr_types()Custom node attributes with their types and default values.
def execute_node(context, graph)Execute reshape operation (RTL simulation or Python fallback).
def generate_hdl(model, fpgapart, clk)Generate HDL code by filling in the Verilog template.
def get_rtl_file_list(abspath: bool = False)Return list of RTL files required for this custom operation.
Arguments:
- abspath - Whether to return absolute paths (default: False).
Returns:
List of paths pointing to required RTL files.
Raises:
- FINNInternalError - If code_gen_dir_ipgen or gen_top_module attributes are invalid.
def code_generation_ipi()Code generation for IP integration.
RTL implementation of streaming data width converter.
This module provides an RTL-based implementation for converting between different stream data widths while maintaining throughput.
class StreamingDataWidthConverter_rtl(StreamingDataWidthConverter, RTLBackend)Class that corresponds to finn-rtllib datawidth converter module.
def get_nodeattr_types()Get the attribute types for this node.
def check_divisible_iowidths()Check that input and output widths are divisible.
Ensures that the stream width conversion has an integer ratio, which is required for proper operation.
bool True if widths are properly divisible, False otherwise
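A minimal sketch of an integer-ratio check of this kind is shown below. The helper name `widths_have_integer_ratio` and the choice to compare the wider stream against the narrower one are assumptions; the actual method reads its widths from the node attributes.

```python
def widths_have_integer_ratio(in_width: int, out_width: int) -> bool:
    """Return True if the wider stream is an integer multiple of the narrower one."""
    wide = max(in_width, out_width)
    narrow = min(in_width, out_width)
    # An integer ratio means no remainder when dividing wide by narrow.
    return wide % narrow == 0
```

Under this assumption, converting 8 bits to 32 bits (ratio 4) passes, while 24 bits to 16 bits (ratio 1.5) does not.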
def execute_node(context, graph)Execute the node in the given context and graph for simulation.
def get_template_values()Get the code generation template values for this node.
def generate_hdl(model, fpgapart, clk)Generate the HDL code for this node.
def get_rtl_file_list(abspath=False)Get list of RTL files required for this node.
abspath : bool If True, return absolute file paths; otherwise return relative paths
list of str List of RTL file paths
def code_generation_ipi()Constructs and returns the TCL for node instantiation in Vivado IPI.
RTL implementation of streaming FIFO.
This module provides an RTL-based implementation of streaming FIFOs for buffering data between layers, with support for both RTL and Vivado IP implementations.
class StreamingFIFO_rtl(StreamingFIFO, RTLBackend)RTL implementation of streaming FIFO for data buffering.
def __init__(onnx_node, **kwargs)Initialize the RTL streaming FIFO.
onnx_node : NodeProto ONNX node to wrap **kwargs : dict Additional arguments passed to parent class
def get_nodeattr_types()Get dictionary of attribute names and their types for this node.
dict Dictionary mapping attribute names to type specifications, including impl_style for choosing between RTL and Vivado implementations
def get_adjusted_depth()Get FIFO depth adjusted for implementation requirements.
For Vivado implementation, rounds up depth to nearest power-of-2.
int Adjusted FIFO depth
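The power-of-2 rounding mentioned above can be illustrated with a short sketch. The helper name `round_up_to_power_of_two` is hypothetical, and any further adjustments the real method applies (for example for non-Vivado implementations) are not modelled here.

```python
def round_up_to_power_of_two(depth: int) -> int:
    """Return the smallest power of two that is >= depth (depth must be >= 1)."""
    if depth < 1:
        raise ValueError("depth must be a positive integer")
    # (depth - 1).bit_length() is the number of bits needed for depth - 1,
    # so shifting 1 by that amount yields the next power of two at or above depth.
    return 1 << (depth - 1).bit_length()
```

A requested depth of 33 would become 64 under this scheme, while a depth that is already a power of two, such as 32, is left unchanged.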
def get_verilog_top_module_intf_names()Get Verilog top module interface names for this node.
dict Dictionary mapping interface types to port names, including optional maxcount output for depth monitoring
def is_sim_fifo_gauge()Check if this FIFO should use simulation gauge implementation.
Returns True for RTL FIFOs with depth monitoring enabled, which use an infinite Verilog queue for simulation instead of Q_srl.
bool True if using simulation gauge, False otherwise
def generate_hdl(model, fpgapart, clk)Generate HDL code from templates for this node.
model : ModelWrapper ONNX model wrapper fpgapart : str Target FPGA part number clk : float Target clock period in ns
def code_generation_ipi()Generate TCL commands for instantiating this IP in Vivado IPI.
list of str List of TCL commands for IP instantiation
def get_rtl_file_list(abspath=False)Get list of RTL files required for this node.
abspath : bool If True, return absolute file paths; otherwise return relative paths
list of str List of RTL file paths
def prepare_rtlsim(behav=False)Prepare this node for RTL simulation.
NotImplementedError If impl_style is 'rtl' (not supported for simulation)
def execute_node(context, graph)Execute this FIFO node.
Performs buffering using Python simulation for cppsim mode or Vivado FIFOs, and RTL simulation for rtlsim mode with RTL-style FIFOs.
context : dict Dictionary mapping tensor names to numpy arrays graph : GraphProto ONNX graph containing this node
RTL implementation of thresholding activation.
This module provides an RTL-based implementation of thresholding activations for quantization and activation functions in FPGA dataflow architectures.
class Thresholding_rtl(Thresholding, RTLBackend)Class that corresponds to finn-rtllib 'thresholding' function.
def __init__(onnx_node, **kwargs)Initialize the RTL thresholding activation node.
onnx_node : NodeProto ONNX node to wrap **kwargs : dict Additional arguments passed to parent class
def get_nodeattr_types()Get dictionary of attribute names and their types for this node.
dict Dictionary mapping attribute names to type specifications, including memory depth triggers and optimization flags
def get_pe_mem_geometries()Return a list of (bitwidth, depth) for PE memory configurations to be used in resource estimation.
In each pair, the bitwidth is the bitwidth of the threshold values and the depth is the number of thresholds that can be stored in a single memory block. The number of memory blocks per PE is the number of thresholds divided by the depth; multiplying this by the number of PEs gives the total number of memory blocks required for the entire layer.
def get_memory_estimate()Return the memory estimate for this node.
def bram_estimation()Return the number of BRAMs required for this node.
def uram_estimation()Return the number of URAMs required for this node.
def lut_estimation()Return the number of LUTs required for this node.
def get_all_meminit_filenames(abspath=False)Return a list of all .dat memory initializer files used for this node.
def prepare_codegen_rtl_values(model)All dictionary values produced in this function are to replace their key value(s) in the RTL template files.
def get_rtl_file_list(abspath=False)Thresholding binary search RTL file list.
def generate_hdl(model, fpgapart, clk)Prepare HDL files from templates for synthesis.
def execute_node(context, graph)Execute this thresholding node.
Performs threshold comparisons using C++ or RTL simulation.
context : dict Dictionary mapping tensor names to numpy arrays graph : GraphProto ONNX graph containing this node
def code_generation_ipi()Constructs and returns the TCL commands for node instantiation as an RTL block.
def get_verilog_top_module_intf_names()Get Verilog top module interface names for this node.
dict Dictionary mapping interface types to port names, including optional AXI-Lite interface for runtime weights
def generate_params(model, path)Generate threshold parameter files for RTL implementation.
model : ModelWrapper ONNX model wrapper containing threshold values path : str Directory path where parameter files will be generated
def make_weight_file(weights, weight_file_mode, weight_file_name)Produce a file containing given weights (thresholds) in appropriate format for this layer. This file can be used for either synthesis or run-time reconfig of weights.
weights : numpy array Weights to be put into the file weight_file_mode : str Mode for the weight file ("decoupled_runtime" or "internal_embedded") weight_file_name : str Filename for the weight file to be generated
def minimize_weight_bit_width(model)Minimize threshold datatype, with RTL-specific adjustments.
The RTL implementation saturates inputs to the threshold datatype range when the threshold datatype is narrower than the input datatype. To ensure correct comparisons at saturation boundaries, the threshold datatype must be able to represent [min_threshold - 1 : max_threshold].
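To illustrate why the extra value below the minimum threshold can cost a bit, the following sketch computes the smallest signed (two's-complement) width that covers [min_threshold - 1 : max_threshold]. Both helper names are hypothetical, and restricting the search to signed integer types is an assumption; the real method works with FINN DataType objects.

```python
def min_signed_bits_for_range(lo: int, hi: int) -> int:
    """Smallest width n such that -2**(n-1) <= lo and hi <= 2**(n-1) - 1."""
    n = 1
    while lo < -(1 << (n - 1)) or hi > (1 << (n - 1)) - 1:
        n += 1
    return n


def rtl_threshold_bits(thresholds) -> int:
    """Width needed to represent [min(thresholds) - 1, max(thresholds)]."""
    lo = int(min(thresholds)) - 1  # one value below the smallest threshold
    hi = int(max(thresholds))
    return min_signed_bits_for_range(lo, hi)
```

With thresholds spanning -8 to 7, the plain range fits in 4 signed bits, but the extended range starting at -9 needs 5.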
RTL implementation of Vector-Vector Activation Unit (VVAU).
This module provides an RTL-based implementation of the Vector-Vector Activation Unit for DSP-based computation of quantized neural network activations in FPGA dataflow architectures.
class VVAU_rtl(VVAU, RTLBackend)RTL implementation of Vector-Vector Activation Unit.
Implements DSP-based activation functions using vector-vector multiply-accumulate operations for efficient FPGA execution.
def __init__(onnx_node, **kwargs)Initialize the RTL Vector-Vector Activation Unit node.
onnx_node : NodeProto ONNX node to wrap **kwargs : dict Additional arguments passed to parent class
def get_nodeattr_types()Get dictionary of attribute names and their types for this node.
dict Dictionary mapping attribute names to type specifications, combining VVAU and RTLBackend attributes
def execute_node(context, graph)Execute this VVAU node.
Performs vector-vector activation using C++ or RTL simulation.
context : dict Dictionary mapping tensor names to numpy arrays graph : GraphProto ONNX graph containing this node
def lut_estimation()Estimate LUT utilization for this VVAU node.
int LUT count estimate (always 0 for VVAU as it uses DSPs)
def dsp_estimation(fpgapart)Estimate DSP utilization for this VVAU node.
fpgapart : str Target FPGA part name
int Number of DSP blocks required (PE * ceil(SIMD/3))
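The stated formula, PE * ceil(SIMD/3), can be evaluated with a few lines of Python. The function name `vvau_dsp_estimate` is hypothetical, and the sketch ignores the `fpgapart` argument that the real method accepts.

```python
import math


def vvau_dsp_estimate(pe: int, simd: int) -> int:
    """Evaluate PE * ceil(SIMD / 3) as given in the docstring above."""
    return pe * math.ceil(simd / 3)
```

For instance, PE=4 and SIMD=7 gives 4 * 3 = 12 DSP blocks.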
def instantiate_ip(cmd)Add RTL IP instantiation commands to Vivado script.
cmd : list List of Vivado TCL commands to append to
def generate_hdl(model, fpgapart, clk)Generate HDL code for this VVAU node.
model : ModelWrapper ONNX model wrapper containing weights fpgapart : str Target FPGA part name clk : float Target clock period in nanoseconds
def prepare_codegen_default(fpgapart, clk)Prepare default code generation dictionary for HDL templates.
fpgapart : str Target FPGA part name clk : float Target clock period in nanoseconds
tuple (template_path, code_gen_dict) where template_path is the path to the Verilog wrapper template and code_gen_dict contains substitutions
def get_rtl_file_list(abspath=False)Get list of RTL files needed for this VVAU node.
abspath : bool, optional Whether to return absolute paths (default: False)
list List of RTL file paths required for synthesis
def get_verilog_paths() -> list[str]Get list of Verilog paths required for this node.
class RTLBackend(ABC)Base class providing the functionality shared by all custom ops that correspond to a module in finn-rtllib. Contains the functions every RTL custom node should have. Some are abstract methods, which must be implemented when writing a new RTL custom op node.
def prepare_rtlsim(behav=False)Creates an xsi emulation library for the RTL code generated for this node and sets the rtlsim_so attribute to its path.
def get_verilog_paths()Returns the path to the code generation directory. Can be overridden to return additional paths to relevant Verilog files.
@abstractmethod
def get_rtl_file_list(abspath=False)Returns list of rtl files. Needs to be filled by each node.
class Shuffle(HWCustomOp)Abstraction layer for Shuffle (rearrange and transpose) layers. This operator is later transformed into InnerShuffle and OuterShuffle operations.
def get_nodeattr_types()The attributes for the Shuffle node capture the optional reshapes either side of the transpose. Below is a diagram indicating what tensors the attribute names are referring to.
      │ in_shape
      │
┌─────▼──────┐
│            │
│  Reshape   │
│            │
└─────┬──────┘
      │
      │ transpose_in_shape
┌─────▼──────┐
│            │
│ Transpose  │
│            │
└─────┬──────┘
      │ transpose_out_shape
┌─────▼──────┐
│            │
│  Reshape   │
│            │
└─────┬──────┘
      │
      │ out_shape
      ▼
def get_exp_cycles()Estimate cycles by decomposing into Inner/OuterShuffle stages.
Decomposes the transpose into a sequence of hardware-constrained operations (inner_shuffle / outer_shuffle), creates temporary nodes for each stage, and returns the MAX of their cycle estimates (stages are pipelined, so throughput is limited by the slowest).
class StreamingSplit(HWCustomOp)Abstraction layer for HW implementation of Split. Only supports splitting along the last (channel) axis.
FPGA dataflow custom operator for Squeeze operation.
@register_custom_op
class Squeeze(HWCustomOp)Hardware custom operator for Squeeze operation.
Removes single-dimension entries from the shape of a tensor.
def __init__(onnx_node, **kwargs) -> NoneInitialize the Squeeze operator from an ONNX node.
def get_nodeattr_types()Return the dictionary of node attributes for the Squeeze operator.
@property
def inp_dtype() -> BaseDataTypeReturn the input datatype.
@property
def out_dtype() -> BaseDataTypeReturn the output datatype.
@property
def inp_shape()Return the input shape.
@property
def out_shape()Return the output shape.
@property
def pe()Return the number of parallel processing elements (PE).
def make_shape_compatible_op(model: ModelWrapper) -> NodeProtoCreate a shape-compatible operation for ONNX shape inference.
Returns a standard ONNX Squeeze node for shape inference purposes.
def infer_node_datatype(model: ModelWrapper) -> NoneInfer and set the datatype of the node output.
def execute_node(context, graph) -> NoneExecute squeeze operation (Python fallback).
def verify_node()Verify the node attributes, inputs and outputs.
def get_input_datatype(ind=0) -> BaseDataTypeReturn the datatype of the input at the given index.
def get_output_datatype(ind=0) -> BaseDataTypeReturn the datatype of the output at the given index.
def get_normal_input_shape(ind=0)Return the unfolded input shape at the given index.
def get_normal_output_shape(ind=0)Return the unfolded output shape at the given index.
def get_folded_input_shape(ind=0)Return the folded input shape at the given index.
Applies PE-based folding to the last dimension.
def get_folded_output_shape(ind=0)Return the folded output shape at the given index.
Applies PE-based folding to the last dimension.
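PE-based folding of the last dimension can be sketched as below. The exact folded layout used by Squeeze is not stated on this page; the sketch assumes the last dimension is split into (dim // PE, PE), and the helper name `fold_last_dim` is hypothetical.

```python
def fold_last_dim(shape, pe: int):
    """Split the last dimension of shape into (last // pe, pe)."""
    *outer, last = shape
    if last % pe != 0:
        raise ValueError("last dimension must be divisible by PE")
    # Outer dimensions are untouched; only the last one is folded.
    return (*outer, last // pe, pe)
```

Under this assumption a normal shape of (1, 16, 64) with PE=8 folds to (1, 16, 8, 8), so each stream word carries 8 elements.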
def get_instream_width(ind=0)Return the width of the input stream in bits at the given index.
def get_outstream_width(ind=0)Return the width of the output stream in bits at the given index.
def get_number_output_values()Return the number of expected output values from the operator.
def get_exp_cycles()Return the expected number of cycles for the squeeze operation.
class StreamingDataflowPartition(CustomOp)Class that corresponds to the meta/container node StreamingDataflowPartition, which is a placeholder for a group of fpgadataflow nodes that have been separated out into a FINN-ONNX model of its own. Note that it does not produce any HLS or bitfile by itself.
class StreamingDataWidthConverter(HWCustomOp)Abstraction layer for HW implementation of StreamingDataWidthConverter
def get_input_datatype(ind=0)Returns FINN DataType of input.
def get_output_datatype(ind=0)Returns FINN DataType of output.
def lut_estimation()Calculates resource estimations for LUTs
class StreamingFIFO(HWCustomOp)
def bram_estimation()Calculates resource estimation for BRAM
def uram_estimation()Calculates resource estimation for URAM
def lut_estimation()Calculates resource estimations for LUTs
Module that provides the Thresholding class, which implements multi-threshold activation functions. The thresholding operation compares input values against a set of thresholds to produce quantized outputs.
class Thresholding(HWCustomOp)Abstraction layer for HW implementation of Thresholding.
def __init__(onnx_node, **kwargs)Initialize the Thresholding node.
def get_nodeattr_types()Return a dictionary of attribute names and their types for this node.
Returns a dictionary describing node attributes including parallelization (PE), number of channels, data types, and runtime configuration options.
def infer_node_datatype(model)Infer and set the data types for node inputs and outputs.
Updates the inputDataType attribute based on the model's tensor datatype and sets the output tensor datatype based on the outputDataType attribute.
Arguments:
- model - The ONNX model containing this node.
def verify_node()Verify that the node is configured correctly.
Checks that the backend attribute is set to 'fpgadataflow' and that all necessary attributes exist.
Returns:
List of informational messages about the node's configuration status.
def get_input_datatype(ind=0)Return FINN DataType of input.
def get_output_datatype(ind=0)Return FINN DataType of output.
def minimize_weight_bit_width(model)Minimize threshold datatype bitwidth based on actual threshold values. This function should not round or clip the threshold values; that is done in RoundAndClipThresholds.
def get_instream_width(ind=0)Return the width of the input stream in bits.
Arguments:
- ind - Input index (0 for data input, 1 for threshold/weight input).
Returns:
Width of the input stream in bits.
def get_outstream_width(ind=0)Return the width of the output stream in bits.
Arguments:
- ind - Output index (currently only supports index 0).
Returns:
Width of the output stream in bits.
def get_folded_input_shape(ind=0)Return the folded input shape for hardware implementation.
The folded shape accounts for parallelization (PE) and temporal memory (TMEM) organization used in the hardware accelerator.
Arguments:
- ind - Input index (currently only supports index 0).
Returns:
Tuple representing the folded input shape.
def get_folded_output_shape(ind=0)Return the folded output shape for hardware implementation.
Arguments:
- ind - Output index (currently only supports index 0).
Returns:
Tuple representing the folded output shape (same as folded input shape).
def get_normal_input_shape(ind=0)Return the normal (unfolded) input shape.
Arguments:
- ind - Input index (currently only supports index 0).
Returns:
Tuple representing the normal input shape.
def get_normal_output_shape(ind=0)Return the normal (unfolded) output shape.
Arguments:
- ind - Output index (currently only supports index 0).
Returns:
Tuple representing the normal output shape (same as normal input shape).
def get_exp_cycles()Return the expected number of execution cycles.
Calculates cycles as: Channels/PE * batch size * feature map dimensions.
Returns:
Expected number of cycles for execution.
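The cycle formula above (Channels/PE * batch size * feature map dimensions) can be evaluated as in this sketch. The function name and parameter names are hypothetical, and integer division assumes the channel count is divisible by PE.

```python
import math


def thresholding_exp_cycles(channels: int, pe: int, batch: int, fm_dims) -> int:
    """Evaluate (channels / PE) * batch * product of feature map dimensions."""
    if channels % pe != 0:
        raise ValueError("channels must be divisible by PE")
    return (channels // pe) * batch * math.prod(fm_dims)
```

With 64 channels, PE=8, a batch of 1, and a 32x32 feature map, this gives 8 * 1 * 1024 = 8192 cycles.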
def get_hw_compatible_threshold_tensor(orig_thres_matrix)Convert the original numpy threshold matrix orig_thres_matrix into a form suitable for passing to the hlslib call:
- ensure MH % PE == 0
- for unsigned inputs, ensure thresholds are positive
- interleave rows between PEs
- reshape into (PE, TMEM, n_thres_steps) and return.
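The interleave and reshape steps listed above can be sketched with NumPy. This is an illustration under assumptions, not the FINN implementation: the function name `interleave_thresholds` is hypothetical, the input is taken to have shape (MH, n_thres_steps), and row r is assumed to go to PE `r % PE` at position `r // PE`. The sign handling for unsigned inputs is not modelled.

```python
import numpy as np


def interleave_thresholds(thres: np.ndarray, pe: int) -> np.ndarray:
    """Interleave rows between PEs and return shape (PE, TMEM, n_thres_steps)."""
    mh, n_steps = thres.shape
    assert mh % pe == 0, "MH must be divisible by PE"
    tmem = mh // pe
    # Row r = t * pe + p lands at out[p, t], i.e. consecutive rows go to
    # consecutive PEs.
    return thres.reshape(tmem, pe, n_steps).transpose(1, 0, 2)


# Hypothetical example: 8 channels, 3 threshold steps, 2 PEs.
example = np.arange(24).reshape(8, 3)
folded = interleave_thresholds(example, pe=2)
```

Here `folded` has shape (2, 4, 3); PE 0 holds rows 0, 2, 4, 6 of the original matrix and PE 1 holds rows 1, 3, 5, 7.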
def execute_node(context, graph)Execute the thresholding operation.
Performs multi-threshold comparison on input values using the threshold tensor. Handles data layout transformations and applies output bias (ActVal) if configured. Converts output to bipolar format if the output data type is BIPOLAR.
Arguments:
- context - Dictionary containing input values keyed by tensor names.
- graph - The ONNX graph containing this node.
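As a rough illustration of the computation described for execute_node, the sketch below counts, per channel, how many thresholds an input value meets or exceeds, then applies an output bias and an optional bipolar mapping. The function name, the (N, C) input layout, the `>=` comparison, and the `2 * y - 1` bipolar conversion are all assumptions; the real method also handles data layout transformations that are omitted here.

```python
import numpy as np


def multithreshold_sketch(x, thresholds, act_val=0, bipolar=False):
    """x: (N, C) inputs; thresholds: (C, T) per-channel thresholds."""
    # Compare every input against every threshold of its channel and count hits.
    counts = (x[:, :, None] >= thresholds[None, :, :]).sum(axis=-1)
    out = counts + act_val  # output bias (ActVal)
    if bipolar:
        out = 2 * out - 1  # map {0, 1} to {-1, +1}
    return out
```

For an input of 0.5 on a channel with thresholds [0.0, 1.0], one threshold is met, so the output is 1 before any bias is applied.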
def calc_tmem()Calculate and return TMEM.
FPGA dataflow custom operator for Unsqueeze operation.
@register_custom_op
class Unsqueeze(HWCustomOp)Hardware custom operator for Unsqueeze operation.
Inserts single-dimension entries into the shape of a tensor.
def __init__(onnx_node, **kwargs) -> NoneInitialize the Unsqueeze operator from an ONNX node.
def get_nodeattr_types()Return the dictionary of node attributes for the Unsqueeze operator.
@property
def inp_dtype() -> BaseDataTypeReturn the input datatype.
@property
def out_dtype() -> BaseDataTypeReturn the output datatype.
@property
def inp_shape()Return the input shape.
@property
def out_shape()Return the output shape.
@property
def pe()Return the number of parallel processing elements (PE).
def make_shape_compatible_op(model: ModelWrapper) -> NodeProtoCreate a shape-compatible operation for ONNX shape inference.
Returns a standard ONNX Unsqueeze node for shape inference purposes.
def infer_node_datatype(model: ModelWrapper) -> NoneInfer and set the datatype of the node output.
def execute_node(context, graph) -> NoneExecute unsqueeze operation (Python fallback).
def verify_node()Verify the node attributes, inputs and outputs.
def get_input_datatype(ind=0) -> BaseDataTypeReturn the datatype of the input at the given index.
def get_output_datatype(ind=0) -> BaseDataTypeReturn the datatype of the output at the given index.
def get_normal_input_shape(ind=0)Return the unfolded input shape at the given index.
def get_normal_output_shape(ind=0)Return the unfolded output shape at the given index.
def get_folded_input_shape(ind=0)Return the folded input shape at the given index.
Applies PE-based folding to the last dimension.
def get_folded_output_shape(ind=0)Return the folded output shape at the given index.
Applies PE-based folding to the last dimension.
def get_instream_width(ind=0)Return the width of the input stream in bits at the given index.
def get_outstream_width(ind=0)Return the width of the output stream in bits at the given index.
def get_number_output_values()Return the number of expected output values from the operator.
def get_exp_cycles()Return the expected number of cycles for the unsqueeze operation.
class UpsampleNearestNeighbour(HWCustomOp)Abstraction layer for HW implementation of UpsampleNearestNeighbour.
def get_input_datatype(ind=0)Returns FINN DataType of input.
def get_output_datatype(ind=0)Returns FINN DataType of output. (Same as input datatype)
Vector-Vector Activation Unit (VVAU) implementation for FPGA dataflow.
This module contains the VVAU class which provides hardware abstraction for vector-vector activation layers in FPGA implementations. The VVAU performs convolutional operations with thresholding activation functions.
class VVAU(HWCustomOp)Abstraction layer for HW implementation of VectorVectorActivation layers.
def __init__(onnx_node, **kwargs)Initialize the VVAU (Vector-Vector Activation Unit) instance.
Arguments:
- onnx_node - ONNX node representing the VVAU operation
- **kwargs - Additional keyword arguments passed to parent class
def get_nodeattr_types()Get the dictionary of node attribute types for VVAU.
Returns:
- dict - Dictionary mapping attribute names to their types and constraints
def execute_node(context, graph)Execute the VVAU node operation.
Performs the vector-vector activation computation including matrix multiplication and optional thresholding activation.
Arguments:
- context - Execution context containing input tensors
- graph - ONNX graph containing the node
def infer_node_datatype(model)Infer and set the node's data types based on the model.
Arguments:
- model - FINN model containing the node
def get_input_datatype(ind=0)Returns FINN DataType of input.
def get_accumulator_datatype()Returns FINN DataType of accumulator
def get_output_datatype(ind=0)Returns FINN DataType of output.
def get_instream_width(ind=0)Get the input stream width for the specified input.
Arguments:
- ind - Input index (0 for activations, 1 for weights, 2 for thresholds)
Returns:
- int - Input stream width in bits
Raises:
- Exception - If input index is out of range
def get_outstream_width(ind=0)Get the output stream width.
Arguments:
- ind - Output index (default 0)
Returns:
- int - Output stream width in bits
def get_folded_input_shape(ind=0)Get the folded input shape for hardware implementation.
Arguments:
- ind - Input index (0 for activations, 1 for weights)
Returns:
- tuple - Folded input shape dimensions
Raises:
- Exception - If input index is undefined or requirements not met
def get_folded_output_shape(ind=0)Get the folded output shape for hardware implementation.
Arguments:
- ind - Output index (default 0)
Returns:
- tuple - Folded output shape dimensions
def get_normal_input_shape(ind=0)Get the normal (unfolded) input shape.
Arguments:
- ind - Input index (default 0)
Returns:
- tuple - Normal input shape dimensions
def get_normal_output_shape(ind=0)Get the normal (unfolded) output shape.
Arguments:
- ind - Output index (default 0)
Returns:
- tuple - Normal output shape dimensions
def calc_wmem()Calculates and returns WMEM.
def calc_tmem()Calculates and returns TMEM.
def uram_estimation()Estimate UltraRAM (URAM) usage for this layer.
Returns:
- int - Number of URAMs required
def bram_estimation()Calculates resource estimation for BRAM
def bram_efficiency_estimation()Estimate BRAM efficiency (utilization) for this layer.
Returns:
- float - BRAM efficiency ratio (actual usage / allocated capacity)
def uram_efficiency_estimation()Function for URAM efficiency estimation: actual parameter storage needed divided by the allocated URAM storage (from estimation)
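Both efficiency estimates follow the same pattern: parameter storage actually needed, divided by the capacity of the allocated memory blocks. The sketch below shows that ratio. The function name is hypothetical, the block capacities (18 Kb per BRAM18, 288 Kb per URAM) are assumptions about the target device rather than values taken from this page, and returning 1.0 when no blocks are allocated is likewise an assumed convention.

```python
BRAM18_BITS = 18 * 1024   # assumed capacity of one 18 Kb block RAM
URAM_BITS = 288 * 1024    # assumed capacity of one 288 Kb UltraRAM


def memory_efficiency(param_bits: int, n_blocks: int, bits_per_block: int) -> float:
    """Return actual storage needed divided by allocated capacity."""
    if n_blocks == 0:
        # Nothing allocated, so nothing is wasted (assumed convention).
        return 1.0
    return param_bits / (n_blocks * bits_per_block)
```

A layer that stores 9216 bits of weights in a single BRAM18 would, under these assumptions, have an efficiency of 0.5.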
def get_exp_cycles()Get the expected number of execution cycles for this layer.
Returns:
- int - Expected number of clock cycles for execution
def minimize_accumulator_width(model)Minimize the accumulator bit width according to the weight values, input data types, and size of dot product
def minimize_weight_bit_width(model)Minimize the bit width based on the values of the weights.
def get_hw_compatible_threshold_tensor(orig_thres_matrix)Convert the original numpy threshold matrix orig_thres_matrix into a form suitable for passing to the hlslib call:
- ensure MH % PE == 0
- for bipolar weights&inputs, ensure thresholds are positive
- interleave rows between PEs
- reshape into (PE, TMEM, n_thres_steps) and return
def get_hw_compatible_weight_tensor(orig_weight_matrix)Convert weight matrix to hardware-compatible format.
Arguments:
- orig_weight_matrix - Original weight matrix
Returns:
- numpy.ndarray - Hardware-compatible weight tensor
def make_weight_file(weights, weight_file_mode, weight_file_name)Produce a file containing given weights in appropriate format for this layer. This file can be used for either synthesis or run-time reconfig of weights.
Arguments:
- weights : numpy array with weights to be put into the file
- weight_file_mode : one of {hls_header, decoupled_verilog_dat, decoupled_runtime}
- weight_file_name : filename for the weight file to be generated
def generate_params(model, path)Generate parameter files for hardware implementation.
Arguments:
- model - FINN model containing the node
- path - Path to the code generation directory
def get_op_and_param_counts()Get operation and parameter counts for this layer.
Returns:
- dict - Dictionary containing operation and parameter counts by type
def derive_characteristic_fxns(period)Derive characteristic functions for RTL simulation.
Arguments:
- period - Clock period for simulation
def get_verilog_top_module_intf_names()Get Verilog top module interface names.
Returns:
- dict - Dictionary mapping interface types to their names
def code_generation_ipi()Generate IP integrator (IPI) commands for hardware synthesis.
Returns:
- list - List of TCL commands for IP integrator
Raises:
- Exception - If unrecognized mem_mode is specified
This page was generated automatically from source code documentation.