finn.custom_op

This page contains the complete API reference for all modules in the finn.custom_op package.

finn.custom_op.fpgadataflow.attention_heads
finn.custom_op.fpgadataflow.concat
finn.custom_op.fpgadataflow.convolutioninputgenerator
finn.custom_op.fpgadataflow.crop
finn.custom_op.fpgadataflow.duplicatestreams
finn.custom_op.fpgadataflow.elementwise_binary
finn.custom_op.fpgadataflow.fmpadding
finn.custom_op.fpgadataflow.fmpadding_pixel
finn.custom_op.fpgadataflow.globalaccpool
finn.custom_op.fpgadataflow.hls.attention_hls
finn.custom_op.fpgadataflow.hls.checksum_hls
finn.custom_op.fpgadataflow.hls.concat_hls
finn.custom_op.fpgadataflow.hls.duplicatestreams_hls
finn.custom_op.fpgadataflow.hls.elementwise_binary_hls
finn.custom_op.fpgadataflow.hls.globalaccpool_hls
finn.custom_op.fpgadataflow.hls.hwsoftmax_hls
finn.custom_op.fpgadataflow.hls.iodma_hls
finn.custom_op.fpgadataflow.hls.labelselect_hls
finn.custom_op.fpgadataflow.hls.layernorm_hls
finn.custom_op.fpgadataflow.hls.lookup_hls
finn.custom_op.fpgadataflow.hls.matrixvectoractivation_hls
finn.custom_op.fpgadataflow.hls.outer_shuffle_hls
finn.custom_op.fpgadataflow.hls.pool_hls
finn.custom_op.fpgadataflow.hls.requant_hls
finn.custom_op.fpgadataflow.hls.split_hls
finn.custom_op.fpgadataflow.hls.squeeze_hls
finn.custom_op.fpgadataflow.hls.streamingdatawidthconverter_hls
finn.custom_op.fpgadataflow.hls.streamingfifo_hls
finn.custom_op.fpgadataflow.hls.thresholding_hls
finn.custom_op.fpgadataflow.hls.tlastmarker_hls
finn.custom_op.fpgadataflow.hls.unsqueeze_hls
finn.custom_op.fpgadataflow.hls.upsampler_hls
finn.custom_op.fpgadataflow.hls.vectorvectoractivation_hls
finn.custom_op.fpgadataflow.hlsbackend
finn.custom_op.fpgadataflow.hwcustomop
finn.custom_op.fpgadataflow.hwsoftmax
finn.custom_op.fpgadataflow.inner_shuffle
finn.custom_op.fpgadataflow.labelselect
finn.custom_op.fpgadataflow.layernorm
finn.custom_op.fpgadataflow.lookup
finn.custom_op.fpgadataflow.matrixvectoractivation
finn.custom_op.fpgadataflow.outer_shuffle
finn.custom_op.fpgadataflow.pool
finn.custom_op.fpgadataflow.requant
finn.custom_op.fpgadataflow.reshape
finn.custom_op.fpgadataflow.rtl
finn.custom_op.fpgadataflow.rtl.convolutioninputgenerator_rtl
finn.custom_op.fpgadataflow.rtl.elementwise_binary_rtl
finn.custom_op.fpgadataflow.rtl.finn_loop
finn.custom_op.fpgadataflow.rtl.fmpadding_rtl
finn.custom_op.fpgadataflow.rtl.inner_shuffle_rtl
finn.custom_op.fpgadataflow.rtl.layernorm_rtl
finn.custom_op.fpgadataflow.rtl.matrixvectoractivation_rtl
finn.custom_op.fpgadataflow.rtl.requant_rtl
finn.custom_op.fpgadataflow.rtl.reshape_rtl
finn.custom_op.fpgadataflow.rtl.streamingdatawidthconverter_rtl
finn.custom_op.fpgadataflow.rtl.streamingfifo_rtl
finn.custom_op.fpgadataflow.rtl.thresholding_rtl
finn.custom_op.fpgadataflow.rtl.vectorvectoractivation_rtl
finn.custom_op.fpgadataflow.rtlbackend
finn.custom_op.fpgadataflow.shuffle
finn.custom_op.fpgadataflow.split
finn.custom_op.fpgadataflow.squeeze
finn.custom_op.fpgadataflow.streamingdataflowpartition
finn.custom_op.fpgadataflow.streamingdatawidthconverter
finn.custom_op.fpgadataflow.streamingfifo
finn.custom_op.fpgadataflow.thresholding
finn.custom_op.fpgadataflow.unsqueeze
finn.custom_op.fpgadataflow.upsampler
finn.custom_op.fpgadataflow.vectorvectoractivation

finn.custom_op.fpgadataflow.attention_heads

Multi-head attention split and merge operators.

SplitMultiHeads Objects

class SplitMultiHeads(HWCustomOp)

Split input tensor into multiple attention heads.

This operator splits the input tensor after input projections to create separate attention heads for multi-head attention mechanisms. The output can be either packed as a single tensor or split into multiple output tensors.
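The following is a minimal NumPy sketch of the split semantics. It assumes a 2-D input of shape (seq_len, heads * head_dim), with the heads laid out contiguously along the last axis. The function name split_heads and this layout are illustrative assumptions. They are not the operator's actual implementation or its tensor layout.

```python
import numpy as np


def split_heads(x: np.ndarray, heads: int, packed: bool):
    """Split the last axis of x into `heads` equal, contiguous slices.

    Assumed layout: x has shape (seq_len, heads * head_dim).
    packed=True  -> one array of shape (heads, seq_len, head_dim)
    packed=False -> list of `heads` arrays, each (seq_len, head_dim)
    """
    seq_len, embed_dim = x.shape
    if embed_dim % heads != 0:
        raise ValueError("embedding dimension must be divisible by heads")
    head_dim = embed_dim // heads
    # (seq_len, heads, head_dim) -> (heads, seq_len, head_dim)
    split = x.reshape(seq_len, heads, head_dim).transpose(1, 0, 2)
    if packed:
        return split
    return [split[h] for h in range(heads)]


x = np.arange(12).reshape(2, 6)
print(split_heads(x, heads=3, packed=True).shape)  # (3, 2, 2)
```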

init

def __init__(onnx_node, **kwargs)

Initialize the SplitMultiHeads operator.

get_nodeattr_types

def get_nodeattr_types()

Get node attribute types for the SplitMultiHeads operator.

Defines the attributes that must be present on this node, including the number of attention heads, packing mode, data type, and other configuration parameters inherited from the parent HWCustomOp class.

Returns:

dict - Dictionary mapping attribute names to their type specifications

heads

@property
def heads()

Get number of attention heads.

packed

@property
def packed()

Get packed attribute.

dtype

@property
def dtype()

Get data type attribute.

num_elems

@property
def num_elems()

Get number of elements attribute.

num_inputs

@property
def num_inputs()

Get number of inputs attribute.

make_shape_compatible_op

def make_shape_compatible_op(model: ModelWrapper)

Make an operation compatible with the output shape for shape inference. Note: this propagates the shape forward, i.e., it never asks for the shape of the output, even if that seems easier.

infer_node_datatype

def infer_node_datatype(model: ModelWrapper)

Infer the datatype of the node output.

execute_node

def execute_node(context, graph)

Execute the node.

verify_node

def verify_node()

Verify node attribute/input/output correctness.

get_input_datatype

def get_input_datatype(ind=0)

Get input data type.

get_output_datatype

def get_output_datatype(ind=0)

Get output data type.

get_normal_input_shape

def get_normal_input_shape(ind=0)

Get normal input shape.

get_normal_output_shape

def get_normal_output_shape(ind=0)

Get normal output shape.

get_folded_input_shape

def get_folded_input_shape(ind=0)

Get folded input shape.

get_folded_output_shape

def get_folded_output_shape(ind=0)

Get folded output shape.

get_instream_width

def get_instream_width(ind=0)

Get input stream width.

get_outstream_width

def get_outstream_width(ind=0)

Get output stream width.

get_number_output_values

def get_number_output_values()

Get the number of expected output values, i.e. how many times read() could/should be called on any output stream of this operator.

get_exp_cycles

def get_exp_cycles()

Derive the expected cycles of the operator given the folding configuration.

MergeMultiHeads Objects

class MergeMultiHeads(HWCustomOp)

Custom operator that merges attention heads before the output projections.
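A minimal NumPy sketch of the merge step follows, written as the inverse of a contiguous split. It assumes packed heads of shape (heads, seq_len, head_dim) and a merged output of shape (seq_len, heads * head_dim). Both the layout and the name merge_heads are assumptions for illustration, not the operator's implementation.

```python
import numpy as np


def merge_heads(x: np.ndarray) -> np.ndarray:
    """Merge packed heads (heads, seq_len, head_dim) into
    (seq_len, heads * head_dim), placing head h in columns
    h * head_dim .. (h + 1) * head_dim - 1 (an assumed layout)."""
    heads, seq_len, head_dim = x.shape
    return x.transpose(1, 0, 2).reshape(seq_len, heads * head_dim)


per_head = np.arange(12).reshape(3, 2, 2)
print(merge_heads(per_head)[0])  # [0 1 4 5 8 9]
```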

init

def __init__(onnx_node, **kwargs)

Initialize the operator.

get_nodeattr_types

def get_nodeattr_types()

Define attributes which must be present on this node.

heads

@property
def heads()

Get number of attention heads.

packed

@property
def packed()

Get packed attribute.

dtype

@property
def dtype()

Get data type.

num_elems

@property
def num_elems()

Get number of elements.

num_inputs

@property
def num_inputs()

Get number of inputs.

squeezed

@property
def squeezed()

Get squeezed attribute.

make_shape_compatible_op

def make_shape_compatible_op(model: ModelWrapper)

Makes an operation compatible with the output shape for shape inference. Note: this propagates the shape forward, i.e., it never asks for the shape of the output, even if that seems easier.

infer_node_datatype

def infer_node_datatype(model: ModelWrapper)

Infer the datatype of the node output.

execute_node

def execute_node(context, graph)

Executes multi-head slicing in simulation (Python, C++, or RTL simulation).

verify_node

def verify_node()

Verify node attribute/input/output correctness.

get_input_datatype

def get_input_datatype(ind=0)

Get input data type.

get_output_datatype

def get_output_datatype(ind=0)

Get output data type.

get_normal_input_shape

def get_normal_input_shape(ind=0)

Get normal input shape.

get_normal_output_shape

def get_normal_output_shape(ind=0)

Get normal output shape.

get_folded_input_shape

def get_folded_input_shape(ind=0)

Get folded input shape.

get_folded_output_shape

def get_folded_output_shape(ind=0)

Get folded output shape.

get_instream_width

def get_instream_width(ind=0)

Get input stream width.

get_outstream_width

def get_outstream_width(ind=0)

Get output stream width.

get_number_output_values

def get_number_output_values()

Gets the number of expected output values, i.e. how many times read() could/should be called on any output stream of this operator.

get_exp_cycles

def get_exp_cycles()

Derive the expected cycles of the operator given the folding configuration.

finn.custom_op.fpgadataflow.concat

StreamingConcat Objects

class StreamingConcat(HWCustomOp)

Abstraction layer for HW implementation of Concat. Only supports concatenating along the last (channel) axis.
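The reference behaviour of a last-axis-only concatenation can be sketched in NumPy as follows. The helper name concat_last_axis is hypothetical. It only illustrates the shape constraint stated above: every dimension except the last must match.

```python
import numpy as np


def concat_last_axis(inputs):
    """Reference semantics for concatenation along the last (channel) axis.

    All inputs must agree on every dimension except the last one.
    """
    leading = inputs[0].shape[:-1]
    for t in inputs[1:]:
        if t.shape[:-1] != leading:
            raise ValueError("only the last axis may differ between inputs")
    return np.concatenate(inputs, axis=-1)


a = np.zeros((1, 4, 4, 3))
b = np.ones((1, 4, 4, 5))
print(concat_last_axis([a, b]).shape)  # (1, 4, 4, 8)
```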

finn.custom_op.fpgadataflow.convolutioninputgenerator

ConvolutionInputGenerator Objects

class ConvolutionInputGenerator(HWCustomOp)

Abstraction layer for HW implementation of ConvolutionInputGenerator.
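ConvolutionInputGenerator provides the sliding window (im2col-style) stage that formats inputs for convolution and pooling layers. The NumPy sketch below shows im2col for an NHWC input without padding or dilation. The function name im2col_nhwc and the (kh, kw, c) ordering inside each window are assumptions for illustration, and may not match the hardware's folded output order.

```python
import numpy as np


def im2col_nhwc(x: np.ndarray, k: int, stride: int = 1) -> np.ndarray:
    """Sliding-window reference for an NHWC input (no padding, no dilation).

    Returns shape (N, OH, OW, k*k*C); each output pixel holds its k x k
    window flattened in (kh, kw, c) order (an assumed ordering).
    """
    n, h, w, c = x.shape
    oh = (h - k) // stride + 1
    ow = (w - k) // stride + 1
    out = np.empty((n, oh, ow, k * k * c), dtype=x.dtype)
    for i in range(oh):
        for j in range(ow):
            window = x[:, i * stride:i * stride + k, j * stride:j * stride + k, :]
            out[:, i, j, :] = window.reshape(n, -1)
    return out


img = np.arange(16).reshape(1, 4, 4, 1)
print(im2col_nhwc(img, k=2, stride=2)[0, 0, 0])  # [0 1 4 5]
```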

get_input_datatype

def get_input_datatype(ind=0)

Returns FINN DataType of input.

get_output_datatype

def get_output_datatype(ind=0)

Returns FINN DataType of output.

get_instream_width

def get_instream_width(ind=0)

Returns the input stream width. Input and output stream widths are equal for the sliding window function.

finn.custom_op.fpgadataflow.crop

Crop Objects

class Crop(HWCustomOp)

Abstraction layer for Crop layers.

finn.custom_op.fpgadataflow.duplicatestreams

DuplicateStreams Objects

class DuplicateStreams(HWCustomOp)

Abstraction layer for HW implementation of DuplicateStreams.

get_input_datatype

def get_input_datatype(ind=0)

Returns FINN DataType of input.

get_output_datatype

def get_output_datatype(ind=0)

Returns FINN DataType of output.

get_instream_width

def get_instream_width(ind=0)

Returns input stream width.

get_outstream_width

def get_outstream_width(ind=0)

Returns output stream width.

finn.custom_op.fpgadataflow.elementwise_binary

ElementwiseBinaryOperation Objects

class ElementwiseBinaryOperation(HWCustomOp)

calc_wmem

def calc_wmem()

Calculates and returns WMEM.

finn.custom_op.fpgadataflow.fmpadding

FMPadding Objects

class FMPadding(HWCustomOp)

Abstraction layer for HW implementation of FMPadding. Pads the input image by a given amount.

get_padded_odim

def get_padded_odim()

Return the padded spatial size of the output.
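The padded output size is the input size plus the padding on both sides of each spatial axis. The sketch below assumes that padding is given as [top, left, bottom, right] and that the tensor is NHWC. That ordering, and the helper names padded_odim and fmpadding_reference, are hypothetical and not taken from FMPadding's attribute definitions.

```python
import numpy as np


def padded_odim(idim, padding):
    """idim = (H, W); padding = [top, left, bottom, right] (assumed order)."""
    h, w = idim
    top, left, bottom, right = padding
    return (h + top + bottom, w + left + right)


def fmpadding_reference(x: np.ndarray, padding, pad_value=0) -> np.ndarray:
    """Pad the spatial axes of an NHWC tensor with a constant value."""
    top, left, bottom, right = padding
    return np.pad(x, ((0, 0), (top, bottom), (left, right), (0, 0)),
                  constant_values=pad_value)


print(padded_odim((4, 4), [1, 2, 1, 2]))  # (6, 8)
```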

get_input_datatype

def get_input_datatype(ind=0)

Returns FINN DataType of input.

get_output_datatype

def get_output_datatype(ind=0)

Returns FINN DataType of output. (Same as input datatype)

finn.custom_op.fpgadataflow.fmpadding_pixel

FMPadding_Pixel Objects

class FMPadding_Pixel(HWCustomOp)

get_padded_odim

def get_padded_odim()

Return the padded spatial size of the output.

get_input_datatype

def get_input_datatype(ind=0)

Returns FINN DataType of input.

get_output_datatype

def get_output_datatype(ind=0)

Returns FINN DataType of output. (Same as input datatype)

finn.custom_op.fpgadataflow.globalaccpool

GlobalAccPool Objects

class GlobalAccPool(HWCustomOp)

Abstraction layer for HW implementation of GlobalAccPool.
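Global accumulation pooling sums each channel over all spatial positions. The sketch below is a NumPy reference for an NHWC input. The name global_acc_pool and the choice to keep H and W as size-1 axes are assumptions. It accumulates in int64 so that the sum does not overflow the narrow input type. A real hardware implementation would instead pick a suitably wide output datatype.

```python
import numpy as np


def global_acc_pool(x: np.ndarray) -> np.ndarray:
    """Sum an NHWC tensor over its spatial axes (H, W), keeping them as size 1.

    Accumulating in int64 avoids overflowing the narrow input type.
    """
    return x.sum(axis=(1, 2), keepdims=True, dtype=np.int64)


x = np.full((1, 3, 3, 2), 100, dtype=np.int8)
print(global_acc_pool(x).ravel())  # [900 900]
```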

get_input_datatype

def get_input_datatype(ind=0)

Returns FINN DataType of input.

get_output_datatype

def get_output_datatype(ind=0)

Returns FINN DataType of output.

get_instream_width

def get_instream_width(ind=0)

Returns input stream width.

get_outstream_width

def get_outstream_width(ind=0)

Returns output stream width.

finn.custom_op.fpgadataflow.hls.attention_hls

HLS backend implementation of scaled dot-product attention operator.

ScaledDotProductAttention_hls Objects

class ScaledDotProductAttention_hls(  # noqa: Class name does not follow
        # CapWords convention
        ScaledDotProductAttention,
        HLSBackend)

HLS Backend specialization of the Scaled Dot-product Attention Operator.

get_nodeattr_types

def get_nodeattr_types()

Return node attributes matching the HLS operator.

get_ap_int_max_w

def get_ap_int_max_w()

Return the maximum width of any ap_int used in this operator.

Calculates the maximum bit width required across all inputs, outputs, and optional mask elements for use in determining the widest ap_int type needed for HLS synthesis.
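The rule described above amounts to taking a maximum over stream widths, skipping the mask when none is used. The sketch below models each stream as a hypothetical (bits, elements_per_cycle) pair. Both this representation and the function names are assumptions, not the operator's code.

```python
from typing import Optional, Sequence, Tuple

Stream = Tuple[int, int]  # (bits per element, elements per cycle), assumed model


def stream_width(bits: int, parallel: int) -> int:
    """Width in bits of one stream word carrying `parallel` elements."""
    return bits * parallel


def ap_int_max_w(io_streams: Sequence[Stream], mask: Optional[Stream] = None) -> int:
    """Widest ap_int needed across all input/output streams and an optional mask."""
    streams = list(io_streams)
    if mask is not None:
        streams.append(mask)
    return max(stream_width(bits, par) for bits, par in streams)


print(ap_int_max_w([(8, 4), (8, 4), (16, 1)]))  # 32
```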

global_includes

def global_includes()

Generate list of C++ includes to be placed at the top of the generated code.

Adds necessary header files for the attention operator HLS implementation, including FINN HLSLIB activation functions and the attention-specific HLS implementation header.

generate_params

def generate_params(model: ModelWrapper, path)

Generate C++ parameters file including activation function thresholds.

Creates parameter files including activation function thresholds and other configuration parameters needed for HLS synthesis. The code generation directory is specified as an argument to work for both RTL and C++ simulation modes.

defines

def defines(var)

Generate C++ code of type alias, global constant and macro definitions.

docompute

def docompute()

Generate C++ code for calling the computation part of the operator.

Creates the main HLS computation call with proper RAM style directives for thresholds and mask storage, along with necessary pragmas for threshold arrays and storage binding.

blackboxfunction

def blackboxfunction()

Generate the head of the C++ function from which the IP block will be generated.

Creates the function signature describing the top level interface of the attention operator for HLS synthesis (ipgen).

pragmas

def pragmas()

Generate C++ pragmas to be inserted into the main function.

Creates HLS interface directives specifying how to create RTL ports for the top-level function arguments in both C++ simulation and ipgen-blackboxfunction.

get_verilog_top_module_intf_names

def get_verilog_top_module_intf_names()

Return the names of input and output interfaces grouped by protocol.

Collects interface names in a dictionary organized by protocol type (clock, reset, AXI stream, etc.) for Verilog module generation.

finn.custom_op.fpgadataflow.hls.checksum_hls

CheckSum_hls Objects

class CheckSum_hls(HWCustomOp, HLSBackend)

Class that corresponds to custom_hls checksum function.

get_input_datatype

def get_input_datatype(ind=0)

Returns FINN DataType of input.

get_output_datatype

def get_output_datatype(ind=0)

Returns FINN DataType of output.

finn.custom_op.fpgadataflow.hls.concat_hls

StreamingConcat_hls Objects

class StreamingConcat_hls(StreamingConcat, HLSBackend)

Streaming concatenation node with dynamically generated HLS. Only supports concatenating along the last axis.

finn.custom_op.fpgadataflow.hls.duplicatestreams_hls

DuplicateStreams_hls Objects

class DuplicateStreams_hls(DuplicateStreams, HLSBackend)

Class that corresponds to finn-hlslib function of the same name.

finn.custom_op.fpgadataflow.hls.elementwise_binary_hls

HLS backend implementation for elementwise binary operations.

This module provides HLS (High-Level Synthesis) implementations of elementwise binary operations with support for various memory modes, broadcasting, and parallel execution.

ElementwiseBinaryOperation_hls Objects

class ElementwiseBinaryOperation_hls(  # CapWords convention
        ElementwiseBinaryOperation, HLSBackend)

HLS backend implementation of elementwise binary operations.

Supports various binary operations (add, subtract, multiply, etc.) with configurable memory modes, broadcasting, and parallel execution units (PEs).
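The functional behaviour, broadcasting followed by folding the channel axis across PEs, can be sketched with NumPy. The folding into (channels // PE, PE) groups mirrors how FINN operators commonly fold the last axis, but here it is an assumption. The helper name elementwise_binary is hypothetical.

```python
import numpy as np


def elementwise_binary(lhs, rhs, op, pe: int):
    """Apply a binary op with NumPy broadcasting, then fold the last axis
    into (channels // pe, pe) groups, one group per cycle (assumed folding)."""
    out = op(lhs, rhs)
    channels = out.shape[-1]
    if channels % pe != 0:
        raise ValueError("PE must divide the number of channels")
    folded = out.reshape(*out.shape[:-1], channels // pe, pe)
    return out, folded


lhs = np.ones((2, 4), dtype=np.int32)
rhs = np.arange(4, dtype=np.int32)  # broadcast along the first axis
out, folded = elementwise_binary(lhs, rhs, np.add, pe=2)
print(out[0], folded.shape)  # [1 2 3 4] (2, 2, 2)
```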

get_nodeattr_types

def get_nodeattr_types()

Get node attribute types for this operator.

Returns

dict Dictionary of node attribute names and their types.

get_ap_int_max_w

def get_ap_int_max_w()

Get maximum ap_int width used in this operator.

Returns

int Maximum bit width of any ap_int used in the operator.

adapt_for_loop_body

def adapt_for_loop_body(input_types)

Adapt elementwise binary operator for loop body execution.

When an elementwise operator is placed inside a loop, parameters that are indexed per iteration (PARAMETER type) need to be received as streaming inputs rather than embedded constants. This method changes the lhs_style/rhs_style attributes from "const" to "input" as needed.
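The attribute rewrite described above can be sketched on a plain dictionary. This is a hypothetical model. The dict stands in for the node attributes, and input_types is assumed to be a list of string tags with index 0 for lhs and index 1 for rhs. Only the "PARAMETER" tag and the "const"/"input" values come from the description above; the "ACTIVATION" tag is invented for the example.

```python
def adapt_styles(attrs: dict, input_types: list) -> dict:
    """Hypothetical sketch: switch lhs_style/rhs_style from "const" to "input"
    when the matching input is a per-iteration PARAMETER inside a loop body."""
    updated = dict(attrs)
    for idx, key in enumerate(("lhs_style", "rhs_style")):
        is_param = idx < len(input_types) and input_types[idx] == "PARAMETER"
        if is_param and updated.get(key) == "const":
            updated[key] = "input"  # parameter now arrives as a stream
    return updated


print(adapt_styles({"lhs_style": "input", "rhs_style": "const"},
                   ["ACTIVATION", "PARAMETER"]))
# {'lhs_style': 'input', 'rhs_style': 'input'}
```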

code_generation_ipgen

def code_generation_ipgen(model, fpgapart, clk) -> None

Generate C++ code and Tcl script for IP generation.

global_includes

def global_includes() -> None

Generate list of C++ includes for the top of generated code.

generate_params

def generate_params(model: ModelWrapper, path: str) -> None

Generate C++ parameters file for constant initializer inputs.

Parameters

model : ModelWrapper - The ONNX model wrapper.
path : str - Path to the code generation directory.

defines

def defines(var) -> None

Generate C++ type aliases, global constants and macro definitions.

Parameters

var : str Variable name (currently unused).

read_npy_data

def read_npy_data() -> None

Generate C++ code for reading data from .npy files for C++ simulation testing.

strm_decl

def strm_decl() -> None

Generate C++ code for declaring all streams involved in C++ simulation testing.

docompute

def docompute() -> None

Generate C++ code for the computation part of the operator.

dataoutstrm

def dataoutstrm() -> None

Generate C++ code for reading output stream and converting to numpy format.

save_as_npy

def save_as_npy() -> None

Generate C++ code for saving simulation output to numpy format.

Notes:

This is currently empty in all HLSBackends. Functionality is now integrated into dataoutstrm().

blackboxfunction

def blackboxfunction() -> None

Generate C++ function head for the IP block (used during synthesis).

pragmas

def pragmas() -> None

Generate C++ HLS pragmas for simulation and synthesis.

get_verilog_top_module_intf_names

def get_verilog_top_module_intf_names() -> dict[str, list[str]]

Get the names of input and output interfaces grouped by protocol.

Returns

dict Dictionary mapping protocol types to interface names.

code_generation_ipi

def code_generation_ipi()

Generate IPI (IP Integrator) code for Vivado block design integration.

Returns

list List of TCL commands for IP integration.

execute_node

def execute_node(context, graph) -> None

Execute this node in the given context.

Parameters

context : dict - Execution context mapping tensor names to numpy arrays.
graph : onnx.GraphProto - The ONNX graph containing this node.

ElementwiseAdd_hls Objects

@register_custom_op
class ElementwiseAdd_hls(ElementwiseBinaryOperation_hls,
                         elementwise_binary.ElementwiseAdd)

HLS implementation of elementwise addition operation.

ElementwiseSub_hls Objects

@register_custom_op
class ElementwiseSub_hls(  # CapWords convention
        ElementwiseBinaryOperation_hls, elementwise_binary.ElementwiseSub)

HLS implementation of elementwise subtraction operation.

ElementwiseMul_hls Objects

@register_custom_op
class ElementwiseMul_hls(  # CapWords convention
        ElementwiseBinaryOperation_hls, elementwise_binary.ElementwiseMul)

HLS implementation of elementwise multiplication operation.

ElementwiseDiv_hls Objects

@register_custom_op
class ElementwiseDiv_hls(  # CapWords convention
        ElementwiseBinaryOperation_hls, elementwise_binary.ElementwiseDiv)

HLS implementation of elementwise division operation.

ElementwiseAnd_hls Objects

@register_custom_op
class ElementwiseAnd_hls(  # CapWords convention
        ElementwiseBinaryOperation_hls, elementwise_binary.ElementwiseAnd)

HLS implementation of elementwise logical AND operation.

ElementwiseOr_hls Objects

@register_custom_op
class ElementwiseOr_hls(  # CapWords convention
        ElementwiseBinaryOperation_hls, elementwise_binary.ElementwiseOr)

HLS implementation of elementwise logical OR operation.

ElementwiseXor_hls Objects

@register_custom_op
class ElementwiseXor_hls(  # CapWords convention
        ElementwiseBinaryOperation_hls, elementwise_binary.ElementwiseXor)

HLS implementation of elementwise logical XOR operation.

ElementwiseEqual_hls Objects

@register_custom_op
class ElementwiseEqual_hls(  # CapWords convention
        ElementwiseBinaryOperation_hls, elementwise_binary.ElementwiseEqual)

HLS implementation of elementwise equality comparison operation.

ElementwiseLess_hls Objects

@register_custom_op
class ElementwiseLess_hls(  # CapWords convention
        ElementwiseBinaryOperation_hls, elementwise_binary.ElementwiseLess)

HLS implementation of elementwise less-than comparison operation.

ElementwiseLessOrEqual_hls Objects

@register_custom_op
class ElementwiseLessOrEqual_hls(  # CapWords convention
        ElementwiseBinaryOperation_hls,
        elementwise_binary.ElementwiseLessOrEqual)

HLS implementation of elementwise less-than-or-equal comparison operation.

ElementwiseGreater_hls Objects

@register_custom_op
class ElementwiseGreater_hls(  # CapWords convention
        ElementwiseBinaryOperation_hls, elementwise_binary.ElementwiseGreater)

HLS implementation of elementwise greater-than comparison operation.

ElementwiseGreaterOrEqual_hls Objects

@register_custom_op
class ElementwiseGreaterOrEqual_hls(  # CapWords convention
        ElementwiseBinaryOperation_hls,
        elementwise_binary.ElementwiseGreaterOrEqual)

HLS implementation of elementwise greater-than-or-equal comparison operation.

ElementwiseBitwiseAnd_hls Objects

@register_custom_op
class ElementwiseBitwiseAnd_hls(  # CapWords convention
        ElementwiseBinaryOperation_hls,
        elementwise_binary.ElementwiseBitwiseAnd)

HLS implementation of elementwise bitwise AND operation.

ElementwiseBitwiseOr_hls Objects

@register_custom_op
class ElementwiseBitwiseOr_hls(  # CapWords convention
        ElementwiseBinaryOperation_hls,
        elementwise_binary.ElementwiseBitwiseOr)

HLS implementation of elementwise bitwise OR operation.

ElementwiseBitwiseXor_hls Objects

@register_custom_op
class ElementwiseBitwiseXor_hls(  # CapWords convention
        ElementwiseBinaryOperation_hls,
        elementwise_binary.ElementwiseBitwiseXor)

HLS implementation of elementwise bitwise XOR operation.

ElementwiseBitShift_hls Objects

@register_custom_op
class ElementwiseBitShift_hls(  # CapWords convention
        ElementwiseBinaryOperation_hls,
        elementwise_binary.ElementwiseBitShift)

HLS implementation of elementwise bit shift operation.

Supports both left and right bit shift operations based on the 'direction' attribute.
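The NumPy sketch below shows the functional behaviour of the direction switch. It assumes the direction strings "LEFT" and "RIGHT", which are the values the ONNX BitShift operator uses. Whether this operator's 'direction' attribute uses the same strings is an assumption, and the helper name bit_shift is hypothetical.

```python
import numpy as np


def bit_shift(x: np.ndarray, shift: np.ndarray, direction: str) -> np.ndarray:
    """Elementwise shift of x by shift bits; direction is "LEFT" or "RIGHT"."""
    if direction == "LEFT":
        return np.left_shift(x, shift)
    if direction == "RIGHT":
        return np.right_shift(x, shift)
    raise ValueError(f"unknown direction: {direction!r}")


x = np.array([1, 2, 8], dtype=np.uint8)
s = np.array([1, 1, 2], dtype=np.uint8)
print(bit_shift(x, s, "LEFT"))   # [ 2  4 32]
print(bit_shift(x, s, "RIGHT"))  # [0 1 2]
```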

get_nodeattr_types

def get_nodeattr_types() -> dict

Get node attribute types for bit shift operation.

Returns

dict Dictionary of node attribute names and their types, including the direction attribute for selecting shift direction.

ElementwiseMax_hls Objects

@register_custom_op
class ElementwiseMax_hls(  # CapWords convention
        ElementwiseBinaryOperation_hls, elementwise_binary.ElementwiseMax)

HLS implementation of the elementwise max operation.

finn.custom_op.fpgadataflow.hls.globalaccpool_hls

GlobalAccPool_hls Objects

class GlobalAccPool_hls(GlobalAccPool, HLSBackend)

Class that corresponds to finn-hlslib AccPool_Batch function.

finn.custom_op.fpgadataflow.hls.hwsoftmax_hls

HWSoftmax_hls Objects

class HWSoftmax_hls(HWSoftmax, HLSBackend)

timeout_value

def timeout_value()

Set the timeout value for HLS functions defined for one clock cycle.

finn.custom_op.fpgadataflow.hls.iodma_hls

IODMA_hls Objects

class IODMA_hls(HWCustomOp, HLSBackend)

Class that corresponds to finn-hlslib DMA function(s).

get_input_datatype

def get_input_datatype(ind=0)

Returns FINN DataType of input.

get_output_datatype

def get_output_datatype(ind=0)

Returns FINN DataType of output. (Same as input datatype)

get_ap_int_max_w

def get_ap_int_max_w()

Return the maximum width of any ap_int used in this module.

finn.custom_op.fpgadataflow.hls.labelselect_hls

LabelSelect_hls Objects

class LabelSelect_hls(LabelSelect, HLSBackend)

Class that corresponds to finn-hlslib LabelSelect_Batch function.

finn.custom_op.fpgadataflow.hls.layernorm_hls

LayerNorm_hls Objects

class LayerNorm_hls(LayerNorm, HLSBackend)

timeout_value

def timeout_value()

Set the timeout value for HLS functions defined for one clock cycle.

finn.custom_op.fpgadataflow.hls.lookup_hls

Lookup_hls Objects

class Lookup_hls(Lookup, HLSBackend)

Streaming elementwise HLS lookup, mapping indices to values.

finn.custom_op.fpgadataflow.hls.matrixvectoractivation_hls

MVAU_hls Objects

class MVAU_hls(MVAU, HLSBackend)

Corresponds to finn-hlslib MatrixVectorActivation_Batch function.

lut_estimation

def lut_estimation()

Calculates resource estimations for LUTs based on:

FINN-R: An End-to-End Deep-Learning Framework for Fast Exploration of Quantized Neural Networks
M. Blott, T. B. Preusser, N. J. Fraser, G. Gambardella, K. O'Brien, Y. Umuroglu, M. Leeser and K. Vissers
1. Sep 2018

code_generation_ipgen

def code_generation_ipgen(model, fpgapart, clk)

Generates C++ code and Tcl script for IP generation.

get_template_param_values

def get_template_param_values()

Returns the template parameter values according to input, output and weight data types.

minimize_weight_bit_width

def minimize_weight_bit_width(model)

Minimize weight and threshold datatypes, with HLS-specific adjustments.

The HLS implementation uses the threshold datatype for comparisons. When the threshold datatype is narrower than the accumulator datatype, accumulator values get truncated, which can cause incorrect results. To prevent this, the threshold datatype is kept at least as wide as the accumulator datatype.
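A small worked example of the truncation hazard follows, using plain Python integers to model two's-complement wrap-around. An accumulator value of 300 compared against a threshold of 100 is correctly "above", but once it is truncated to 8 signed bits it becomes 44 and the comparison flips. The helper wrap_signed is illustrative and is not a FINN function.

```python
def wrap_signed(value: int, bits: int) -> int:
    """Truncate an integer to `bits` bits and reinterpret it as two's complement."""
    mask = (1 << bits) - 1
    v = value & mask
    return v - (1 << bits) if v >= (1 << (bits - 1)) else v


acc, threshold = 300, 100
correct = acc >= threshold                     # True
truncated = wrap_signed(acc, 8) >= threshold   # 44 >= 100 -> False
print(correct, truncated)  # True False
```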

finn.custom_op.fpgadataflow.hls.outer_shuffle_hls

auto_size_simd

def auto_size_simd(I_dim: int, SIMD: int) -> Optional[int]

Return the smallest divisor d of I_dim such that d > SIMD. If no such divisor exists, return None.
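The docstring fully specifies the behaviour, so it can be sketched as a linear search over candidate divisors. This is a reimplementation from the description above, not the module's source code.

```python
from typing import Optional


def auto_size_simd(I_dim: int, SIMD: int) -> Optional[int]:
    """Smallest divisor d of I_dim with d > SIMD, or None if there is none."""
    for d in range(SIMD + 1, I_dim + 1):
        if I_dim % d == 0:
            return d
    return None


print(auto_size_simd(12, 4))   # 6
print(auto_size_simd(12, 12))  # None
```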

OuterShuffle_hls Objects

class OuterShuffle_hls(OuterShuffle, HLSBackend)

timeout_value

def timeout_value()

Set timeout value for HLS functions defined for one clock cycle

finn.custom_op.fpgadataflow.hls.pool_hls

HLSBackend specialization for generic pooling operators: MaxPool, AvgPool, AccPool and QuantAvgPool.

Pool_hls Objects

class Pool_hls(Pool, HLSBackend)

Class that corresponds to finn-hlslib Pool_batch function. Requires ConvolutionInputGenerator(depthwise == 1) to format its input.

Input shape: (BatchSize, OutImgDim, OutImgDim, TotalKernelSize*Channels)
Output shape: (BatchSize, OutImgDim, OutImgDim, Channels)

Notes:

The input shape was chosen to be compatible with im2col (only true when there is no folding).
The actual data layout produced by the hlslib kernels is different for depthwise ops.
Depthwise SWG: (1, OFMDim, OFMDim, IFMChannels/PE, K, K, PE)

Channels can be folded using PE (SIMD from the input perspective).
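For the unfolded, im2col-compatible input shape (BatchSize, OutImgDim, OutImgDim, TotalKernelSize*Channels), max and average pooling reduce to a reshape followed by a reduction over the kernel positions. The sketch assumes each kernel position holds its Channels values contiguously. That assumption does not hold for the folded depthwise layout noted above. The name pool_reference is hypothetical.

```python
import numpy as np


def pool_reference(x: np.ndarray, k: int, channels: int, mode: str = "max") -> np.ndarray:
    """Pool an im2col-shaped input (N, OH, OW, k*k*channels) -> (N, OH, OW, channels).

    Assumes kernel-position-major layout: each of the k*k positions stores
    its `channels` values contiguously (unfolded case only).
    """
    n, oh, ow, kc = x.shape
    if kc != k * k * channels:
        raise ValueError("last axis must equal k*k*channels")
    windows = x.reshape(n, oh, ow, k * k, channels)
    if mode == "max":
        return windows.max(axis=3)
    if mode == "avg":
        return windows.mean(axis=3)
    raise ValueError(f"unsupported mode: {mode!r}")


x = np.array([1, 5, 2, 6, 3, 7, 4, 0]).reshape(1, 1, 1, 8)  # k=2, 2 channels
print(pool_reference(x, k=2, channels=2))  # [[[[4 7]]]]
```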

get_nodeattr_types

def get_nodeattr_types()

Get dictionary of custom node attributes with their types and default values.

global_includes

def global_includes()

List include directives for generated HLS code.

defines

def defines(var)

Constant and type definitions for generated HLS code.

docompute

def docompute()

Generates the computational part of the HLS C++ code.

pragmas

def pragmas()

Generate HLS pragmas to apply to the HLS C++ code.

blackboxfunction

def blackboxfunction()

Blackbox function interface from which the IP will be generated.

execute_node

def execute_node(context, graph)

Execute the node in HLS C++ simulation.

finn.custom_op.fpgadataflow.hls.requant_hls

Requant_hls Objects

class Requant_hls(Requant, HLSBackend)

HLS backend for Requant operation.

Computes: clip(round(x * scale + bias), min, max)

Scale and bias are embedded as constants in the generated HLS code. Per-channel scale and bias are supported. When scale=1 and bias=0, the generated code skips the unnecessary multiply/add operations.

Note: This backend is primarily for FLOAT32 inputs. For integer inputs, prefer Requant_rtl which is more efficient.
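The formula clip(round(x * scale + bias), min, max) with per-channel scale and bias can be sketched in NumPy, broadcasting scale and bias along the last (channel) axis. One caveat: np.round rounds halves to even. The hardware's rounding mode is not stated here, so tie behaviour in this sketch may differ from the generated code. The example inputs avoid exact .5 ties for that reason.

```python
import numpy as np


def requant(x: np.ndarray, scale, bias, min_val: float, max_val: float) -> np.ndarray:
    """clip(round(x * scale + bias), min_val, max_val), per-channel on the last axis.

    Note: np.round uses round-half-to-even; the HLS rounding mode may differ.
    """
    y = np.round(x * np.asarray(scale) + np.asarray(bias))
    return np.clip(y, min_val, max_val)


x = np.array([[0.4, 1.6, -3.0, 1.0]])
scale = np.array([10.0, 1.0, 1.0, 10.0])
bias = np.array([0.0, 0.0, 0.2, 0.0])
print(requant(x, scale, bias, -2, 7))  # [[ 4.  2. -2.  7.]]
```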

get_exp_cycles

def get_exp_cycles()

Returns expected number of cycles for execution.

Adds a constant offset to account for HLS pipeline initialization overhead.

generate_params

def generate_params(model, path)

Generate scale and bias parameter arrays as HLS header file.

read_npy_data

def read_npy_data()

Generate code for reading input data from .npy file.

strm_decl

def strm_decl()

Generate stream declarations for C++ simulation.

dataoutstrm

def dataoutstrm()

Generate code for writing output data to .npy file.

save_as_npy

def save_as_npy()

Empty - output saving is handled in dataoutstrm.

execute_node

def execute_node(context, graph)

Execute the node using cppsim or rtlsim.

Custom implementation that only passes input 0 (data), since scale and bias are embedded as parameters in the generated code.

finn.custom_op.fpgadataflow.hls.split_hls

StreamingSplit_hls Objects

class StreamingSplit_hls(StreamingSplit, HLSBackend)

Streaming split node with dynamically generated HLS. Only supports splitting along the last axis.

finn.custom_op.fpgadataflow.hls.squeeze_hls

HLS backend implementation of the Squeeze operator.

Squeeze_hls Objects

@register_custom_op
class Squeeze_hls(Squeeze, HLSBackend)

HLS backend implementation of the Squeeze operator.

Removes single-dimension entries from the shape of a tensor using HLS synthesis.

get_nodeattr_types

def get_nodeattr_types()

Return the dictionary of node attributes for the HLS Squeeze operator.

global_includes

def global_includes() -> None

Generate list of C++ includes for the top of the generated code.

defines

def defines(var) -> None

Generate C++ code for type alias, global constant, and macro definitions.

execute_node

def execute_node(context, graph) -> None

Execute node via generic HLSBackend implementation (cppsim/rtlsim).

docompute

def docompute() -> None

Generate C++ code for the computation part of the operator.

blackboxfunction

def blackboxfunction() -> None

Generate the C++ function signature for the IP block generation.

finn.custom_op.fpgadataflow.hls.streamingdatawidthconverter_hls

StreamingDataWidthConverter_hls Objects

class StreamingDataWidthConverter_hls(StreamingDataWidthConverter, HLSBackend)

Class that corresponds to finn-hlslib StreamingDataWidthConverter_Batch function.

finn.custom_op.fpgadataflow.hls.streamingfifo_hls

StreamingFIFO_hls Objects

class StreamingFIFO_hls(StreamingFIFO, HLSBackend)

HLS-based FIFO implementation. Currently only used as virtual FIFO for live FIFO-sizing.

finn.custom_op.fpgadataflow.hls.thresholding_hls

HLS backend implementation for neural network thresholding operations.

This module provides the Thresholding_hls class which implements hardware-accelerated thresholding/activation functions using High-Level Synthesis (HLS) for FPGA deployment. Supports multiple memory modes, runtime weight loading, and various data types.

Thresholding_hls Objects

class Thresholding_hls(Thresholding, HLSBackend)

Class that corresponds to finn-hlslib Thresholding_Batch function.
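Functionally, multi-threshold activation maps each value to the number of per-channel thresholds it reaches. The NumPy sketch below assumes an (N, C) input, a (C, T) threshold matrix sorted ascending per channel, and a "value >= threshold" comparison. It ignores any output bias or scaling the real operator may apply. The name multithreshold is hypothetical.

```python
import numpy as np


def multithreshold(x: np.ndarray, thresholds: np.ndarray) -> np.ndarray:
    """x: (N, C); thresholds: (C, T), ascending per channel.

    Returns (N, C) counts of thresholds each value reaches (x >= t).
    """
    # (N, C, 1) compared against (C, T) broadcasts to (N, C, T)
    return (x[..., np.newaxis] >= thresholds).sum(axis=-1)


x = np.array([[0, 5], [3, -1]])
thr = np.array([[1, 2, 3], [0, 4, 6]])
print(multithreshold(x, thr))  # [[0 2]
                               #  [3 0]]
```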

init

def __init__(onnx_node: NodeProto, **kwargs: Any) -> None

Initialize the Thresholding_hls layer.

Parameters

onnx_node : NodeProto - ONNX node representing this operation.
**kwargs : dict - Additional arguments passed to parent classes.

get_nodeattr_types

def get_nodeattr_types() -> dict[str, tuple[str, bool, str, set[str]]
                                 | tuple[str, bool, int, set[int]]]

Get the types and default values for node attributes.

Returns

dict Dictionary mapping attribute names to their type specifications

bram_estimation

def bram_estimation() -> int

Calculate the BRAM cost if the memory resource type is set to BRAM.

Returns

int Number of BRAM blocks required.

lut_estimation

def lut_estimation() -> int

Calculate LUT cost, taking memory resource type into account.

Returns

int Number of LUTs required for comparators and optional LUTRAM.

get_ap_int_max_w

def get_ap_int_max_w() -> int

Get the maximum ap_int width used in this layer.

Returns

int Maximum bitwidth of any ap_int used in the operator.

code_generation_ipgen

def code_generation_ipgen(model: ModelWrapper, fpgapart: str,
                          clk: float) -> None

Generate C++ code and tcl script for IP generation.

Parameters

model : ModelWrapper - The ONNX model wrapper.
fpgapart : str - Target FPGA part name.
clk : float - Clock period in nanoseconds.

get_template_param_values

def get_template_param_values() -> dict[str, str]

Return the template parameter values according to input, output and weight data types.

Returns

dict[str, str] Dictionary of template parameter names to their values.

make_weight_file

def make_weight_file(weights: np.ndarray, weight_file_mode: str,
                     weight_file_name: str) -> None

Produce a file containing given weights (thresholds) in appropriate format for this layer. This file can be used for either synthesis or run-time reconfig of weights.

Parameters

weights : np.ndarray
    NumPy array with the weights to be written to the file.
weight_file_mode : str
    One of {hls_header, decoupled_verilog_dat, decoupled_runtime, decoupled_npy}.
weight_file_name : str
    Filename for the weight file to be generated.

generate_params

def generate_params(model: ModelWrapper, path: str) -> None

Generate parameter files for thresholds.

Parameters

model : ModelWrapper
    The ONNX model wrapper containing initializers.
path : str
    Code generation directory path.

execute_node

def execute_node(context: dict, graph: object) -> None

Execute this node in the given context.

Parameters

context : dict
    Execution context containing input/output tensors.
graph : onnx.GraphProto
    The ONNX graph (unused but required by the interface).

global_includes

def global_includes() -> None

Generate list of global C++ includes.

defines

def defines(var: object) -> None

Generate C++ defines for template parameters.

Parameters

var : object Unused parameter for compatibility with base class.

read_npy_data

def read_npy_data() -> None

Generate C++ code for reading NPY data for simulation.

strm_decl

def strm_decl() -> None

Generate C++ stream declarations.

docompute

def docompute() -> None

Generate C++ code for the main computation.

dataoutstrm

def dataoutstrm() -> None

Generate C++ code for writing output data stream.

blackboxfunction

def blackboxfunction() -> None

Generate C++ black box function signature.

pragmas

def pragmas() -> None

Generate HLS pragmas for synthesis.

code_generation_ipi

def code_generation_ipi() -> list[str]

Generate Vivado IPI tcl commands for block design integration.

Returns

list[str] List of tcl commands for IPI block design generation.

get_op_and_param_counts

def get_op_and_param_counts() -> dict[str, int]

Get operation and parameter counts for this layer.

Returns

dict[str, int] Dictionary mapping parameter type to count.

ipgen_extra_directives

def ipgen_extra_directives() -> list[str]

Return a list of extra tcl directives for HLS synthesis.

Returns

list[str] List of tcl directives for HLS IP generation.

derive_characteristic_fxns

def derive_characteristic_fxns(period: int,
                               override_rtlsim_dict: dict | None = None
                               ) -> None

Derive characteristic functions for performance estimation.

Parameters

period : int
    Characterization period in clock cycles.
override_rtlsim_dict : dict | None
    Optional dictionary to override RTL simulation parameters.

Returns

None

minimize_weight_bit_width

def minimize_weight_bit_width(model)

Minimize threshold datatype, with HLS-specific adjustments.

The HLS implementation uses the threshold datatype for comparisons. When the threshold datatype is narrower than the input datatype, input values get truncated, which can cause incorrect results. To prevent this, ensure threshold datatype is at least as wide as input datatype.
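The truncation problem described above can be reproduced with plain integers. This is a minimal sketch that models the HLS cast of an input to a narrower unsigned threshold type as bit masking (an assumption made for illustration); `truncate_unsigned`, `compare_in_threshold_type` and `safe_threshold_bits` are hypothetical helpers, not FINN functions.

```python
def truncate_unsigned(value: int, bits: int) -> int:
    """Model a cast to an unsigned type of the given width by keeping the low bits."""
    return value & ((1 << bits) - 1)


def compare_in_threshold_type(x: int, threshold: int, thr_bits: int) -> bool:
    """Compare x >= threshold after casting x to the threshold width."""
    return truncate_unsigned(x, thr_bits) >= threshold


def safe_threshold_bits(thr_bits: int, input_bits: int) -> int:
    """Widen the threshold type so it is at least as wide as the input type."""
    return max(thr_bits, input_bits)


# UINT8 input compared against a threshold that would fit into UINT4
x, threshold = 200, 10
print(compare_in_threshold_type(x, threshold, 4))  # False: 200 is truncated to 8
print(compare_in_threshold_type(x, threshold, safe_threshold_bits(4, 8)))  # True
```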

finn.custom_op.fpgadataflow.hls.tlastmarker_hls

TLastMarker_hls Objects

class TLastMarker_hls(HWCustomOp, HLSBackend)

Node that adds/removes AXI stream TLAST signals where needed. Its behavior is transparent in node-by-node execution, only visible in IP-stitched rtlsim or actual hardware. This node may be needed at the end of the network to signal a DMA write (needed by the FINN PYNQ shell) or at the beginning to remove the end-of-burst from DMA read.

finn.custom_op.fpgadataflow.hls.unsqueeze_hls

HLS backend implementation of the Unsqueeze operator.

Unsqueeze_hls Objects

@register_custom_op
class Unsqueeze_hls(Unsqueeze, HLSBackend)

HLS backend implementation of the Unsqueeze operator.

Inserts single-dimension entries into the shape of a tensor using HLS synthesis.

get_nodeattr_types

def get_nodeattr_types()

Return the dictionary of node attributes for the HLS Unsqueeze operator.

global_includes

def global_includes() -> None

Generate list of C++ includes for the top of the generated code.

defines

def defines(var) -> None

Generate C++ code for type alias, global constant, and macro definitions.

execute_node

def execute_node(context, graph) -> None

Execute node via generic HLSBackend implementation (cppsim/rtlsim).

docompute

def docompute() -> None

Generate C++ code for the computation part of the operator.

blackboxfunction

def blackboxfunction() -> None

Generate the C++ function signature for the IP block generation.

finn.custom_op.fpgadataflow.hls.upsampler_hls

UpsampleNearestNeighbour_hls Objects

class UpsampleNearestNeighbour_hls(UpsampleNearestNeighbour, HLSBackend)

Corresponds to the finn-hlslib UpsampleNearestNeighbour function. Upsampling is done with the Nearest Neighbour algorithm. The layer expects square feature maps for both input and output.

finn.custom_op.fpgadataflow.hls.vectorvectoractivation_hls

VVAU_hls Objects

class VVAU_hls(VVAU, HLSBackend)

Corresponds to the finn-hlslib Vector_Vector_Activate_Batch function.

lut_estimation

def lut_estimation()

Calculate the LUT resource estimate based on:

M. Blott, T. B. Preusser, N. J. Fraser, G. Gambardella, K. O'Brien, Y. Umuroglu, M. Leeser and K. Vissers, "FINN-R: An End-to-End Deep-Learning Framework for Fast Exploration of Quantized Neural Networks", Sep 2018.

code_generation_ipgen

def code_generation_ipgen(model, fpgapart, clk)

Generate C++ code and Tcl script for IP generation.

get_template_param_values

def get_template_param_values()

Returns the template parameter values according to input, output and weight data types.

minimize_weight_bit_width

def minimize_weight_bit_width(model)

Minimize weight and threshold datatypes, with HLS-specific adjustments.

The HLS implementation uses the threshold datatype for comparisons. When the threshold datatype is narrower than the accumulator datatype, accumulator values get truncated, which can cause incorrect results. To prevent this, ensure threshold datatype is at least as wide as accumulator datatype.

finn.custom_op.fpgadataflow.hlsbackend

HLS backend implementation for FINN custom operations.

HLSBackend Objects

class HLSBackend(ABC)

Base class providing shared functionality for all custom ops that correspond to a finn-hlslib function. It defines the methods every HLS custom node should have; some of them are abstract methods that must be implemented when writing a new HLS custom op node.

get_nodeattr_types

def get_nodeattr_types()

Return dictionary of node attribute types and properties.

get_all_verilog_paths

def get_all_verilog_paths()

Return list of all folders containing Verilog code for this node.

get_all_verilog_filenames

def get_all_verilog_filenames(abspath=False)

Return list of all Verilog files used for this node.

prepare_rtlsim

def prepare_rtlsim(behav=False)

Create an XSI emulation library for the RTL code generated for this node and set the rtlsim_so attribute to its path.

code_generation_ipgen

def code_generation_ipgen(model, fpgapart, clk)

Generate C++ code and TCL script for IP generation.

ipgen_default_directives

def ipgen_default_directives()

Return list of default HLS synthesis directives.

ipgen_extra_directives

def ipgen_extra_directives()

Return a list of extra TCL directives for HLS synthesis.

ipgen_singlenode_code

def ipgen_singlenode_code(fpgapart=None)

Build the bash script for IP generation using the CallHLS utility.

code_generation_cppsim

def code_generation_cppsim(model)

Generate C++ code for simulation (cppsim).

code_generation_ipi

def code_generation_ipi()

Construct and return the TCL for node instantiation in Vivado IPI.

compile_singlenode_code

def compile_singlenode_code()

Build a bash script for compilation using CppBuilder and execute it to produce the executable.

npy_to_dynamic_output

def npy_to_dynamic_output(context)

Read output.npy file generated from cppsim and place into context dictionary.

exec_precompiled_singlenode_model

def exec_precompiled_singlenode_model()

Execute the precompiled executable.

hls_sname

def hls_sname()

Get the naming convention used by Vitis HLS for stream signals. Example: the TDATA signal for a stream called "out" would be out_V_TDATA.
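The naming convention in the example can be sketched as a string template. This assumes hls_sname() returns "V", as the out_V_TDATA example implies; `stream_signal_name` is a hypothetical helper for illustration only.

```python
def stream_signal_name(stream: str, signal: str, sname: str = "V") -> str:
    """Build a Vitis HLS AXI-Stream port name such as out_V_TDATA."""
    # <stream name>_<hls_sname()>_<AXI-Stream signal>
    return f"{stream}_{sname}_{signal}"


print(stream_signal_name("out", "TDATA"))  # out_V_TDATA
print(stream_signal_name("in0", "TVALID"))  # in0_V_TVALID
```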

execute_node

def execute_node(context, graph)

Execute node in specified mode (cppsim or rtlsim).

global_includes

@abstractmethod
def global_includes()

Set the global includes for the C++ code generated for cppsim or rtlsim. This is a member function of the HLSBackend class but must be implemented by every node.

defines

@abstractmethod
def defines(var)

Set the define commands for the C++ code generated for cppsim or rtlsim. This is a member function of the HLSBackend class but must be implemented by every node.

var: allows the function to be reused for different C++ code generation targets. For example, if set to "ipgen" in MatrixVectorActivation, additional PRAGMA defines are added.

read_npy_data

def read_npy_data()

Generate commands for reading data from a .npy file in C++. May need to be overridden depending on the CustomOp.

strm_decl

def strm_decl()

Generate commands for stream declaration in C++. May need to be overridden depending on the CustomOp.

docompute

@abstractmethod
def docompute()

Generate the commands for the computational part of the C++ code. This is a member function of the HLSBackend class but must be implemented by every node.

dataoutstrm

def dataoutstrm()

Generate commands for reading out data from C++ and converting it to npy format. May need to be overridden depending on the CustomOp.

save_as_npy

def save_as_npy()

Generate commands for saving data in .npy file in C++.

blackboxfunction

@abstractmethod
def blackboxfunction()

Generate a black-box function in C++ from which an IP block will be generated. This is a member function of the HLSBackend class but must be implemented by every node.

pragmas

def pragmas()

Generate pragma commands in C++. May need to be overridden depending on the CustomOp.

get_ap_int_max_w

def get_ap_int_max_w()

Return the maximum width of any ap_int used in this module. Used to set the AP_INT_MAX_W definition for HLS.

timeout_value

def timeout_value()

Set timeout value for HLS functions defined for one clock cycle.

timeout_condition

def timeout_condition()

Set timeout condition for HLS functions defined for one clock cycle.

timeout_read_stream

def timeout_read_stream()

Set reading output stream procedure for HLS functions defined for one clock cycle.

finn.custom_op.fpgadataflow.hwcustomop

Base class for hardware custom operations in FINN dataflow architecture.

This module provides the HWCustomOp base class for custom operations that can be implemented using HLS or RTL backends in FPGA dataflow architectures.

HWCustomOp Objects

class HWCustomOp(CustomOp)

Base class for all custom ops that can be implemented with either an HLS or an RTL backend. It defines the methods every fpgadataflow custom node should have; some of them are abstract methods that must be implemented when writing a new fpgadataflow custom op node.

init

def __init__(onnx_node: NodeProto, **kwargs: Any) -> None

Initialize HWCustomOp with an ONNX node.

Arguments:

onnx_node - The ONNX node to wrap.
**kwargs - Additional keyword arguments passed to parent class.

get_nodeattr_types

def get_nodeattr_types() -> dict[
    str,
    tuple[str, bool, int | float | str | bool | npt.NDArray | list]
    | tuple[str, bool, int | float | str | bool | npt.NDArray | list, set
            | None],
]

Return node attribute types for HWCustomOp.

Returns:

Dictionary mapping attribute names to their type specifications.

make_shape_compatible_op

def make_shape_compatible_op(model: "ModelWrapper") -> NodeProto

Return a standard ONNX node that is shape-compatible with this node, for use in shape inference.

Arguments:

model - The model wrapper containing this node.

Returns:

The ONNX node for the shape compatible operation.

get_verilog_top_module_name

def get_verilog_top_module_name() -> str

Return the Verilog top module name for this node.

get_verilog_top_module_intf_names

def get_verilog_top_module_intf_names(
) -> dict[str, list[tuple[str, int]] | list[str]]

Return a dict of the names of input and output interfaces. The keys reflect the protocols each interface implements: 'clk', 'rst', 'm_axis', 's_axis', 'aximm', 'axilite'. Values are lists of tuples (axis, aximm) or lists of names (axilite). The 'axis' tuples correspond to the node inputs in order, and each tuple is (interface_name, interface_width_bits). AXI-Lite interfaces are always assumed to be 32 bits wide and are given by name only, not as tuples. Each block must have at most one aximm and one axilite interface.
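The structural rules above can be checked mechanically. This is a minimal sketch: `check_intf_names` is a hypothetical helper, and every interface name and width in `example` is illustrative only, not taken from a real node.

```python
def check_intf_names(intf: dict) -> None:
    """Check the structural rules stated for get_verilog_top_module_intf_names."""
    for key in ("s_axis", "m_axis", "aximm"):
        for name, width in intf.get(key, []):
            # stream and memory-mapped entries are (interface_name, width_bits)
            assert isinstance(name, str) and isinstance(width, int) and width > 0
    for name in intf.get("axilite", []):
        # AXI-Lite entries are names only; the width is implicitly 32 bits
        assert isinstance(name, str)
    assert len(intf.get("aximm", [])) <= 1, "at most one aximm interface"
    assert len(intf.get("axilite", [])) <= 1, "at most one axilite interface"


# Hypothetical node: one 16-bit input stream, one 8-bit output stream,
# and an AXI-Lite control port (names and widths are made up).
example = {
    "clk": ["ap_clk"],
    "rst": ["ap_rst_n"],
    "s_axis": [("in0_V", 16)],
    "m_axis": [("out0_V", 8)],
    "aximm": [],
    "axilite": ["s_axi_control"],
}
check_intf_names(example)  # passes silently
```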

get_rtlsim

def get_rtlsim() -> SimEngine

Return an XSI wrapper for the emulation library for this node.

close_rtlsim

def close_rtlsim(sim: SimEngine) -> None

Close and free up resources for rtlsim.

Arguments:

sim - The RTL simulation object to close.

node_res_estimation

def node_res_estimation(fpgapart: str) -> dict[str, int | float]

Return summarized resource estimation of BRAMs and LUTs of the node as a dictionary.

bram_efficiency_estimation

def bram_efficiency_estimation() -> float

Estimate BRAM efficiency.

Returns actual parameter storage needed divided by the allocated BRAM storage (from estimation).
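The efficiency ratio described above is a simple division. This sketch assumes each block counted by the BRAM estimate holds 36 x 512 = 18432 bits (an 18 Kb BRAM in its 512 x 36 configuration); a given node may count blocks differently, and treating zero allocated blocks as fully efficient is a convention assumed here.

```python
def bram_efficiency(param_bits: int, n_brams: int, bits_per_bram: int = 36 * 512) -> float:
    """Ratio of parameter bits actually needed to BRAM bits allocated."""
    if n_brams == 0:
        # assumed convention: nothing allocated counts as fully efficient
        return 1.0
    return param_bits / (n_brams * bits_per_bram)


# e.g. 64x64 weights at 4 bits each, estimated to occupy 2 BRAMs
print(bram_efficiency(64 * 64 * 4, 2))
```

The same shape of computation applies to the URAM variant, with the per-block capacity swapped for that of a URAM.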

uram_efficiency_estimation

def uram_efficiency_estimation() -> float

Estimate URAM efficiency.

Returns actual parameter storage needed divided by the allocated URAM storage (from estimation).

bram_estimation

def bram_estimation() -> int

Estimate BRAM resource usage.

Member function of HWCustomOp class that must be implemented by every node.

uram_estimation

def uram_estimation() -> int

Estimate UltraRAM resource usage.

Member function of HWCustomOp class that must be implemented by every node.

lut_estimation

def lut_estimation() -> int

Estimate LUT resource usage.

Member function of HWCustomOp class that must be implemented by every node.

dsp_estimation

def dsp_estimation(fpgapart: str) -> int

Estimate DSP resource usage.

Member function of HWCustomOp class that must be implemented by every node.

Arguments:

fpgapart - Target FPGA part string.

get_exp_cycles

def get_exp_cycles() -> int

Estimate the expected number of cycles for the current folding configuration.

Member function of HWCustomOp class that must be implemented by every node.

get_op_and_param_counts

def get_op_and_param_counts() -> dict[str, int]

Return a dictionary with number of ops needed per inference.

Return the number of ops needed per inference for this layer, as well as the parameter count (weights, thresholds, etc.). Entries should be in the format {op_<optype> : <count>, param_<paramtype> : <count>}.
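A dictionary in this format could look like the following for a matrix-vector layer. This is a minimal sketch: the `op_mac` and `param_weight` keys are hypothetical labels following the op_/param_ prefix convention, and real nodes encode their own type names (for example, including datatypes).

```python
def mvau_like_counts(mw: int, mh: int, n_vectors: int) -> dict[str, int]:
    """Op/param counts for a hypothetical MW x MH matrix-vector layer."""
    return {
        # one multiply-accumulate per weight per input vector
        "op_mac": mw * mh * n_vectors,
        # one stored parameter per weight
        "param_weight": mw * mh,
    }


print(mvau_like_counts(mw=64, mh=32, n_vectors=49))
# {'op_mac': 100352, 'param_weight': 2048}
```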

reset_rtlsim

def reset_rtlsim(sim: SimEngine) -> None

Set reset input in finnxsi to zero, toggle the clock and set it back to one.

rtlsim_multi_io

def rtlsim_multi_io(sim: SimEngine,
                    io_dict: dict[str, Any],
                    sname: str = "_V") -> None

Run rtlsim for this node, supports multiple i/o streams.

verify_node

def verify_node() -> None

Can be implemented to verify that all attributes the node needs are there and that particular attributes are set correctly. Can also check if the number of inputs is equal to the expected number.

generate_params

def generate_params(model: "ModelWrapper", path: str) -> None

Generate parameters (i.e. weights and thresholds).

Member function of HWCustomOp class that must be implemented by every node that needs to generate parameters.

Arguments:

model - The model wrapper containing this node.
path - Path where parameters should be generated.

get_number_output_values

def get_number_output_values() -> int

Get the number of expected output values.

Member function of HWCustomOp class that must be implemented by every node.

get_input_datatype

@abstractmethod
def get_input_datatype(ind: int = 0) -> BaseDataType

Return FINN DataType of input stream ind.

get_output_datatype

@abstractmethod
def get_output_datatype(ind: int = 0) -> BaseDataType

Return FINN DataType of output stream ind.

get_normal_input_shape

@abstractmethod
def get_normal_input_shape(
        ind: int = 0) -> Sequence[int] | npt.NDArray[np.int_]

Return normal input shape if implemented.

get_normal_output_shape

@abstractmethod
def get_normal_output_shape(
        ind: int = 0) -> Sequence[int] | npt.NDArray[np.int_]

Return normal output shape if implemented.

get_folded_input_shape

@abstractmethod
def get_folded_input_shape(
        ind: int = 0) -> Sequence[int] | npt.NDArray[np.int_]

Return folded input shape (according to synapse folding), if implemented.

get_folded_output_shape

@abstractmethod
def get_folded_output_shape(
        ind: int = 0) -> Sequence[int] | npt.NDArray[np.int_]

Return folded output shape (according to neuron folding), if implemented.

get_instream_width

@abstractmethod
def get_instream_width(ind: int = 0) -> int

Return input stream width, if implemented.

get_outstream_width

@abstractmethod
def get_outstream_width(ind: int = 0) -> int

Return output stream width, if implemented.

get_instream_width_padded

def get_instream_width_padded(ind: int = 0) -> int

Return input stream width padded to a multiple of 8.

This is required by the AXI Stream spec.

Arguments:

ind - Input index (default: 0).
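The padding rule amounts to rounding a bit width up to the next multiple of 8. A minimal sketch with a hypothetical helper name:

```python
def pad_to_multiple_of_8(width_bits: int) -> int:
    """Round a stream width in bits up to the next multiple of 8."""
    return ((width_bits + 7) // 8) * 8


for w in (1, 8, 12, 24, 33):
    print(w, "->", pad_to_multiple_of_8(w))  # 8, 8, 16, 24, 40
```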

get_outstream_width_padded

def get_outstream_width_padded(ind: int = 0) -> int

Return output stream width padded to a multiple of 8.

This is required by the AXI Stream spec.

Arguments:

ind - Output index (default: 0).

calc_tmem

def calc_tmem() -> int

Calculate and return TMEM.

calc_wmem

def calc_wmem() -> int

Calculate and return WMEM.

generate_hdl_memstream

def generate_hdl_memstream(fpgapart, pumped_memory=0)

Generate Verilog code for the memstream component. Currently used by MVAU, VVAU and the HLS Thresholding layer.

generate_hdl_fetch_weights

def generate_hdl_fetch_weights(fpgapart)

Generate Verilog code for the fetch_weights component. Currently used by MVAU.

generate_hdl_dynload

def generate_hdl_dynload() -> None

Generate HDL for dynamic load wrapper.

derive_characteristic_fxns

def derive_characteristic_fxns(period: int,
                               override_rtlsim_dict: dict | None = None,
                               pre_hook=None) -> None

Return the unconstrained characteristic functions for this node.

Arguments:

period - The characterization period.
override_rtlsim_dict - Optional dictionary to override rtlsim settings.

Raises:

ValueError - If period is too short to characterize the node.

adapt_for_loop_body

def adapt_for_loop_body(input_types)

Called by the LoopRolling transformation to allow operators to adapt their attributes when being placed inside a loop body.

This base implementation does nothing. Operators that need to modify their behavior when placed in loops should override this method.

Arguments:

input_types - List of LoopBodyInputType values for each input, indicating whether inputs are ACTIVATION, CONSTANT, PARAMETER, etc.

Example:

If an operator has a parameter that becomes a streamed input in a loop context (PARAMETER type), it might need to change an attribute like rhs_style from "const" to "input".
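The example above can be sketched as an override. This is a standalone illustration, not a real FINN operator: `LoopBodyInputType` here is a stand-in enum that borrows only the member names listed above, `ExampleBinaryOp` is hypothetical, and treating input index 1 as the right-hand side is an assumption. Real operators would subclass HWCustomOp and use its attribute accessors.

```python
from enum import Enum, auto


class LoopBodyInputType(Enum):
    """Stand-in for the real enum; only the member names come from the docs."""
    ACTIVATION = auto()
    CONSTANT = auto()
    PARAMETER = auto()


class ExampleBinaryOp:
    """Hypothetical operator with an rhs_style attribute."""

    def __init__(self):
        self.attrs = {"rhs_style": "const"}

    def set_nodeattr(self, name, value):
        self.attrs[name] = value

    def adapt_for_loop_body(self, input_types):
        # Assume input 1 is the right-hand side. If it becomes a per-iteration
        # PARAMETER, it has to be streamed in rather than embedded as a constant.
        if len(input_types) > 1 and input_types[1] == LoopBodyInputType.PARAMETER:
            self.set_nodeattr("rhs_style", "input")


op = ExampleBinaryOp()
op.adapt_for_loop_body([LoopBodyInputType.ACTIVATION, LoopBodyInputType.PARAMETER])
print(op.attrs["rhs_style"])  # input
```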

finn.custom_op.fpgadataflow.hwsoftmax

HWSoftmax Objects

class HWSoftmax(HWCustomOp)

Abstraction layer for HW implementation of SoftMax layers.

get_input_datatype

def get_input_datatype(ind=0)

Returns FINN DataType of input.

get_output_datatype

def get_output_datatype(ind=0)

Returns FINN DataType of output.

finn.custom_op.fpgadataflow.inner_shuffle

InnerShuffle Objects

class InnerShuffle(HWCustomOp)

Abstraction layer for the Parallel 2D transpose.

get_exp_cycles

def get_exp_cycles()

Estimate cycles for the double-buffered InnerShuffle RTL.

The RTL uses two BRAM banks with page_size = I*J/SIMD. The first page must be fully written before reads can begin, adding one extra page of latency beyond the streaming throughput. Empirically verified to match cycles_rtlsim within atol=10.
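The latency reasoning above can be written as a rough cycle model. This is a sketch under stated assumptions: `n_matrices` (the number of I x J matrices streamed per inference) is a hypothetical parameter, and the actual method derives its inputs from node attributes and may add small constant overheads not modeled here.

```python
def inner_shuffle_cycles_estimate(i: int, j: int, simd: int, n_matrices: int = 1) -> int:
    """Rough model: stream every page, plus one page of initial fill latency."""
    if (i * j) % simd != 0:
        raise ValueError("I*J must be divisible by SIMD")
    page_size = i * j // simd  # cycles to write (or read) one I x J matrix
    # The first page must be fully written before reads begin (one extra page);
    # afterwards, double buffering overlaps writing page n+1 with reading page n.
    return (n_matrices + 1) * page_size


print(inner_shuffle_cycles_estimate(i=8, j=16, simd=4, n_matrices=3))  # 128
```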

finn.custom_op.fpgadataflow.labelselect

LabelSelect Objects

class LabelSelect(HWCustomOp)

Abstraction layer for HW implementation of LabelSelect.

get_input_datatype

def get_input_datatype(ind=0)

Returns FINN DataType of input.

get_output_datatype

def get_output_datatype(ind=0)

Returns FINN DataType of output.

get_instream_width

def get_instream_width(ind=0)

Returns input stream width.

get_outstream_width

def get_outstream_width(ind=0)

Returns output stream width.

finn.custom_op.fpgadataflow.layernorm

LayerNorm Objects

class LayerNorm(HWCustomOp)

Abstraction layer for HW implementation of the LayerNorm layer.

get_input_datatype

def get_input_datatype(ind=0)

Returns FINN DataType of input.

get_output_datatype

def get_output_datatype(ind=0)

Returns FINN DataType of output.

finn.custom_op.fpgadataflow.lookup

Lookup Objects

class Lookup(HWCustomOp)

Abstraction layer for HW implementation of streaming elementwise lookup, mapping indices to values.

finn.custom_op.fpgadataflow.matrixvectoractivation

Matrix-Vector-Activation Unit (MVAU) hardware implementation.

This module implements the MVAU operation for FPGA deployment, which performs matrix-vector multiplication optionally followed by activation/thresholding. Supports various memory modes, parallelization strategies, and quantized datatypes.

MVAU Objects

class MVAU(HWCustomOp)

Abstraction layer for HW implementation of MatrixVectorActivation layers.

init

def __init__(onnx_node, **kwargs)

Initialize the MVAU custom operation.

Parameters

onnx_node : NodeProto
    ONNX node to wrap.
**kwargs : dict
    Additional arguments passed to parent class.

get_nodeattr_types

def get_nodeattr_types()

Get dictionary of attribute names and their types for this node.

Returns

dict Dictionary mapping attribute names to type specifications

execute_node

def execute_node(context, graph)

Execute this MVAU node.

Performs matrix-vector multiplication and optional activation/thresholding.

Parameters

context : dict
    Dictionary mapping tensor names to numpy arrays.
graph : GraphProto
    ONNX graph containing this node.

verify_node

def verify_node()

Verify that this node has valid attributes and configuration.

Returns

list of str List of verification messages/warnings

infer_node_datatype

def infer_node_datatype(model)

Infer and set output datatype based on input datatype and node attributes.

Parameters

model : ModelWrapper FINN ModelWrapper containing this node

get_input_datatype

def get_input_datatype(ind=0)

Returns FINN DataType of input.

get_accumulator_datatype

def get_accumulator_datatype()

Returns FINN DataType of accumulator.

get_output_datatype

def get_output_datatype(ind=0)

Returns FINN DataType of output.

get_instream_width

def get_instream_width(ind=0)

Get width of input stream in bits.

Parameters

ind : int Input stream index (0=activations, 1=weights, 2=thresholds)

Returns

int Bit width of the specified input stream

get_outstream_width

def get_outstream_width(ind=0)

Get width of output stream in bits.

Parameters

ind : int Output stream index

Returns

int Bit width of the output stream

get_folded_input_shape

def get_folded_input_shape(ind=0)

Get shape of folded (parallelized) input tensor.

Parameters

ind : int Input index (0=activations, 1=weights)

Returns

tuple of int Shape of folded input tensor

get_folded_output_shape

def get_folded_output_shape(ind=0)

Get shape of folded (parallelized) output tensor.

Parameters

ind : int Output index

Returns

tuple of int Shape of folded output tensor

get_normal_input_shape

def get_normal_input_shape(ind=0)

Get normal (non-folded) input shape.

Parameters

ind : int Input index (0=activations, 1=weights)

Returns

tuple of int Normal input shape

get_normal_output_shape

def get_normal_output_shape(ind=0)

Get normal (non-folded) output shape.

Parameters

ind : int Output index

Returns

tuple of int Normal output shape

calc_wmem

def calc_wmem()

Calculates and returns WMEM.

calc_tmem

def calc_tmem()

Calculates and returns TMEM.
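A minimal sketch of both memory depths, assuming the folding scheme in which MW is folded by SIMD and MH by PE. This matches the (1, PE, WMEM, SIMD) weight layout and the (PE, TMEM, n_thres_steps) threshold layout described under the get_hw_compatible_* methods; the function names mirror the methods but are standalone illustrations.

```python
def calc_wmem(mw: int, mh: int, pe: int, simd: int) -> int:
    """Weight memory depth per PE: MW*MH weights spread over PE*SIMD lanes."""
    assert mh % pe == 0 and mw % simd == 0
    return (mw * mh) // (pe * simd)


def calc_tmem(mh: int, pe: int) -> int:
    """Threshold memory depth per PE: MH output channels spread over PE."""
    assert mh % pe == 0
    return mh // pe


print(calc_wmem(mw=64, mh=32, pe=4, simd=8), calc_tmem(mh=32, pe=4))  # 64 8
```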

uram_estimation

def uram_estimation()

Estimate UltraRAM (URAM) resource usage.

Returns

int Estimated number of URAMs needed

bram_estimation

def bram_estimation()

Calculate the BRAM resource estimate based on:

M. Blott, T. B. Preusser, N. J. Fraser, G. Gambardella, K. O'Brien, Y. Umuroglu, M. Leeser and K. Vissers, "FINN-R: An End-to-End Deep-Learning Framework for Fast Exploration of Quantized Neural Networks", Sep 2018.

bram_efficiency_estimation

def bram_efficiency_estimation()

Estimate BRAM utilization efficiency.

Returns

float Efficiency ratio (actual bits used / total BRAM capacity allocated)

uram_efficiency_estimation

def uram_efficiency_estimation()

Estimate URAM efficiency: actual parameter storage needed divided by the allocated URAM storage (from estimation).

get_exp_cycles

def get_exp_cycles()

Get expected number of clock cycles for one inference.

Returns

int Number of clock cycles

minimize_accumulator_width

def minimize_accumulator_width(model)

Minimize the accumulator bit width according to the weight values, the input data types, and the size of the dot product.
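The idea of bounding the accumulator from the weight values and the input range can be sketched as follows. This is an illustration of the general technique, not FINN's implementation: it assumes a (MW, MH) weight matrix with one column per output, ignores any thresholds, and always returns a signed width (an unsigned type could be used when the lower bound is non-negative).

```python
import numpy as np


def accumulator_range(weights: np.ndarray, x_min: int, x_max: int) -> tuple[int, int]:
    """Bound sum_i w_i * x_i over all inputs x_i in [x_min, x_max]."""
    # Each product w_i * x_i reaches its extremes at an endpoint of the input range.
    prod_lo = weights * x_min
    prod_hi = weights * x_max
    lo = np.minimum(prod_lo, prod_hi).sum(axis=0)  # per-output minimum
    hi = np.maximum(prod_lo, prod_hi).sum(axis=0)  # per-output maximum
    return int(lo.min()), int(hi.max())


def signed_bits_for_range(lo: int, hi: int) -> int:
    """Smallest two's-complement width that holds every value in [lo, hi]."""
    bits = 1
    while not (-(1 << (bits - 1)) <= lo and hi <= (1 << (bits - 1)) - 1):
        bits += 1
    return bits


w = np.array([[1, -2], [3, 4]])  # shape (MW=2, MH=2)
lo, hi = accumulator_range(w, x_min=0, x_max=3)  # UINT2 inputs
print(lo, hi, signed_bits_for_range(lo, hi))  # -6 12 5
```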

minimize_weight_bit_width

def minimize_weight_bit_width(model)

Minimize the bit width based on the values of the weights.

get_hw_compatible_threshold_tensor

def get_hw_compatible_threshold_tensor(orig_thres_matrix)

Convert the original numpy threshold matrix orig_thres_matrix into a form suitable for passing to the hlslib call:

ensure MH % PE == 0
for bipolar weights&inputs, ensure thresholds are positive
interleave rows between PEs
reshape into (PE, TMEM, n_thres_steps) and return
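The interleave and reshape steps listed above can be sketched in NumPy. This sketch assumes the input has shape (MH, n_thres_steps) and omits the bipolar sign adjustment; `hw_threshold_tensor` is a hypothetical name, not the method itself.

```python
import numpy as np


def hw_threshold_tensor(thresholds: np.ndarray, pe: int) -> np.ndarray:
    """Reshape (MH, n_thres_steps) thresholds into (PE, TMEM, n_thres_steps)."""
    mh, n_steps = thresholds.shape
    assert mh % pe == 0, "MH must be divisible by PE"
    tmem = mh // pe
    # Row r = t * PE + p goes to PE p at address t (round-robin interleave).
    return thresholds.reshape(tmem, pe, n_steps).transpose(1, 0, 2)


thr = np.arange(8).reshape(4, 2)  # MH=4 channels, 2 threshold steps each
out = hw_threshold_tensor(thr, pe=2)
print(out.shape)  # (2, 2, 2)
print(out[0].tolist())  # rows 0 and 2 of thr: [[0, 1], [4, 5]]
```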

get_hw_compatible_weight_tensor

def get_hw_compatible_weight_tensor(orig_weight_matrix)

Convert the original numpy weight matrix orig_weight_matrix into a form suitable for passing to the hlslib call:

ensure MH % PE == 0 and MW % SIMD == 0
for bipolar {-1,+1} weights, convert to binary {0, 1}
interleave rows between PEs
reshape into (1, PE, WMEM, SIMD) and return
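The same steps for weights can be sketched in NumPy. This assumes the matrix is laid out with one row per output neuron, shape (MH, MW); the stored layout in FINN may be transposed, and the real method may apply further reordering not described here. `hw_weight_tensor` is a hypothetical name.

```python
import numpy as np


def hw_weight_tensor(w: np.ndarray, pe: int, simd: int, bipolar: bool = False) -> np.ndarray:
    """Reshape an (MH, MW) weight matrix into (1, PE, WMEM, SIMD)."""
    mh, mw = w.shape
    assert mh % pe == 0 and mw % simd == 0
    if bipolar:
        w = (w + 1) // 2  # map {-1, +1} to {0, 1}
    wmem = (mh * mw) // (pe * simd)
    # Interleave rows between PEs: row r = t * PE + p goes to PE p.
    per_pe = w.reshape(mh // pe, pe, mw).transpose(1, 0, 2)  # (PE, MH/PE, MW)
    return per_pe.reshape(1, pe, wmem, simd)


w = np.arange(16).reshape(4, 4)  # MH=4, MW=4
out = hw_weight_tensor(w, pe=2, simd=2)
print(out.shape)  # (1, 2, 4, 2)
print(out[0, 0].tolist())  # rows 0 and 2: [[0, 1], [2, 3], [8, 9], [10, 11]]
```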

make_weight_file

def make_weight_file(weights, weight_file_mode, weight_file_name)

Produce a file containing given weights in appropriate format for this layer. This file can be used for either synthesis or run-time reconfig of weights.

Arguments:

weights : numpy array with weights to be put into the file
weight_file_mode : one of {hls_header, decoupled_verilog_dat, decoupled_runtime}
weight_file_name : filename for the weight file to be generated

generate_params

def generate_params(model, path)

Generate parameter files (weights and thresholds) for hardware generation.

Parameters

model : ModelWrapper
    FINN ModelWrapper containing this node.
path : str
    Output directory path for generated files.

get_op_and_param_counts

def get_op_and_param_counts()

Get dictionary of operations and parameter counts for this layer.

Returns

dict Dictionary with operation types and counts as key-value pairs

derive_characteristic_fxns

def derive_characteristic_fxns(period)

Derive characteristic performance functions for this node.

Parameters

period : int
    Characterization period in clock cycles.

get_verilog_top_module_intf_names

def get_verilog_top_module_intf_names()

Get Verilog top module interface names for this node.

Returns

dict Dictionary mapping interface types to port names

code_generation_ipi

def code_generation_ipi()

Generate TCL commands for IP integrator (IPI) block design.

Returns

list of str List of TCL commands for Vivado IP integrator

finn.custom_op.fpgadataflow.outer_shuffle

OuterShuffle Objects

class OuterShuffle(HWCustomOp)

Abstraction layer for HW OuterShuffle (rearrange and transpose) layers. Only permutations that do not affect the innermost dimension are supported.

get_exp_cycles

def get_exp_cycles()

Estimate cycles by simulating the input_gen HLS pipeline.

Derives all parameters from transpose_in_shape, perm, and SIMD:

output shape: apply perm to input shape
loop coefficients: input strides permuted by perm
buffer size: power-of-2 >= max_rp_retract + WP_DELAY + 2

The HLS pipeline has three stall sources:

WP_DELAY (=4): write-pointer pipeline latency before reads begin
Read stalls: consumer waits for data (rp >= wp_delayed)
Write stalls: producer blocked by full buffer (wp - fp >= buf_size)

When buf_size > 262144 (URAM), pipeline II=3 due to read latency.
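The parameter derivation listed above can be sketched as follows. Assumptions: strides are counted in SIMD-wide words along the innermost dimension, `max_rp_retract` is taken as a given input (its derivation is not described here), and the stall simulation itself is not modeled. `outer_shuffle_params` is a hypothetical helper.

```python
def outer_shuffle_params(in_shape, perm, simd, max_rp_retract, wp_delay=4):
    """Derive output shape, loop coefficients and buffer size for an OuterShuffle."""
    assert perm[-1] == len(in_shape) - 1, "innermost dimension must stay in place"
    assert in_shape[-1] % simd == 0
    # Count the innermost dimension in SIMD-wide words.
    folded = list(in_shape[:-1]) + [in_shape[-1] // simd]
    strides = [1] * len(folded)
    for d in range(len(folded) - 2, -1, -1):
        strides[d] = strides[d + 1] * folded[d + 1]  # row-major input strides
    out_shape = [in_shape[p] for p in perm]  # apply perm to the input shape
    loop_coeffs = [strides[p] for p in perm]  # input strides permuted by perm
    need = max_rp_retract + wp_delay + 2
    buf_size = 1
    while buf_size < need:
        buf_size *= 2  # smallest power of two >= need
    return out_shape, loop_coeffs, buf_size


print(outer_shuffle_params((2, 3, 4), (1, 0, 2), simd=2, max_rp_retract=10))
# ([3, 2, 4], [2, 6, 1], 16)
```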

finn.custom_op.fpgadataflow.pool

HWCustomOp for generic pooling operators MaxPool, AvgPool, AccPool and QuantAvgPool.

Pool Objects

class Pool(HWCustomOp)

Abstraction layer for HW implementation of Pool. Requires ConvolutionInputGenerator(depthwise == 1) to format its input.

Input shape: (BatchSize, OutImgDim, OutImgDim, TotalKernelSize*Channels)
Output shape: (BatchSize, OutImgDim, OutImgDim, Channels)

Notes:

The input shape was chosen to be compatible with im2col (only true when there is no folding).
The actual data layout produced by the hlslib kernels is different for depthwise ops.
depthwise SWG: (1, OFMDim, OFMDim, IFMChannels/PE, K, K, PE)

Channels can be folded using PE (SIMD from the input perspective)
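The relationship between the depthwise SWG layout and the im2col-compatible shape can be sketched in NumPy. This assumes the im2col layout orders each window as (kernel row, kernel column, channel) with channels innermost; it is a reference view for understanding the shapes, not how the node's execute_node is implemented.

```python
import numpy as np


def dw_swg_to_im2col(x: np.ndarray) -> np.ndarray:
    """(N, OH, OW, C/PE, K, K, PE) -> (N, OH, OW, K*K*C) with channels innermost."""
    n, oh, ow, c_fold, kh, kw, pe = x.shape
    # Move the channel-fold axis behind the kernel axes: (N, OH, OW, K, K, C/PE, PE)
    y = x.transpose(0, 1, 2, 4, 5, 3, 6)
    return y.reshape(n, oh, ow, kh * kw * c_fold * pe)


def max_pool_im2col(x: np.ndarray, k: int, channels: int) -> np.ndarray:
    """Max over the K*K window positions of an im2col-style input."""
    n, oh, ow, _ = x.shape
    return x.reshape(n, oh, ow, k * k, channels).max(axis=3)


# One output pixel, K=2, C=4, PE=2; im2col index = (kh*K + kw)*C + c
im2col = np.arange(16).reshape(1, 1, 1, 16)
dw = im2col.reshape(1, 1, 1, 2, 2, 2, 2).transpose(0, 1, 2, 5, 3, 4, 6)
assert np.array_equal(dw_swg_to_im2col(dw), im2col)
print(max_pool_im2col(im2col, k=2, channels=4))  # maxima 12, 13, 14, 15 for channels 0..3
```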

get_nodeattr_types

def get_nodeattr_types()

Get dictionary of custom node attributes with their types and default values.

get_input_datatype

def get_input_datatype(ind=0)

Returns FINN DataType of input.

get_output_datatype

def get_output_datatype(ind=0)

Returns FINN DataType of output.

get_normal_input_shape

def get_normal_input_shape(ind=0)

Return shape of the input tensor.

get_folded_input_shape

def get_folded_input_shape(ind=0)

Return shape of the folded input tensor.

get_normal_output_shape

def get_normal_output_shape(ind=0)

Return shape of the output tensor.

get_folded_output_shape

def get_folded_output_shape(ind=0)

Return shape of the folded output tensor.

get_exp_cycles

def get_exp_cycles()

Return estimation of expected cycles for set folding.

get_instream_width

def get_instream_width(ind=0)

Width of the input stream.

get_outstream_width

def get_outstream_width(ind=0)

Width of the output stream.

infer_node_datatype

def infer_node_datatype(model)

Infers the datatype of the output from the node attribute.

verify_node

def verify_node()

Verifies the node configuration attributes.

execute_node

def execute_node(context, graph)

Executes the node with inputs from context writing outputs to context.

finn.custom_op.fpgadataflow.requant

Requant Objects

class Requant(HWCustomOp)

Abstraction layer for HW implementation of Requantization.

Requantization computes: clip(round(x * scale + bias), min, max)

This is an alternative to Thresholding for cases where the thresholds are uniformly spaced. Instead of comparing against N thresholds, we compute the output directly using a multiply-add operation.

Inputs:

input[0]: Data tensor to requantize.
input[1]: Scale tensor (per-channel or scalar, stored as initializer).
input[2]: Bias tensor (per-channel or scalar, stored as initializer).
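Why uniformly spaced thresholds can be replaced by a multiply-add can be checked numerically. This sketch assumes round-half-up (floor(y + 0.5)), a positive scale, and an output range starting at 0; the node's actual rounding mode is not stated here. With these assumptions, the output is at least k exactly when x >= (k - 0.5 - bias) / scale, which gives the equivalent threshold set. All function names are hypothetical.

```python
import numpy as np


def requant(x, scale, bias, out_min, out_max):
    """clip(round(x * scale + bias), min, max) with round-half-up."""
    return np.clip(np.floor(x * scale + bias + 0.5), out_min, out_max)


def multithreshold(x, thresholds):
    """Count how many thresholds each element of x meets or exceeds."""
    return (x[..., None] >= thresholds).sum(axis=-1)


def uniform_thresholds(scale, bias, n_steps):
    """Thresholds T_k = (k - 0.5 - bias) / scale for k = 1..n_steps (scale > 0)."""
    k = np.arange(1, n_steps + 1)
    return (k - 0.5 - bias) / scale


x = np.arange(-2, 8, dtype=float)
print(requant(x, 0.5, 0.25, 0, 3))
print(multithreshold(x, uniform_thresholds(0.5, 0.25, 3)))  # same values as above
```

The multiply-add needs one multiplier per lane instead of N comparators and N stored thresholds per channel, which is the saving the description points to.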

get_scale

def get_scale(model)

Get scale tensor from model initializer (input[1]).

get_bias

def get_bias(model)

Get bias tensor from model initializer (input[2]).

is_per_channel

def is_per_channel(model)

Check if scale/bias are per-channel (vs per-tensor).

get_input_datatype

def get_input_datatype(ind=0)

Returns FINN DataType of input.

get_output_datatype

def get_output_datatype(ind=0)

Returns FINN DataType of output.

get_normal_input_shape

def get_normal_input_shape(ind=0)

Returns input shape in format [N, H, W, C] or [N, C].

get_normal_output_shape

def get_normal_output_shape(ind=0)

Returns output shape.

get_folded_input_shape

def get_folded_input_shape(ind=0)

Returns folded input shape.

get_folded_output_shape

def get_folded_output_shape(ind=0)

Returns folded output shape.

get_exp_cycles

def get_exp_cycles()

Returns expected number of cycles for execution.

execute_node

def execute_node(context, graph)

Execute the requant operation.

get_instream_width

def get_instream_width(ind=0)

Returns input stream width.

get_outstream_width

def get_outstream_width(ind=0)

Returns output stream width.

finn.custom_op.fpgadataflow.reshape

Hardware operator corresponding to the standard ONNX Reshape.

Reshape Objects

@register_custom_op
class Reshape(HWCustomOp)

Reshape operator: essentially a passthrough with different input and output shapes.

get_nodeattr_types

def get_nodeattr_types()

Custom node attributes with their types and default values.

inp_shape

@property
def inp_shape()

Input shape attribute.

out_shape

@property
def out_shape()

Output shape attribute.

dtype

@property
def dtype()

Datatype attribute as QONNX DataType.

pe

@property
def pe()

Parallel elements in the last dimension of the output.

get_input_datatype

def get_input_datatype(ind=0)

Datatype of the input tensor, same as the output.

get_output_datatype

def get_output_datatype(ind=0)

Datatype of the output tensor, same as the input.

get_normal_input_shape

def get_normal_input_shape(ind=0)

Regular input shape as seen by the ONNX standard.

get_normal_output_shape

def get_normal_output_shape(ind=0)

Regular output shape as seen by the ONNX standard.

get_folded_input_shape

def get_folded_input_shape(ind=0)

Shape of the folded (PE) input tensor

get_folded_output_shape

def get_folded_output_shape(ind=0)

Shape of the folded (PE) output tensor

get_instream_width

def get_instream_width(ind=0)

Width of the input data stream at index ind.

get_outstream_width

def get_outstream_width(ind=0)

Width of the output data stream at index ind.

get_number_output_values

def get_number_output_values()

Expected output values for the operation given the folding.

get_exp_cycles

def get_exp_cycles()

Expected cycles for the operation given the folding.

infer_node_datatype

def infer_node_datatype(model: ModelWrapper)

Infers the datatype of the node output from the model graph.

execute_node

def execute_node(context, graph)

Execute reshape operation (Python fallback).

finn.custom_op.fpgadataflow.rtl

RTLBackend specializations of HWCustomOps.

register_custom_op

def register_custom_op(cls)

Registers a class in the custom_op dictionary.

finn.custom_op.fpgadataflow.rtl.convolutioninputgenerator_rtl

RTL implementation of ConvolutionInputGenerator (Sliding Window Generator).

This module provides an RTL-based implementation of the ConvolutionInputGenerator, generating sliding windows for convolution operations on FPGA. Supports non-square, 1D, strided, dilated, and depthwise convolutions with configurable buffer implementations.
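To make the sliding-window semantics concrete, here is a minimal NumPy sketch of what a sliding window generator produces for an unpadded NHWC input, including stride and dilation. It is a functional model only, not the RTL implementation. The output layout is an assumption about FINN's im2col convention: each window is flattened in (kh, kw, c) order into the last axis. `sliding_windows_nhwc` is a hypothetical name.

```python
import numpy as np


def sliding_windows_nhwc(x, kernel, stride=(1, 1), dilation=(1, 1)):
    """Return all convolution windows of an NHWC tensor (no padding)."""
    n, h, w, c = x.shape
    kh, kw = kernel
    sh, sw = stride
    dh, dw = dilation
    # The effective (dilated) kernel extent determines the output size.
    eff_h = dh * (kh - 1) + 1
    eff_w = dw * (kw - 1) + 1
    oh = (h - eff_h) // sh + 1
    ow = (w - eff_w) // sw + 1
    out = np.empty((n, oh, ow, kh * kw * c), dtype=x.dtype)
    for i in range(oh):
        for j in range(ow):
            r0, c0 = i * sh, j * sw
            # Pick every dh-th row and dw-th column inside the dilated window.
            patch = x[:, r0:r0 + eff_h:dh, c0:c0 + eff_w:dw, :]
            # Flatten the window in (kh, kw, c) order into the last axis.
            out[:, i, j, :] = patch.reshape(n, -1)
    return out


x = np.arange(16).reshape(1, 4, 4, 1)
print(sliding_windows_nhwc(x, (3, 3))[0, 0, 0])  # first 3x3 window
```

Non-square and 1D cases fall out of the same code by choosing e.g. `kernel=(1, 3)`.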

ConvolutionInputGenerator_rtl Objects

class ConvolutionInputGenerator_rtl(ConvolutionInputGenerator, RTLBackend)

Class that corresponds to finn-rtllib swg module. Generates an RTL ConvolutionInputGenerator implementation based on (System-)Verilog templates, defined in finn-rtllib/swg.

init

def __init__(onnx_node, **kwargs)

Initialize the RTL ConvolutionInputGenerator.

Parameters

onnx_node : NodeProto - ONNX node to wrap.
**kwargs : dict - Additional arguments passed to the parent class.

get_nodeattr_types

def get_nodeattr_types()

Get dictionary of attribute names and their types for this node.

Returns

dict - Dictionary mapping attribute names to type specifications.

get_number_input_values

def get_number_input_values()

Function to get the number of expected input values.

use_parallel_window_output

def use_parallel_window_output()

Check if parallel window output mode is enabled.

Returns

bool - True if parallel window output is enabled, False otherwise.

get_buffer_depth

def get_buffer_depth()

Return total depth of the internal buffer, depending on implementation style.

get_exp_cycles

def get_exp_cycles()

Get expected number of clock cycles for one inference.

Returns

int - Number of clock cycles required for processing.

bram_estimation

def bram_estimation()

Estimate Block RAM (BRAM) resource usage.

Returns

int - Estimated number of BRAMs needed.

lut_estimation

def lut_estimation()

Estimate LUT resource usage.

Returns

int - Estimated number of LUTs needed.

uram_estimation

def uram_estimation()

Estimate UltraRAM (URAM) resource usage.

Returns

int - Estimated number of URAMs needed.

execute_node

def execute_node(context, graph)

Execute this ConvolutionInputGenerator node.

Performs sliding window generation for convolution operations.

Parameters

context : dict - Dictionary mapping tensor names to numpy arrays.
graph : GraphProto - ONNX graph containing this node.

prepare_codegen_default

def prepare_codegen_default()

Fill code generation dict for the default implementation style by computing the incremental addressing scheme for the circular buffer.

prepare_codegen_parallel

def prepare_codegen_parallel()

Fill code generation dict for the parallel implementation style by computing the loop controller configuration and partitioning the fixed buffer into shift-registers (for parallel read access) and line buffers (for efficient LUTRAM/BRAM/URAM implementation).

select_impl_style

def select_impl_style()

Select implementation style based on folding configuration.

generate_hdl

def generate_hdl(model, fpgapart, clk)

Generate HDL code and wrapper for the IP, depending on required implementation style.

get_rtl_file_list

def get_rtl_file_list(abspath=False)

Get list of RTL files required for this node.

Parameters

abspath : bool - If True, return absolute file paths; otherwise return relative paths.

Returns

list of str - List of RTL file paths.

code_generation_ipi

def code_generation_ipi()

Constructs and returns the TCL for node instantiation in Vivado IPI.

get_verilog_top_module_intf_names

def get_verilog_top_module_intf_names()

Return a dict of names of input and output interfaces. The keys reflect the protocols each interface implements: 'clk', 'rst', 'm_axis', 's_axis', 'aximm', 'axilite'. Values are lists of tuples (axis, aximm) or names (axilite). The 'axis' tuples correspond to the node inputs in order, and each tuple is (interface_name, interface_width_bits). axilite is always assumed to be 32 bits wide and is given as a name only, not a tuple. Each block must have at most one aximm and one axilite interface.
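The dict layout described above can be checked mechanically. The sketch below is hypothetical: `check_intf_names` is not a FINN function, and the port names in the example (`ap_clk`, `in0_V`, `s_axilite`, ...) are illustrative placeholders rather than names any particular node is guaranteed to emit.

```python
def check_intf_names(intf):
    """Return a list of convention violations (an empty list means OK)."""
    problems = []
    # axis and aximm entries are (interface_name, interface_width_bits) tuples.
    for key in ("s_axis", "m_axis", "aximm"):
        for entry in intf.get(key, []):
            ok = (
                isinstance(entry, tuple)
                and len(entry) == 2
                and isinstance(entry[0], str)
                and isinstance(entry[1], int)
            )
            if not ok:
                problems.append(f"{key}: expected (name, width_bits), got {entry!r}")
    # axilite entries are bare names (width is implicitly 32 bits).
    for entry in intf.get("axilite", []):
        if not isinstance(entry, str):
            problems.append(f"axilite: expected a plain name, got {entry!r}")
    if len(intf.get("aximm", [])) > 1:
        problems.append("at most one aximm interface is allowed")
    if len(intf.get("axilite", [])) > 1:
        problems.append("at most one axilite interface is allowed")
    return problems


example = {
    "clk": ["ap_clk"],
    "rst": ["ap_rst_n"],
    "s_axis": [("in0_V", 24)],
    "m_axis": [("out0_V", 24)],
    "aximm": [],
    "axilite": ["s_axilite"],
}
print(check_intf_names(example))
```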

get_dynamic_config

def get_dynamic_config(ifm_dim=None, stride=None, dilation=None)

Returns a configuration dict to reconfigure the FM dimensions at runtime. Stride and dilation can also be changed. Certain restrictions apply (e.g. the component must be synthesized for the largest buffer size).

finn.custom_op.fpgadataflow.rtl.elementwise_binary_rtl

ElementwiseBinary_rtl Objects

class ElementwiseBinary_rtl(ElementwiseBinaryOperation, RTLBackend)

Base CustomOp wrapper for the finn-rtllib eltwisef component.

adapt_for_loop_body

def adapt_for_loop_body(input_types)

Adapt elementwise binary operator for loop body execution.

When an elementwise operator is placed inside a loop, parameters that are indexed per iteration (PARAMETER type) need to be received as streaming inputs rather than embedded constants. This method changes the lhs_style/rhs_style attributes from "const" to "input" as needed.

get_verilog_top_module_intf_names

def get_verilog_top_module_intf_names()

Return the interface names for the Verilog top module.

For RTL elementwise operations, this includes handling for MLO mode where the rhs parameter may be streamed as an input.

code_generation_ipi

def code_generation_ipi()

Constructs and returns the TCL for node instantiation in Vivado IPI.

ElementwiseAdd_rtl Objects

class ElementwiseAdd_rtl(ElementwiseBinary_rtl,
                         elementwise_binary.ElementwiseAdd)

RTL implementation of elementwise addition for FLOAT32.

ElementwiseSub_rtl Objects

class ElementwiseSub_rtl(ElementwiseBinary_rtl,
                         elementwise_binary.ElementwiseSub)

RTL implementation of elementwise subtraction for FLOAT32.

ElementwiseMul_rtl Objects

class ElementwiseMul_rtl(ElementwiseBinary_rtl,
                         elementwise_binary.ElementwiseMul)

RTL implementation of elementwise multiplication for FLOAT32.

finn.custom_op.fpgadataflow.rtl.finn_loop

FINNLoop Objects

class FINNLoop(HWCustomOp, RTLBackend)

Class that corresponds to the meta/container node FINN loop which is a placeholder for a group of fpgadataflow nodes that have been separated out into a FINN-ONNX model of its own and are meant to be executed in a loop.

get_nodeattr

def get_nodeattr(name)

Get a node attribute by name. Data is stored inside the ONNX node's AttributeProto container. Attribute must be part of get_nodeattr_types. Default value is returned if attribute is not set.

set_nodeattr

def set_nodeattr(name, value)

Set a node attribute by name. Data is stored inside the ONNX node's AttributeProto container. Attribute must be part of get_nodeattr_types.

get_input_datatype

def get_input_datatype(ind=0)

Returns FINN DataType of input.

prepare_rtlsim

def prepare_rtlsim(behav=False)

Creates an xsi emulation library for the RTL code generated for this node and sets the rtlsim_so attribute to its path.

generate_hdl_stream_tap

def generate_hdl_stream_tap()

Helper function to generate verilog code for stream tap components.

finn.custom_op.fpgadataflow.rtl.fmpadding_rtl

RTL implementation of FMPadding for feature map padding.

This module provides an RTL-based implementation of feature map padding using the finn-rtllib fmpadding_axi component. Supports runtime reconfiguration of padding amounts and spatial feature sizes via optional AXI-Lite interface.

FMPadding_rtl Objects

class FMPadding_rtl(FMPadding, RTLBackend)

CustomOp wrapper for the finn-rtllib fmpadding_axi component.

Supports adjusting the padding amount and spatial feature sizes at runtime.

init

def __init__(onnx_node, **kwargs) -> None

Initialize the RTL FMPadding component.

Parameters

onnx_node : NodeProto - ONNX node to wrap.
**kwargs : dict - Additional arguments passed to the parent class.

get_nodeattr_types

def get_nodeattr_types()

Get dictionary of attribute names and their types for this node.

Returns

dict - Dictionary mapping attribute names to type specifications, including dynamic_mode for runtime reconfiguration.

get_verilog_top_module_intf_names

def get_verilog_top_module_intf_names()

Get Verilog top module interface names.

Returns

dict - Dictionary mapping interface types to interface names, including the optional AXI-Lite interface if dynamic_mode is enabled.

get_template_values

def get_template_values(ifm_dims, pads, chans, simd, idt)

Calculate template parameter values for HDL generation.

Parameters

ifm_dims : list - Input feature map dimensions [H, W].
pads : list - Padding amounts [top, left, bottom, right].
chans : int - Number of channels.
simd : int - SIMD parallelism factor.
idt : DataType - Input data type.

Returns

dict - Dictionary of template substitution values for HDL generation.
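For checking simulation results, the padding itself can be modeled with `numpy.pad`. This minimal sketch assumes zero padding of an NHWC tensor, with `pads` in the [top, left, bottom, right] order given above. `fmpad_reference` is a hypothetical helper, not part of FINN.

```python
import numpy as np


def fmpad_reference(x_nhwc, pads, value=0):
    """Pad the spatial (H, W) axes of an NHWC tensor; N and C are untouched."""
    top, left, bottom, right = pads
    pad_width = ((0, 0), (top, bottom), (left, right), (0, 0))
    return np.pad(x_nhwc, pad_width, mode="constant", constant_values=value)


x = np.ones((1, 2, 3, 4))
print(fmpad_reference(x, [1, 0, 2, 1]).shape)  # H: 1+2+2, W: 0+3+1
```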

get_dynamic_config

def get_dynamic_config(ifm_dims=None, pads=None)

Return a configuration dict to re-configure FM dimension and padding amounts during runtime.

generate_hdl

def generate_hdl(model, fpgapart, clk)

Generate HDL code from templates for this node.

Parameters

model : ModelWrapper - ONNX model wrapper.
fpgapart : str - Target FPGA part number.
clk : float - Target clock period in ns.

get_rtl_file_list

def get_rtl_file_list(abspath=False)

Get list of RTL files required for this node.

Parameters

abspath : bool - If True, return absolute file paths; otherwise return relative paths.

Returns

list of str - List of RTL file paths (4 files: fmpadding_axi.sv, fmpadding.sv, axi2we.sv, and the generated .v file).

code_generation_ipi

def code_generation_ipi()

Constructs and returns the TCL for node instantiation in Vivado IPI.

execute_node

def execute_node(context, graph)

Execute this FMPadding node.

Performs feature map padding using C++ or RTL simulation.

Parameters

context : dict - Dictionary mapping tensor names to numpy arrays.
graph : GraphProto - ONNX graph containing this node.

finn.custom_op.fpgadataflow.rtl.inner_shuffle_rtl

auto_size_simd

def auto_size_simd(I_dim: int, SIMD: int) -> Optional[int]

Return the smallest divisor d of I_dim such that d > SIMD. If no such divisor exists, return None.
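A straightforward sketch of the documented behavior, useful for predicting what value will be chosen. It is written from the description above and is not necessarily identical to the actual implementation.

```python
from typing import Optional


def auto_size_simd(I_dim: int, SIMD: int) -> Optional[int]:
    """Smallest divisor of I_dim that is strictly greater than SIMD, or None."""
    for d in range(SIMD + 1, I_dim + 1):
        if I_dim % d == 0:
            return d
    return None


print(auto_size_simd(12, 4))  # divisors of 12 above 4: 6, 12
```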

InnerShuffle_rtl Objects

class InnerShuffle_rtl(InnerShuffle, RTLBackend)

CustomOp wrapper for the finn-rtllib inner_shuffle component.

code_generation_ipi

def code_generation_ipi()

Constructs and returns the TCL for node instantiation in Vivado IPI.

finn.custom_op.fpgadataflow.rtl.layernorm_rtl

LayerNorm_rtl Objects

class LayerNorm_rtl(LayerNorm, RTLBackend)

RTL backend implementation for LayerNorm kernel. Generates RTL code for hardware synthesis of LayerNorm operations.

finn.custom_op.fpgadataflow.rtl.matrixvectoractivation_rtl

RTL implementation of Matrix Vector Activation Unit (MVAU).

This module provides an RTL-based implementation of the Matrix Vector Activation Unit for FPGA acceleration, supporting features like double-pumped DSPs and various weight memory modes.

MVAU_rtl Objects

class MVAU_rtl(MVAU, RTLBackend)

Class that corresponds to finn-rtl Matrix Vector Unit.

init

def __init__(onnx_node, **kwargs)

Initialize the RTL Matrix Vector Activation Unit.

Parameters

onnx_node : NodeProto - ONNX node to wrap.
**kwargs : dict - Additional arguments passed to the parent class.

get_nodeattr_types

def get_nodeattr_types()

Get dictionary of attribute names and their types for this node.

Returns

dict - Dictionary mapping attribute names to type specifications, including pumpedCompute for double-pumped DSP operation.

execute_node

def execute_node(context, graph)

Execute this MVAU node.

Performs matrix-vector multiplication with optional activation using C++ or RTL simulation.

Parameters

context : dict - Dictionary mapping tensor names to numpy arrays.
graph : GraphProto - ONNX graph containing this node.

lut_estimation

def lut_estimation()

Estimate LUT resource usage.

Returns

int - Estimated number of LUTs needed (currently returns 0).

dsp_estimation

def dsp_estimation(fpgapart)

Estimate DSP resource usage based on target FPGA.

Parameters

fpgapart : str - Target FPGA part number.

Returns

int - Estimated number of DSP blocks needed.

instantiate_ip

def instantiate_ip(cmd)

Instantiate the RTL IP in Vivado IPI.

Parameters

cmd : list - List of TCL commands to which the instantiation commands are appended.

generate_hdl

def generate_hdl(model, fpgapart, clk)

Generate HDL code from templates for this node.

Parameters

model : ModelWrapper - ONNX model wrapper.
fpgapart : str - Target FPGA part number.
clk : float - Target clock period in ns.

prepare_codegen_default

def prepare_codegen_default(fpgapart, clk)

Prepare code generation dictionary for default implementation.

Parameters

fpgapart : str - Target FPGA part number.
clk : float - Target clock period in ns.

Returns

tuple of (str, dict) - Template file path and code generation dictionary.

get_rtl_file_list

def get_rtl_file_list(abspath=False)

Get list of RTL files required for this node.

Parameters

abspath : bool - If True, return absolute file paths; otherwise return relative paths.

Returns

list of str - List of RTL file paths.

get_verilog_paths

def get_verilog_paths()

Get list of Verilog include paths for this node.

Returns

list of str - List of directory paths containing Verilog source files.

finn.custom_op.fpgadataflow.rtl.requant_rtl

Requant_rtl Objects

class Requant_rtl(Requant, RTLBackend)

RTL backend for Requant operation using finn-rtllib/requant.

generate_hdl

def generate_hdl(model, fpgapart, clk)

Generate RTL code for the requant operation.

get_rtl_file_list

def get_rtl_file_list(abspath=False)

Return list of RTL files needed for this node.

execute_node

def execute_node(context, graph)

Execute the node, using RTL simulation if exec_mode is rtlsim.

finn.custom_op.fpgadataflow.rtl.reshape_rtl

RTLBackend specialization of the Reshape operator.

Reshape_rtl Objects

@register_custom_op
class Reshape_rtl(Reshape, RTLBackend)

RTLBackend specialization of the Reshape operator

get_nodeattr_types

def get_nodeattr_types()

Custom node attributes with their types and default values.

execute_node

def execute_node(context, graph)

Execute reshape operation (RTL simulation or Python fallback).

generate_hdl

def generate_hdl(model, fpgapart, clk)

Generate HDL code by filling in the Verilog template.

get_rtl_file_list

def get_rtl_file_list(abspath: bool = False)

Return list of RTL files required for this custom operation.

Arguments:

abspath - Whether to return absolute paths (default: False).

Returns:

List of paths pointing to required RTL files.

Raises:

FINNInternalError - If code_gen_dir_ipgen or gen_top_module attributes are invalid.

code_generation_ipi

def code_generation_ipi()

Code generation for IP integration.

finn.custom_op.fpgadataflow.rtl.streamingdatawidthconverter_rtl

RTL implementation of streaming data width converter.

This module provides an RTL-based implementation for converting between different stream data widths while maintaining throughput.

StreamingDataWidthConverter_rtl Objects

class StreamingDataWidthConverter_rtl(StreamingDataWidthConverter, RTLBackend)

Class that corresponds to finn-rtllib datawidth converter module.

get_nodeattr_types

def get_nodeattr_types()

Get the attribute types for this node.

check_divisible_iowidths

def check_divisible_iowidths()

Check that the input and output stream widths are divisible, i.e. that one is an integer multiple of the other.

Ensures that the stream width conversion has an integer ratio, which is required for proper operation.

Returns

bool - True if the widths are properly divisible, False otherwise.
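The integer-ratio condition reduces to two modulo checks. This is a minimal sketch; `has_integer_width_ratio` is a hypothetical name, and the node method itself may report the result differently (e.g. by raising).

```python
def has_integer_width_ratio(in_width: int, out_width: int) -> bool:
    """True if one stream width is an integer multiple of the other."""
    if in_width <= 0 or out_width <= 0:
        raise ValueError("stream widths must be positive")
    return in_width % out_width == 0 or out_width % in_width == 0


print(has_integer_width_ratio(32, 8), has_integer_width_ratio(24, 16))
```

A 24-to-16 bit conversion fails the check because 24/16 = 1.5 is not an integer ratio.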

execute_node

def execute_node(context, graph)

Execute the node in the given context and graph for simulation.

get_template_values

def get_template_values()

Get the code generation template values for this node.

generate_hdl

def generate_hdl(model, fpgapart, clk)

Generate the HDL code for this node.

get_rtl_file_list

def get_rtl_file_list(abspath=False)

Get list of RTL files required for this node.

Parameters

abspath : bool - If True, return absolute file paths; otherwise return relative paths.

Returns

list of str - List of RTL file paths.

code_generation_ipi

def code_generation_ipi()

Constructs and returns the TCL for node instantiation in Vivado IPI.

finn.custom_op.fpgadataflow.rtl.streamingfifo_rtl

RTL implementation of streaming FIFO.

This module provides an RTL-based implementation of streaming FIFOs for buffering data between layers, with support for both RTL and Vivado IP implementations.

StreamingFIFO_rtl Objects

class StreamingFIFO_rtl(StreamingFIFO, RTLBackend)

RTL implementation of streaming FIFO for data buffering.

init

def __init__(onnx_node, **kwargs)

Initialize the RTL streaming FIFO.

Parameters

onnx_node : NodeProto - ONNX node to wrap.
**kwargs : dict - Additional arguments passed to the parent class.

get_nodeattr_types

def get_nodeattr_types()

Get dictionary of attribute names and their types for this node.

Returns

dict - Dictionary mapping attribute names to type specifications, including impl_style for choosing between RTL and Vivado implementations.

get_adjusted_depth

def get_adjusted_depth()

Get FIFO depth adjusted for implementation requirements.

For the Vivado implementation, the depth is rounded up to the nearest power of 2.

Returns

int - Adjusted FIFO depth.
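The power-of-2 rounding can be expressed with `int.bit_length`. In this minimal sketch, `adjusted_depth` and its `impl_style` argument are illustrative stand-ins for the node method and attribute. The string `"vivado"` is assumed to be the attribute value that selects the Vivado implementation.

```python
def adjusted_depth(depth: int, impl_style: str) -> int:
    """Round depth up to the next power of 2 for Vivado FIFOs only."""
    if depth < 1:
        raise ValueError("FIFO depth must be at least 1")
    if impl_style == "vivado":
        # (depth - 1).bit_length() is the exponent of the next power of 2.
        return 1 << (depth - 1).bit_length()
    return depth


print(adjusted_depth(100, "vivado"), adjusted_depth(100, "rtl"))
```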

get_verilog_top_module_intf_names

def get_verilog_top_module_intf_names()

Get Verilog top module interface names for this node.

Returns

dict - Dictionary mapping interface types to port names, including the optional maxcount output for depth monitoring.

is_sim_fifo_gauge

def is_sim_fifo_gauge()

Check if this FIFO should use simulation gauge implementation.

Returns True for RTL FIFOs with depth monitoring enabled, which use an infinite Verilog queue for simulation instead of Q_srl.

Returns

bool - True if using the simulation gauge, False otherwise.

generate_hdl

def generate_hdl(model, fpgapart, clk)

Generate HDL code from templates for this node.

Parameters

model : ModelWrapper - ONNX model wrapper.
fpgapart : str - Target FPGA part number.
clk : float - Target clock period in ns.

code_generation_ipi

def code_generation_ipi()

Generate TCL commands for instantiating this IP in Vivado IPI.

Returns

list of str - List of TCL commands for IP instantiation.

get_rtl_file_list

def get_rtl_file_list(abspath=False)

Get list of RTL files required for this node.

Parameters

abspath : bool - If True, return absolute file paths; otherwise return relative paths.

Returns

list of str - List of RTL file paths.

prepare_rtlsim

def prepare_rtlsim(behav=False)

Prepare this node for RTL simulation.

Raises

NotImplementedError - If impl_style is 'vivado' (Vivado FIFOs are not supported in RTL simulation).

execute_node

def execute_node(context, graph)

Execute this FIFO node.

Performs buffering using Python simulation for cppsim mode or Vivado FIFOs, and RTL simulation for rtlsim mode with RTL-style FIFOs.

Parameters

context : dict - Dictionary mapping tensor names to numpy arrays.
graph : GraphProto - ONNX graph containing this node.

finn.custom_op.fpgadataflow.rtl.thresholding_rtl

RTL implementation of thresholding activation.

This module provides an RTL-based implementation of thresholding activations for quantization and activation functions in FPGA dataflow architectures.

Thresholding_rtl Objects

class Thresholding_rtl(Thresholding, RTLBackend)

Class that corresponds to finn-rtllib 'thresholding' function.

init

def __init__(onnx_node, **kwargs)

Initialize the RTL thresholding activation node.

Parameters

onnx_node : NodeProto - ONNX node to wrap.
**kwargs : dict - Additional arguments passed to the parent class.

get_nodeattr_types

def get_nodeattr_types()

Get dictionary of attribute names and their types for this node.

Returns

dict - Dictionary mapping attribute names to type specifications, including memory depth triggers and optimization flags.

get_pe_mem_geometries

def get_pe_mem_geometries()

Return a list of (bitwidth, depth) for PE memory configurations to be used in resource estimation.

For each entry, the bitwidth is the bitwidth of the threshold values and the depth is the number of thresholds that can be stored in a single memory block. The number of memory blocks is the number of thresholds divided by the depth; this is then multiplied by the number of PEs to obtain the total number of memory blocks required for the entire layer.

get_memory_estimate

def get_memory_estimate()

Return the memory estimate for this node.

bram_estimation

def bram_estimation()

Return the number of BRAMs required for this node.

uram_estimation

def uram_estimation()

Return the number of URAMs required for this node.

lut_estimation

def lut_estimation()

Return the number of LUTs required for this node.

get_all_meminit_filenames

def get_all_meminit_filenames(abspath=False)

Return a list of all .dat memory initializer files used for this node.

prepare_codegen_rtl_values

def prepare_codegen_rtl_values(model)

Return a dictionary whose values replace the corresponding keys in the RTL template files.

get_rtl_file_list

def get_rtl_file_list(abspath=False)

Return the list of RTL files for the binary-search thresholding implementation.

generate_hdl

def generate_hdl(model, fpgapart, clk)

Prepare HDL files from templates for synthesis.

execute_node

def execute_node(context, graph)

Execute this thresholding node.

Performs threshold comparisons using C++ or RTL simulation.

Parameters

context : dict - Dictionary mapping tensor names to numpy arrays.
graph : GraphProto - ONNX graph containing this node.

code_generation_ipi

def code_generation_ipi()

Constructs and returns the TCL commands for node instantiation as an RTL block.

get_verilog_top_module_intf_names

def get_verilog_top_module_intf_names()

Get Verilog top module interface names for this node.

Returns

dict - Dictionary mapping interface types to port names, including the optional AXI-Lite interface for runtime weights.

generate_params

def generate_params(model, path)

Generate threshold parameter files for RTL implementation.

Parameters

model : ModelWrapper - ONNX model wrapper containing the threshold values.
path : str - Directory path where the parameter files will be generated.

make_weight_file

def make_weight_file(weights, weight_file_mode, weight_file_name)

Produce a file containing given weights (thresholds) in appropriate format for this layer. This file can be used for either synthesis or run-time reconfig of weights.

Parameters

weights : numpy array - Weights to be put into the file.
weight_file_mode : str - Mode for the weight file ("decoupled_runtime" or "internal_embedded").
weight_file_name : str - Filename for the weight file to be generated.

minimize_weight_bit_width

def minimize_weight_bit_width(model)

Minimize threshold datatype, with RTL-specific adjustments.

The RTL implementation saturates inputs to the threshold datatype range when the threshold datatype is narrower than the input datatype. To ensure correct comparisons at saturation boundaries, the threshold datatype must be able to represent [min_threshold - 1 : max_threshold].
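The extra `- 1` in that range can change the chosen type. This sketch finds the narrowest integer type covering a closed range, assuming integer thresholds. `narrowest_int_type` and `rtl_threshold_type` are hypothetical helpers that return (signed, bits) rather than a FINN DataType.

```python
def narrowest_int_type(lo: int, hi: int):
    """Return (signed, bits) of the narrowest integer type covering [lo, hi]."""
    lo, hi = int(lo), int(hi)
    if lo > hi:
        raise ValueError("empty range")
    if lo >= 0:
        # Unsigned: enough bits to hold hi (at least 1 bit).
        return False, max(1, hi.bit_length())
    # Signed two's complement with b bits covers [-2**(b-1), 2**(b-1) - 1].
    bits = 1
    while not (-(1 << (bits - 1)) <= lo and hi <= (1 << (bits - 1)) - 1):
        bits += 1
    return True, bits


def rtl_threshold_type(thresholds):
    """Type that must cover [min_threshold - 1, max_threshold]."""
    return narrowest_int_type(min(thresholds) - 1, max(thresholds))


# Thresholds 0..7 fit UINT3 on their own, but [-1, 7] needs a signed 4-bit type.
print(narrowest_int_type(0, 7), rtl_threshold_type([0, 3, 7]))
```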

finn.custom_op.fpgadataflow.rtl.vectorvectoractivation_rtl

RTL implementation of Vector-Vector Activation Unit (VVAU).

This module provides an RTL-based implementation of the Vector-Vector Activation Unit for DSP-based computation of quantized neural network activations in FPGA dataflow architectures.

VVAU_rtl Objects

class VVAU_rtl(VVAU, RTLBackend)

RTL implementation of Vector-Vector Activation Unit.

Implements DSP-based activation functions using vector-vector multiply-accumulate operations for efficient FPGA execution.

init

def __init__(onnx_node, **kwargs)

Initialize the RTL Vector-Vector Activation Unit node.

Parameters

onnx_node : NodeProto - ONNX node to wrap.
**kwargs : dict - Additional arguments passed to the parent class.

get_nodeattr_types

def get_nodeattr_types()

Get dictionary of attribute names and their types for this node.

Returns

dict - Dictionary mapping attribute names to type specifications, combining VVAU and RTLBackend attributes.

execute_node

def execute_node(context, graph)

Execute this VVAU node.

Performs vector-vector activation using C++ or RTL simulation.

Parameters

context : dict - Dictionary mapping tensor names to numpy arrays.
graph : GraphProto - ONNX graph containing this node.

lut_estimation

def lut_estimation()

Estimate LUT utilization for this VVAU node.

Returns

int - LUT count estimate (always 0 for VVAU, as it uses DSPs).

dsp_estimation

def dsp_estimation(fpgapart)

Estimate DSP utilization for this VVAU node.

Parameters

fpgapart : str - Target FPGA part name.

Returns

int - Number of DSP blocks required (PE * ceil(SIMD/3)).
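Here is the DSP formula above in code form, with two worked values. `vvau_dsp_estimate` is a hypothetical name, and the sketch does not model any part-specific behavior the real method may have.

```python
import math


def vvau_dsp_estimate(pe: int, simd: int) -> int:
    """PE * ceil(SIMD / 3) DSP blocks, as stated in the docstring."""
    return pe * math.ceil(simd / 3)


print(vvau_dsp_estimate(4, 9), vvau_dsp_estimate(2, 4))  # 4*3 and 2*2
```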

instantiate_ip

def instantiate_ip(cmd)

Add RTL IP instantiation commands to Vivado script.

Parameters

cmd : list - List of Vivado TCL commands to append to.

generate_hdl

def generate_hdl(model, fpgapart, clk)

Generate HDL code for this VVAU node.

Parameters

model : ModelWrapper - ONNX model wrapper containing the weights.
fpgapart : str - Target FPGA part name.
clk : float - Target clock period in nanoseconds.

prepare_codegen_default

def prepare_codegen_default(fpgapart, clk)

Prepare default code generation dictionary for HDL templates.

Parameters

fpgapart : str - Target FPGA part name.
clk : float - Target clock period in nanoseconds.

Returns

tuple - (template_path, code_gen_dict), where template_path is the path to the Verilog wrapper template and code_gen_dict contains the template substitutions.

get_rtl_file_list

def get_rtl_file_list(abspath=False)

Get list of RTL files needed for this VVAU node.

Parameters

abspath : bool, optional - Whether to return absolute paths (default: False).

Returns

list - List of RTL file paths required for synthesis.

get_verilog_paths

def get_verilog_paths() -> list[str]

Get list of Verilog paths required for this node.

finn.custom_op.fpgadataflow.rtlbackend

RTLBackend Objects

class RTLBackend(ABC)

Base class whose functionality is used by all custom ops that correspond to a module in finn-rtllib. It contains functions that every RTL custom node should have; some are abstract methods that must be implemented when writing a new RTL custom op node.

prepare_rtlsim

def prepare_rtlsim(behav=False)

Creates an xsi emulation library for the RTL code generated for this node and sets the rtlsim_so attribute to its path.

get_verilog_paths

def get_verilog_paths()

Returns the path to the code generation directory. Can be overridden to return additional paths to relevant Verilog files.

get_rtl_file_list

@abstractmethod
def get_rtl_file_list(abspath=False)

Returns list of rtl files. Needs to be filled by each node.
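The abstract-method pattern can be illustrated with a self-contained stand-in. `RTLBackendSketch` and `MyRTLOp` are hypothetical classes, not FINN's RTLBackend. The file names and the `code_gen_dir` constructor argument are placeholders for the node's code generation directory attribute.

```python
import os
from abc import ABC, abstractmethod


class RTLBackendSketch(ABC):
    """Hypothetical stand-in for the pattern described for RTLBackend."""

    def __init__(self, code_gen_dir):
        self.code_gen_dir = code_gen_dir

    def get_verilog_paths(self):
        # Default: only the code generation directory; subclasses may extend.
        return [self.code_gen_dir]

    @abstractmethod
    def get_rtl_file_list(self, abspath=False):
        """Every concrete RTL op must list its own RTL sources."""


class MyRTLOp(RTLBackendSketch):
    # Hypothetical op with one generated wrapper and one library file.
    def get_rtl_file_list(self, abspath=False):
        files = ["my_op_wrapper.v", "my_op.sv"]
        if abspath:
            return [os.path.join(self.code_gen_dir, f) for f in files]
        return files


print(MyRTLOp("/tmp/gen").get_rtl_file_list())
```

Because `get_rtl_file_list` is abstract, Python refuses to instantiate a subclass that forgets to implement it. This surfaces the omission at construction time rather than during HDL generation.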

finn.custom_op.fpgadataflow.shuffle

Shuffle Objects

class Shuffle(HWCustomOp)

Abstraction layer for Shuffle (rearrange and transpose) layers. This operator is later transformed into InnerShuffle and OuterShuffle operations.

get_nodeattr_types

def get_nodeattr_types()

The attributes for the Shuffle node capture the optional reshapes on either side of the transpose. Below is a diagram indicating which tensors the attribute names refer to.

       │ in_shape
       ▼
┌─────────────┐
│   Reshape   │
└──────┬──────┘
       │ transpose_in_shape
       ▼
┌─────────────┐
│  Transpose  │
└──────┬──────┘
       │ transpose_out_shape
       ▼
┌─────────────┐
│   Reshape   │
└──────┬──────┘
       │ out_shape
       ▼
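Functionally, the Reshape, Transpose, Reshape chain described by these attributes can be modeled in NumPy. In this sketch, `perm` is an assumed name for the transpose permutation. `transpose_out_shape` is implied by the permutation rather than passed in, and `shuffle_reference` is hypothetical.

```python
import numpy as np


def shuffle_reference(x, transpose_in_shape, perm, out_shape):
    """Reshape -> Transpose -> Reshape, following the attribute names above."""
    y = np.reshape(x, transpose_in_shape)
    y = np.transpose(y, perm)  # y.shape is the transpose_out_shape
    return np.reshape(y, out_shape)


# Split a (1, 4, 6) tensor into 2 heads of width 3 and move heads forward.
x = np.arange(24).reshape(1, 4, 6)
y = shuffle_reference(x, (1, 4, 2, 3), (0, 2, 1, 3), (1, 2, 4, 3))
print(y.shape)
```

In this example, element `y[0, h, s, d]` equals `x[0, s, h * 3 + d]`.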

get_exp_cycles

def get_exp_cycles()

Estimate cycles by decomposing into Inner/OuterShuffle stages.

Decomposes the transpose into a sequence of hardware-constrained operations (inner_shuffle / outer_shuffle), creates temporary nodes for each stage, and returns the MAX of their cycle estimates (stages are pipelined, so throughput is limited by the slowest).

finn.custom_op.fpgadataflow.split

StreamingSplit Objects

class StreamingSplit(HWCustomOp)

Abstraction layer for HW implementation of Split. Only supports splitting along the last (channel) axis.

finn.custom_op.fpgadataflow.squeeze

FPGA dataflow custom operator for Squeeze operation.

Squeeze Objects

@register_custom_op
class Squeeze(HWCustomOp)

Hardware custom operator for Squeeze operation.

Removes single-dimension entries from the shape of a tensor.

init

def __init__(onnx_node, **kwargs) -> None

Initialize the Squeeze operator from an ONNX node.

get_nodeattr_types

def get_nodeattr_types()

Return the dictionary of node attributes for the Squeeze operator.

inp_dtype

@property
def inp_dtype() -> BaseDataType

Return the input datatype.

out_dtype

@property
def out_dtype() -> BaseDataType

Return the output datatype.

inp_shape

@property
def inp_shape()

Return the input shape.

out_shape

@property
def out_shape()

Return the output shape.

pe

@property
def pe()

Return the number of parallel processing elements (PE).

make_shape_compatible_op

def make_shape_compatible_op(model: ModelWrapper) -> NodeProto

Create a shape-compatible operation for ONNX shape inference.

Returns a standard ONNX Squeeze node for shape inference purposes.

infer_node_datatype

def infer_node_datatype(model: ModelWrapper) -> None

Infer and set the datatype of the node output.

execute_node

def execute_node(context, graph) -> None

Execute squeeze operation (Python fallback).

verify_node

def verify_node()

Verify the node attributes, inputs and outputs.

get_input_datatype

def get_input_datatype(ind=0) -> BaseDataType

Return the datatype of the input at the given index.

get_output_datatype

def get_output_datatype(ind=0) -> BaseDataType

Return the datatype of the output at the given index.

get_normal_input_shape

def get_normal_input_shape(ind=0)

Return the unfolded input shape at the given index.

get_normal_output_shape

def get_normal_output_shape(ind=0)

Return the unfolded output shape at the given index.

get_folded_input_shape

def get_folded_input_shape(ind=0)

Return the folded input shape at the given index.

Applies PE-based folding to the last dimension.

get_folded_output_shape

def get_folded_output_shape(ind=0)

Return the folded output shape at the given index.

Applies PE-based folding to the last dimension.
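PE-based folding of the last dimension can be sketched as splitting it into (last_dim / PE, PE). This assumes PE divides the last dimension and that the PE elements form the innermost axis, which is the usual FINN layout convention; `fold_last_dim` is a hypothetical helper.

```python
def fold_last_dim(shape, pe):
    """Split the last dimension of `shape` into (last // pe, pe).

    Hypothetical helper illustrating PE folding; assumes pe divides the
    last dimension.
    """
    *outer, last = shape
    if last % pe != 0:
        raise ValueError(f"PE={pe} does not divide the last dimension ({last})")
    return tuple(outer) + (last // pe, pe)


print(fold_last_dim((1, 4, 8), pe=2))  # (1, 4, 4, 2)
```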

get_instream_width

def get_instream_width(ind=0)

Return the width of the input stream in bits at the given index.

get_outstream_width

def get_outstream_width(ind=0)

Return the width of the output stream in bits at the given index.

get_number_output_values

def get_number_output_values()

Return the number of expected output values from the operator.

get_exp_cycles

def get_exp_cycles()

Return the expected number of cycles for the squeeze operation.
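A rough model of how stream width and expected cycles relate to the folded shape, assuming each stream word carries PE elements and one word is transferred per clock cycle. Both helper names are hypothetical, and the exact formula FINN uses for this operator may differ.

```python
from math import prod


def stream_width_bits(pe, bitwidth):
    # One stream word carries PE elements of `bitwidth` bits side by side.
    return pe * bitwidth


def expected_cycles(folded_shape):
    # Assumes one stream word per cycle: every folded dimension except the
    # innermost PE dimension is traversed sequentially.
    return prod(folded_shape[:-1])


print(stream_width_bits(pe=2, bitwidth=8))  # 16
print(expected_cycles((1, 4, 4, 2)))        # 16
```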

finn.custom_op.fpgadataflow.streamingdataflowpartition

StreamingDataflowPartition Objects

class StreamingDataflowPartition(CustomOp)

Class that corresponds to the meta/container node StreamingDataflowPartition, which is a placeholder for a group of fpgadataflow nodes that have been separated out into a FINN-ONNX model of their own. Note that it does not produce any HLS code or bitfile by itself.

finn.custom_op.fpgadataflow.streamingdatawidthconverter

StreamingDataWidthConverter Objects

class StreamingDataWidthConverter(HWCustomOp)

Abstraction layer for HW implementation of StreamingDataWidthConverter
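Conceptually, a data width converter reinterprets the same bit stream with a different word width. The sketch below models that in software; it assumes the total bit count is a multiple of the output width and that earlier words occupy the less significant bits. That bit ordering is an assumption, and `convert_width` is a hypothetical helper, not the HW implementation.

```python
def convert_width(words, in_width, out_width):
    """Repack a stream of in_width-bit words into out_width-bit words."""
    total_bits = len(words) * in_width
    if total_bits % out_width != 0:
        raise ValueError("stream length is not a multiple of the output width")
    acc = 0
    for i, w in enumerate(words):
        if not 0 <= w < (1 << in_width):
            raise ValueError(f"word {w} does not fit in {in_width} bits")
        # Earlier words land in the lower bits of the combined bit string.
        acc |= w << (i * in_width)
    mask = (1 << out_width) - 1
    return [(acc >> (j * out_width)) & mask for j in range(total_bits // out_width)]


print(convert_width([0x1, 0x2, 0x3, 0x4], 4, 8))  # [33, 67], i.e. [0x21, 0x43]
```

Converting back with `convert_width([0x21, 0x43], 8, 4)` recovers the original four words, since no bits are added or dropped.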

get_input_datatype

def get_input_datatype(ind=0)

Returns FINN DataType of input.

get_output_datatype

def get_output_datatype(ind=0)

Returns FINN DataType of output.

lut_estimation

def lut_estimation()

Calculates resource estimations for LUTs

finn.custom_op.fpgadataflow.streamingfifo

StreamingFIFO Objects

class StreamingFIFO(HWCustomOp)

bram_estimation

def bram_estimation()

Calculates resource estimation for BRAM

uram_estimation

def uram_estimation()

Calculates resource estimation for URAM

lut_estimation

def lut_estimation()

Calculates resource estimations for LUTs

finn.custom_op.fpgadataflow.thresholding

Module that provides the Thresholding class, which implements multi-threshold activation functions. The thresholding operation compares input values against a set of thresholds to produce quantized outputs.
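A minimal sketch of multi-threshold quantization: each value is mapped to the number of its channel's thresholds that it reaches. It assumes a `>=` comparison and one ascending threshold list per channel. `multithreshold` here is a hypothetical pure-Python stand-in, not the library function.

```python
def multithreshold(x, thresholds):
    """Quantize per-channel values by counting crossed thresholds.

    x:          list of rows, each row a list of per-channel values
    thresholds: thresholds[c] is the ascending threshold list for channel c
    Returns, for every value, how many of its channel's thresholds it is >= to.
    """
    return [
        [sum(1 for t in thresholds[c] if v >= t) for c, v in enumerate(row)]
        for row in x
    ]


thres = [[0.0, 1.0, 2.0], [-1.0, 0.5, 3.0]]
print(multithreshold([[1.5, 0.5], [-2.0, 4.0]], thres))  # [[2, 2], [0, 3]]
```

With n thresholds per channel the result is an integer in [0, n], which is why the output can use a narrow unsigned datatype before any bias is applied.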

Thresholding Objects

class Thresholding(HWCustomOp)

Abstraction layer for HW implementation of Thresholding.

init

def __init__(onnx_node, **kwargs)

Initialize the Thresholding node.

get_nodeattr_types

def get_nodeattr_types()

Return a dictionary of attribute names and their types for this node.

Returns a dictionary describing node attributes including parallelization (PE), number of channels, data types, and runtime configuration options.

infer_node_datatype

def infer_node_datatype(model)

Infer and set the data types for node inputs and outputs.

Updates the inputDataType attribute based on the model's tensor datatype and sets the output tensor datatype based on the outputDataType attribute.

Arguments:

model - The ONNX model containing this node.

verify_node

def verify_node()

Verify that the node is configured correctly.

Checks that the backend attribute is set to 'fpgadataflow' and that all necessary attributes exist.

Returns:

List of informational messages about the node's configuration status.

get_input_datatype

def get_input_datatype(ind=0)

Return FINN DataType of input.

get_output_datatype

def get_output_datatype(ind=0)

Return FINN DataType of output.

minimize_weight_bit_width

def minimize_weight_bit_width(model)

Minimize the threshold datatype bit width based on the actual threshold values. This function should not round or clip the threshold values; that is done in RoundAndClipThresholds.

get_instream_width

def get_instream_width(ind=0)

Return the width of the input stream in bits.

Arguments:

ind - Input index (0 for data input, 1 for threshold/weight input).

Returns:

Width of the input stream in bits.

get_outstream_width

def get_outstream_width(ind=0)

Return the width of the output stream in bits.

Arguments:

ind - Output index (currently only supports index 0).

Returns:

Width of the output stream in bits.

get_folded_input_shape

def get_folded_input_shape(ind=0)

Return the folded input shape for hardware implementation.

The folded shape accounts for parallelization (PE) and the threshold memory depth (TMEM = NumChannels / PE) used in the hardware accelerator.

Arguments:

ind - Input index (currently only supports index 0).

Returns:

Tuple representing the folded input shape.

get_folded_output_shape

def get_folded_output_shape(ind=0)

Return the folded output shape for hardware implementation.

Arguments:

ind - Output index (currently only supports index 0).

Returns:

Tuple representing the folded output shape (same as folded input shape).

get_normal_input_shape

def get_normal_input_shape(ind=0)

Return the normal (unfolded) input shape.

Arguments:

ind - Input index (currently only supports index 0).

Returns:

Tuple representing the normal input shape.

get_normal_output_shape

def get_normal_output_shape(ind=0)

Return the normal (unfolded) output shape.

Arguments:

ind - Output index (currently only supports index 0).

Returns:

Tuple representing the normal output shape (same as normal input shape).

get_exp_cycles

def get_exp_cycles()

Return the expected number of execution cycles.

Calculates cycles as (NumChannels / PE) * batch size * the product of the feature map dimensions.

Returns:

Expected number of cycles for execution.
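The cycle formula above, written out as a small function. The parameter names are hypothetical; `fm_dims` stands for the spatial feature map dimensions (an empty tuple for a plain (batch, channels) input).

```python
from math import prod


def thresholding_exp_cycles(num_channels, pe, batch, fm_dims):
    # (NumChannels / PE) * batch size * product of the feature map dimensions
    if num_channels % pe != 0:
        raise ValueError("PE must divide NumChannels")
    return (num_channels // pe) * batch * prod(fm_dims)


print(thresholding_exp_cycles(64, 8, 1, (16, 16)))  # 2048
```

Doubling PE halves the cycle count, at the cost of PE-times wider input and output streams.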

get_hw_compatible_threshold_tensor

def get_hw_compatible_threshold_tensor(orig_thres_matrix)

Convert the original numpy threshold matrix orig_thres_matrix into a form suitable for passing to the hlslib call:

ensure MH % PE == 0
for unsigned inputs, ensure thresholds are positive
interleave rows between PEs
reshape into (PE, TMEM, n_thres_steps) and return.
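The interleave and reshape steps can be sketched as follows: row r of an (MH, n_thres_steps) matrix goes to PE r % PE at address r // PE. This assumes the round-robin interleaving common in FINN. The sketch skips the sign-handling step for unsigned inputs, and `interleave_thresholds` is a hypothetical helper.

```python
def interleave_thresholds(thres, pe):
    """Distribute rows of an (MH, n_steps) threshold matrix across PEs.

    Returns a nested list of shape (pe, tmem, n_steps) with tmem = MH // pe,
    where PE p at address a holds row a * pe + p.
    """
    mh = len(thres)
    if mh % pe != 0:
        raise ValueError("MH must be divisible by PE")
    tmem = mh // pe
    return [[list(thres[addr * pe + p]) for addr in range(tmem)] for p in range(pe)]


rows = [[r * 10, r * 10 + 1] for r in range(4)]  # MH = 4, two steps per row
print(interleave_thresholds(rows, pe=2))
# [[[0, 1], [20, 21]], [[10, 11], [30, 31]]]
```

Round-robin placement matches a stream in which consecutive channels arrive side by side in one PE-wide word, so PE p always handles channels p, p + PE, p + 2*PE, and so on.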

execute_node

def execute_node(context, graph)

Execute the thresholding operation.

Performs multi-threshold comparison on input values using the threshold tensor. Handles data layout transformations and applies output bias (ActVal) if configured. Converts output to bipolar format if the output data type is BIPOLAR.

Arguments:

context - Dictionary containing input values keyed by tensor names.
graph - The ONNX graph containing this node.
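The output post-processing described above can be sketched like this: the ActVal bias is added to the threshold counts, then a BIPOLAR output maps {0, 1} to {-1, +1} via 2*y - 1. The order of the two steps and the helper name `finalize_output` are assumptions made for illustration.

```python
def finalize_output(counts, act_val=0, bipolar=False):
    """Add the ActVal bias to threshold counts, then optionally map
    binary {0, 1} results to bipolar {-1, +1}."""
    out = [c + act_val for c in counts]
    if bipolar:
        out = [2 * v - 1 for v in out]
    return out


print(finalize_output([0, 2, 3], act_val=-1))    # [-1, 1, 2]
print(finalize_output([0, 1, 1], bipolar=True))  # [-1, 1, 1]
```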

calc_tmem

def calc_tmem()

Calculate and return TMEM.

finn.custom_op.fpgadataflow.unsqueeze

FPGA dataflow custom operator for Unsqueeze operation.

Unsqueeze Objects

@register_custom_op
class Unsqueeze(HWCustomOp)

Hardware custom operator for Unsqueeze operation.

Inserts single-dimension entries into the shape of a tensor.
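As with Squeeze, the snippet below sketches only the shape semantics, following the ONNX Unsqueeze convention where axes index the output shape. How this operator receives its axes is an assumption, and `unsqueeze_shape` is a hypothetical helper.

```python
def unsqueeze_shape(shape, axes):
    """Insert size-1 dimensions into `shape` (hypothetical helper).

    Axes index the output shape, whose rank is len(shape) + len(axes);
    negative values count from the end, as in ONNX Unsqueeze.
    """
    out_rank = len(shape) + len(axes)
    norm = []
    for a in axes:
        if not -out_rank <= a < out_rank:
            raise ValueError(f"axis {a} out of range for output rank {out_rank}")
        norm.append(a % out_rank)
    if len(set(norm)) != len(norm):
        raise ValueError("axes must be unique")
    out = list(shape)
    # Inserting in ascending order keeps each position valid in the final shape.
    for a in sorted(norm):
        out.insert(a, 1)
    return tuple(out)


print(unsqueeze_shape((4, 8), axes=[0, 2]))  # (1, 4, 1, 8)
print(unsqueeze_shape((4, 8), axes=[-1]))    # (4, 8, 1)
```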

__init__

def __init__(onnx_node, **kwargs) -> None

Initialize the Unsqueeze operator from an ONNX node.

get_nodeattr_types

def get_nodeattr_types()

Return the dictionary of node attributes for the Unsqueeze operator.

inp_dtype

@property
def inp_dtype() -> BaseDataType

Return the input datatype.

out_dtype

@property
def out_dtype() -> BaseDataType

Return the output datatype.

inp_shape

@property
def inp_shape()

Return the input shape.

out_shape

@property
def out_shape()

Return the output shape.

pe

@property
def pe()

Return the number of parallel processing elements (PE).

make_shape_compatible_op

def make_shape_compatible_op(model: ModelWrapper) -> NodeProto

Create a shape-compatible operation for ONNX shape inference.

Returns a standard ONNX Unsqueeze node for shape inference purposes.

infer_node_datatype

def infer_node_datatype(model: ModelWrapper) -> None

Infer and set the datatype of the node output.

execute_node

def execute_node(context, graph) -> None

Execute unsqueeze operation (Python fallback).

verify_node

def verify_node()

Verify the node attributes, inputs and outputs.

get_input_datatype

def get_input_datatype(ind=0) -> BaseDataType

Return the datatype of the input at the given index.

get_output_datatype

def get_output_datatype(ind=0) -> BaseDataType

Return the datatype of the output at the given index.

get_normal_input_shape

def get_normal_input_shape(ind=0)

Return the unfolded input shape at the given index.

get_normal_output_shape

def get_normal_output_shape(ind=0)

Return the unfolded output shape at the given index.

get_folded_input_shape

def get_folded_input_shape(ind=0)

Return the folded input shape at the given index.

Applies PE-based folding to the last dimension.

get_folded_output_shape

def get_folded_output_shape(ind=0)

Return the folded output shape at the given index.

Applies PE-based folding to the last dimension.

get_instream_width

def get_instream_width(ind=0)

Return the width of the input stream in bits at the given index.

get_outstream_width

def get_outstream_width(ind=0)

Return the width of the output stream in bits at the given index.

get_number_output_values

def get_number_output_values()

Return the number of expected output values from the operator.

get_exp_cycles

def get_exp_cycles()

Return the expected number of cycles for the unsqueeze operation.

finn.custom_op.fpgadataflow.upsampler

UpsampleNearestNeighbour Objects

class UpsampleNearestNeighbour(HWCustomOp)

Abstraction layer for HW implementation of UpsampleNearestNeighbour.
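Nearest-neighbour upsampling replicates each input pixel into a block of output pixels. The sketch assumes a single-channel 2D map and an integer scale factor; the HW operator works on streamed feature maps, and `upsample_nearest` is a hypothetical helper.

```python
def upsample_nearest(fmap, scale):
    """Upsample a 2D list by an integer factor, copying each pixel into a
    scale x scale block."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(scale)]  # repeat along width
        out.extend(list(wide) for _ in range(scale))   # repeat along height
    return out


print(upsample_nearest([[1, 2], [3, 4]], 2))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

No arithmetic is performed on the values, which is why the output datatype is the same as the input datatype.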

get_input_datatype

def get_input_datatype(ind=0)

Returns FINN DataType of input.

get_output_datatype

def get_output_datatype(ind=0)

Returns FINN DataType of output (same as the input datatype).

finn.custom_op.fpgadataflow.vectorvectoractivation

Vector-Vector Activation Unit (VVAU) implementation for FPGA dataflow.

This module contains the VVAU class, which provides hardware abstraction for vector-vector activation layers in FPGA implementations. The VVAU performs depthwise convolution, optionally followed by a thresholding activation.
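The "vector-vector" name reflects that each channel's input window is multiplied element-wise with that channel's own kernel and summed, with no mixing across channels. The sketch computes one output pixel under that depthwise assumption, with kernels flattened to k*k lists; `vvau_pixel` is a hypothetical helper, and thresholding is omitted.

```python
def vvau_pixel(window, weights):
    """One output pixel of a depthwise convolution.

    window[c]  - the k*k input values of channel c at this position
    weights[c] - the k*k kernel of channel c
    """
    return [sum(x * w for x, w in zip(window[c], weights[c])) for c in range(len(window))]


window = [[1, 2, 3, 4], [0, 1, 0, 1]]   # 2 channels, 2x2 window flattened
weights = [[1, 0, 0, 1], [2, 2, 2, 2]]
print(vvau_pixel(window, weights))      # [5, 4]
```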

VVAU Objects

class VVAU(HWCustomOp)

Abstraction layer for HW implementation of VectorVectorActivation layers.

__init__

def __init__(onnx_node, **kwargs)

Initialize the VVAU (Vector-Vector Activation Unit) instance.

Arguments:

onnx_node - ONNX node representing the VVAU operation
**kwargs - Additional keyword arguments passed to parent class

get_nodeattr_types

def get_nodeattr_types()

Get the dictionary of node attribute types for VVAU.

Returns:

dict - Dictionary mapping attribute names to their types and constraints

execute_node

def execute_node(context, graph)

Execute the VVAU node operation.

Performs the vector-vector activation computation including matrix multiplication and optional thresholding activation.

Arguments:

context - Execution context containing input tensors
graph - ONNX graph containing the node

infer_node_datatype

def infer_node_datatype(model)

Infer and set the node's data types based on the model.

Arguments:

model - FINN model containing the node

get_input_datatype

def get_input_datatype(ind=0)

Returns FINN DataType of input.

get_accumulator_datatype

def get_accumulator_datatype()

Returns FINN DataType of accumulator

get_output_datatype

def get_output_datatype(ind=0)

Returns FINN DataType of output.

get_instream_width

def get_instream_width(ind=0)

Get the input stream width for the specified input.

Arguments:

ind - Input index (0 for activations, 1 for weights, 2 for thresholds)

Returns:

int - Input stream width in bits

Raises:

Exception - If input index is out of range

get_outstream_width

def get_outstream_width(ind=0)

Get the output stream width.

Arguments:

ind - Output index (default 0)

Returns:

int - Output stream width in bits

get_folded_input_shape

def get_folded_input_shape(ind=0)

Get the folded input shape for hardware implementation.

Arguments:

ind - Input index (0 for activations, 1 for weights)

Returns:

tuple - Folded input shape dimensions

Raises:

Exception - If input index is undefined or requirements not met

get_folded_output_shape

def get_folded_output_shape(ind=0)

Get the folded output shape for hardware implementation.

Arguments:

ind - Output index (default 0)

Returns:

tuple - Folded output shape dimensions

get_normal_input_shape

def get_normal_input_shape(ind=0)

Get the normal (unfolded) input shape.

Arguments:

ind - Input index (default 0)

Returns:

tuple - Normal input shape dimensions

get_normal_output_shape

def get_normal_output_shape(ind=0)

Get the normal (unfolded) output shape.

Arguments:

ind - Output index (default 0)

Returns:

tuple - Normal output shape dimensions

calc_wmem

def calc_wmem()

Calculates and returns WMEM.

calc_tmem

def calc_tmem()

Calculates and returns TMEM.
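A sketch of the memory-depth arithmetic, assuming the usual FINN relations: each PE holds Channels / PE kernels and reads SIMD kernel taps per cycle, so WMEM = (Channels / PE) * (k_h * k_w / SIMD), and TMEM = Channels / PE unless activation is disabled. The function and parameter names are hypothetical.

```python
def vvau_calc_wmem(channels, k_h, k_w, pe, simd):
    # Weight memory depth per PE: kernels per PE times kernel words per kernel.
    if channels % pe or (k_h * k_w) % simd:
        raise ValueError("PE must divide Channels and SIMD must divide k_h*k_w")
    return (channels // pe) * (k_h * k_w // simd)


def vvau_calc_tmem(channels, pe, no_activation=False):
    # Threshold memory depth per PE; none is needed without an activation.
    return 0 if no_activation else channels // pe


print(vvau_calc_wmem(channels=32, k_h=3, k_w=3, pe=4, simd=3))  # 24
print(vvau_calc_tmem(32, 4))                                    # 8
```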

uram_estimation

def uram_estimation()

Estimate UltraRAM (URAM) usage for this layer.

Returns:

int - Number of URAMs required

bram_estimation

def bram_estimation()

Calculates resource estimation for BRAM

bram_efficiency_estimation

def bram_efficiency_estimation()

Estimate BRAM efficiency (utilization) for this layer.

Returns:

float - BRAM efficiency ratio (actual usage / allocated capacity)

uram_efficiency_estimation

def uram_efficiency_estimation()

Estimate URAM efficiency: the actual parameter storage needed, divided by the allocated URAM storage (as estimated).
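Both efficiency estimates follow the same ratio: parameter bits actually stored, divided by the capacity of the allocated memory blocks. The sketch makes that ratio concrete. The block count is taken as an input rather than estimated, and returning 1.0 when nothing is allocated is an assumed convention. `memory_efficiency` is a hypothetical helper.

```python
def memory_efficiency(param_bits, n_blocks, bits_per_block):
    """Fraction of allocated memory capacity filled by parameters."""
    allocated = n_blocks * bits_per_block
    if allocated == 0:
        return 1.0  # assumed convention when no memory is allocated
    return param_bits / allocated


BRAM18_BITS = 18 * 1024         # one 18 Kb block RAM, parity bits included
weight_bits = 4 * 64 * 3 * 3    # e.g. 4-bit weights, 64 channels, 3x3 kernels
print(memory_efficiency(weight_bits, n_blocks=1, bits_per_block=BRAM18_BITS))  # 0.125
```

Low values indicate that the memory is shallow or narrow relative to the block geometry, a common hint to revisit PE/SIMD folding or the memory type.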

get_exp_cycles

def get_exp_cycles()

Get the expected number of execution cycles for this layer.

Returns:

int - Expected number of clock cycles for execution

minimize_accumulator_width

def minimize_accumulator_width(model)

Minimize the accumulator bit width according to the weight values, the input data types, and the size of the dot product.
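The idea can be sketched by computing the worst-case range of every dot product from the actual weights and the input value range, then picking the narrowest integer type that holds it. The sketch assumes integer inputs in [in_min, in_max] and a signed accumulator; FINN's choice between signed and unsigned types may differ. Both helper names are hypothetical.

```python
def accumulator_range(weights, in_min, in_max):
    """Worst-case (min, max) of sum(w * x) over x in [in_min, in_max].

    weights: one weight vector per output channel.
    """
    lo = min(sum(min(w * in_min, w * in_max) for w in row) for row in weights)
    hi = max(sum(max(w * in_min, w * in_max) for w in row) for row in weights)
    return lo, hi


def signed_bits_for(lo, hi):
    """Smallest two's-complement width holding every integer in [lo, hi]."""
    bits = 1
    while not (-(1 << (bits - 1)) <= lo and hi <= (1 << (bits - 1)) - 1):
        bits += 1
    return bits


lo, hi = accumulator_range([[1, -2, 3], [2, 2, -1]], in_min=0, in_max=15)  # 4-bit unsigned inputs
print(lo, hi, signed_bits_for(lo, hi))  # -30 60 7
```

Longer dot products add more terms to each sum, which widens the range and hence the required accumulator width.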

minimize_weight_bit_width

def minimize_weight_bit_width(model)

Minimize the bit width based on the values of the weights.

get_hw_compatible_threshold_tensor

def get_hw_compatible_threshold_tensor(orig_thres_matrix)

Convert the original numpy threshold matrix orig_thres_matrix into a form suitable for passing to the hlslib call:

ensure MH % PE == 0
for bipolar weights and inputs, ensure thresholds are positive
interleave rows between PEs
reshape into (PE, TMEM, n_thres_steps) and return

get_hw_compatible_weight_tensor

def get_hw_compatible_weight_tensor(orig_weight_matrix)

Convert weight matrix to hardware-compatible format.

Arguments:

orig_weight_matrix - Original weight matrix

Returns:

numpy.ndarray - Hardware-compatible weight tensor

make_weight_file

def make_weight_file(weights, weight_file_mode, weight_file_name)

Produce a file containing given weights in appropriate format for this layer. This file can be used for either synthesis or run-time reconfig of weights.

Arguments:

weights : numpy array with weights to be put into the file
weight_file_mode : one of {hls_header, decoupled_verilog_dat, decoupled_runtime}
weight_file_name : filename for the weight file to be generated

generate_params

def generate_params(model, path)

Generate parameter files for hardware implementation.

Arguments:

model - FINN model containing the node
path - Path to the code generation directory

get_op_and_param_counts

def get_op_and_param_counts()

Get operation and parameter counts for this layer.

Returns:

dict - Dictionary containing operation and parameter counts by type

derive_characteristic_fxns

def derive_characteristic_fxns(period)

Derive characteristic functions for RTL simulation.

Arguments:

period - Clock period for simulation

get_verilog_top_module_intf_names

def get_verilog_top_module_intf_names()

Get Verilog top module interface names.

Returns:

dict - Dictionary mapping interface types to their names

code_generation_ipi

def code_generation_ipi()

Generate IP integrator (IPI) commands for hardware synthesis.

Returns:

list - List of TCL commands for IP integrator

Raises:

Exception - If unrecognized mem_mode is specified


This page was generated automatically from source code documentation.

finn.custom_op

Table of Contents

finn.custom_op.fpgadataflow.attention_heads

SplitMultiHeads Objects

__init__

get_nodeattr_types

heads

packed

dtype

num_elems

num_inputs

make_shape_compatible_op

infer_node_datatype

execute_node

verify_node

get_input_datatype

get_output_datatype

get_normal_input_shape

get_normal_output_shape

get_folded_input_shape

get_folded_output_shape

get_instream_width

get_outstream_width

get_number_output_values

get_exp_cycles

MergeMultiHeads Objects

__init__

get_nodeattr_types

heads

packed

dtype

num_elems

num_inputs

squeezed

make_shape_compatible_op

infer_node_datatype

execute_node

verify_node

get_input_datatype

get_output_datatype

get_normal_input_shape

get_normal_output_shape

get_folded_input_shape

get_folded_output_shape

get_instream_width

get_outstream_width

get_number_output_values

get_exp_cycles

finn.custom_op.fpgadataflow.concat

StreamingConcat Objects

finn.custom_op.fpgadataflow.convolutioninputgenerator

ConvolutionInputGenerator Objects

get_input_datatype

get_output_datatype

get_instream_width

finn.custom_op.fpgadataflow.crop

Crop Objects

finn.custom_op.fpgadataflow.duplicatestreams

DuplicateStreams Objects

get_input_datatype

get_output_datatype

get_instream_width

get_outstream_width

finn.custom_op.fpgadataflow.elementwise_binary

ElementwiseBinaryOperation Objects

calc_wmem

finn.custom_op.fpgadataflow.fmpadding

FMPadding Objects

get_padded_odim

get_input_datatype

get_output_datatype

finn.custom_op.fpgadataflow.fmpadding_pixel

FMPadding_Pixel Objects

get_padded_odim

get_input_datatype

get_output_datatype

finn.custom_op.fpgadataflow.globalaccpool

GlobalAccPool Objects

get_input_datatype

get_output_datatype

init

init