Skip to content

Add PWPolyF piecewise polynomial activation layer#1573

Open
ollycassidy13 wants to merge 13 commits into
Xilinx:devfrom
ollycassidy13:feature/pwpolyf
Open

Add PWPolyF piecewise polynomial activation layer#1573
ollycassidy13 wants to merge 13 commits into
Xilinx:devfrom
ollycassidy13:feature/pwpolyf

Conversation

@ollycassidy13
Copy link
Copy Markdown

  • Add PWPolyF, an RTL activation layer that approximates GELU, SiLU, Sigmoid,
    and Tanh using piecewise degree-2 polynomials evaluated via Horner's method
    on cascaded DSPFP32 FMA units. Two DSPs per PE, no BRAM, single-cycle
    throughput (II=1).
  • Coefficients and per-function clamping config are delivered through a
    SystemVerilog package (pwpolyf_pkg.sv) containing a func_cfg_t struct,
    regenerated at build time for the node's K value by generate_coeffs_pkg().
  • InferPWPolyFLayer converts both the explicit PWPolyF custom op and
    standard ONNX activations (Gelu, Sigmoid, Tanh, Sigmoid+Mul for
    SiLU, Div+Erf+Add+Mul+Mul for GELU with dynamo=True / opset < 20) to
    PWPolyF HW layers with default K=3.
  • PE folding handled by SetFolding. Resource estimates: 2 DSPs + ~200 LUTs
    per PE, 0 BRAM.

@STFleming STFleming requested review from STFleming and auphelia April 24, 2026 13:41
Copy link
Copy Markdown
Collaborator

@STFleming STFleming left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work @ollycassidy13 this is a very useful PR!

My biggest current concern is testing (see comments) it would be great if we could work on improving the test coverage together to verify thing more robustly. In particular making sure that codegen and the RTL is verified. Please ping me if you have any questions.

Thanks again for the great work!

Comment thread tests/fpgadataflow/test_fpgadataflow_pwpolyf.py
Comment thread src/finn/custom_op/fpgadataflow/pwpolyf.py Outdated
Comment thread src/finn/custom_op/fpgadataflow/pwpolyf.py Outdated
@STFleming STFleming self-requested a review April 30, 2026 14:57
Comment thread docs/finn/pwpolyf.md Outdated
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please move this into the components subsection? https://github.com/Xilinx/finn/tree/dev/docs/finn/components

@@ -0,0 +1,196 @@
# Copyright (C) 2026, Advanced Micro Devices, Inc.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use the SPDX-identifier for the copyright header instead and the year is not necessary anymore.

Copyright Advanced Micro Devices, Inc.
SPDX-License-Identifier: BSD-3-Clause

from finn.util.pwpolyf import CLAMP_CFG, NUM_OCTAVES, SUPPORTED_FUNCS, _fit_coefficients


def _float_to_hex(f):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FINN already has float-to-hex packing in finn.util.data_packing.array2hexstring (line 92 handles floats). Please use:

array2hexstring(np.array([f]), DataType["FLOAT32"], 32, prefix="").upper()

This keeps it consistent with other RTL backends. Alternatively, BitArray(float=f, length=32).hex.upper() would also work.

return "%08X" % struct.unpack("!I", struct.pack("!f", float(f)))[0]


def generate_coeffs_pkg(K, degree=2, num_samples=1000):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generate_coeffs_pkg could be a class method of PWPolyF_rtl. Other RTL backends use instance methods like prepare_codegen_rtl_values or generate_params that access node attributes via self. This would be more consistent and avoid passing K and degree explicitly:

  def _generate_coeffs_pkg(self, num_samples=1000):
      K = self.get_nodeattr("K")
      degree = self.get_nodeattr("degree")
      ...

@@ -0,0 +1,207 @@
# Copyright (C) 2026, Advanced Micro Devices, Inc.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use the SPDX-identifier for the copyright header instead and the year is not necessary anymore.

Copyright Advanced Micro Devices, Inc.
SPDX-License-Identifier: BSD-3-Clause

oshape = self.get_normal_output_shape()
return super().make_const_shape_op(oshape)

def infer_node_datatype(self, model):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since PWPolyF only supports FLOAT32 (as checked in verify_node), we could warn or raise an error if the incoming datatype is not FLOAT32, rather than silently updating the attribute to an unsupported type. Additionally, outputDataType is read but never updated. For a FLOAT32→FLOAT32 op it should be set to match the input.

def infer_node_datatype(self, model):
   node = self.onnx_node
   idt = model.get_tensor_datatype(node.input[0])
   assert idt == DataType["FLOAT32"], (
       f"{node.name}: PWPolyF requires FLOAT32 input, got {idt}"
   )
   self.set_nodeattr("inputDataType", idt.name)
   self.set_nodeattr("outputDataType", idt.name)
   model.set_tensor_datatype(node.output[0], idt)

Alternatively, since input and output are always FLOAT32, you could simplify to a single dataType node attribute instead of separate inputDataType/outputDataType.

def get_normal_output_shape(self, ind=0):
return self.get_normal_input_shape()

def get_number_output_values(self):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_number_output_values: This is identical to the base HWCustomOp implementation. Can be removed.

Comment thread src/finn/util/pwpolyf.py Outdated
@@ -0,0 +1,237 @@
# Copyright (C) 2026, Advanced Micro Devices, Inc.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use the SPDX-identifier for the copyright header instead and the year is not necessary anymore.

Copyright Advanced Micro Devices, Inc.
SPDX-License-Identifier: BSD-3-Clause

@@ -0,0 +1,753 @@
# Copyright (C) 2026, Advanced Micro Devices, Inc.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use the SPDX-identifier for the copyright header instead and the year is not necessary anymore.

Copyright Advanced Micro Devices, Inc.
SPDX-License-Identifier: BSD-3-Clause

from finn.transformation.fpgadataflow.specialize_layers import SpecializeLayers
from finn.util.pwpolyf import PiecewisePolyActivation

test_fpga_part = "xcve2002-sbva484-2MP-e-S"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The FPGA part used in the tests is not supported in Vivado 2022.2. Please use a part that works with both Vivado 2022.2 and 2024.2, such as the VCK190 part (xcvc1902-vsva2197-2MP-e-S).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants