Add PWPolyF piecewise polynomial activation layer#1573
Conversation
STFleming
left a comment
There was a problem hiding this comment.
Nice work @ollycassidy13 this is a very useful PR!
My biggest current concern is testing (see comments) it would be great if we could work on improving the test coverage together to verify thing more robustly. In particular making sure that codegen and the RTL is verified. Please ping me if you have any questions.
Thanks again for the great work!
There was a problem hiding this comment.
Could you please move this into the components subsection? https://github.com/Xilinx/finn/tree/dev/docs/finn/components
| @@ -0,0 +1,196 @@ | |||
| # Copyright (C) 2026, Advanced Micro Devices, Inc. | |||
There was a problem hiding this comment.
You can use the SPDX-identifier for the copyright header instead and the year is not necessary anymore.
Copyright Advanced Micro Devices, Inc.
SPDX-License-Identifier: BSD-3-Clause
| from finn.util.pwpolyf import CLAMP_CFG, NUM_OCTAVES, SUPPORTED_FUNCS, _fit_coefficients | ||
|
|
||
|
|
||
| def _float_to_hex(f): |
There was a problem hiding this comment.
FINN already has float-to-hex packing in finn.util.data_packing.array2hexstring (line 92 handles floats). Please use:
array2hexstring(np.array([f]), DataType["FLOAT32"], 32, prefix="").upper()
This keeps it consistent with other RTL backends. Alternatively, BitArray(float=f, length=32).hex.upper() would also work.
| return "%08X" % struct.unpack("!I", struct.pack("!f", float(f)))[0] | ||
|
|
||
|
|
||
| def generate_coeffs_pkg(K, degree=2, num_samples=1000): |
There was a problem hiding this comment.
generate_coeffs_pkg could be a class method of PWPolyF_rtl. Other RTL backends use instance methods like prepare_codegen_rtl_values or generate_params that access node attributes via self. This would be more consistent and avoid passing K and degree explicitly:
def _generate_coeffs_pkg(self, num_samples=1000):
K = self.get_nodeattr("K")
degree = self.get_nodeattr("degree")
...
| @@ -0,0 +1,207 @@ | |||
| # Copyright (C) 2026, Advanced Micro Devices, Inc. | |||
There was a problem hiding this comment.
You can use the SPDX-identifier for the copyright header instead and the year is not necessary anymore.
Copyright Advanced Micro Devices, Inc.
SPDX-License-Identifier: BSD-3-Clause
| oshape = self.get_normal_output_shape() | ||
| return super().make_const_shape_op(oshape) | ||
|
|
||
| def infer_node_datatype(self, model): |
There was a problem hiding this comment.
Since PWPolyF only supports FLOAT32 (as checked in verify_node), we could warn or raise an error if the incoming datatype is not FLOAT32, rather than silently updating the attribute to an unsupported type. Additionally, outputDataType is read but never updated. For a FLOAT32→FLOAT32 op it should be set to match the input.
def infer_node_datatype(self, model):
node = self.onnx_node
idt = model.get_tensor_datatype(node.input[0])
assert idt == DataType["FLOAT32"], (
f"{node.name}: PWPolyF requires FLOAT32 input, got {idt}"
)
self.set_nodeattr("inputDataType", idt.name)
self.set_nodeattr("outputDataType", idt.name)
model.set_tensor_datatype(node.output[0], idt)
Alternatively, since input and output are always FLOAT32, you could simplify to a single dataType node attribute instead of separate inputDataType/outputDataType.
| def get_normal_output_shape(self, ind=0): | ||
| return self.get_normal_input_shape() | ||
|
|
||
| def get_number_output_values(self): |
There was a problem hiding this comment.
get_number_output_values: This is identical to the base HWCustomOp implementation. Can be removed.
| @@ -0,0 +1,237 @@ | |||
| # Copyright (C) 2026, Advanced Micro Devices, Inc. | |||
There was a problem hiding this comment.
You can use the SPDX-identifier for the copyright header instead and the year is not necessary anymore.
Copyright Advanced Micro Devices, Inc.
SPDX-License-Identifier: BSD-3-Clause
| @@ -0,0 +1,753 @@ | |||
| # Copyright (C) 2026, Advanced Micro Devices, Inc. | |||
There was a problem hiding this comment.
You can use the SPDX-identifier for the copyright header instead and the year is not necessary anymore.
Copyright Advanced Micro Devices, Inc.
SPDX-License-Identifier: BSD-3-Clause
| from finn.transformation.fpgadataflow.specialize_layers import SpecializeLayers | ||
| from finn.util.pwpolyf import PiecewisePolyActivation | ||
|
|
||
| test_fpga_part = "xcve2002-sbva484-2MP-e-S" |
There was a problem hiding this comment.
The FPGA part used in the tests is not supported in Vivado 2022.2. Please use a part that works with both Vivado 2022.2 and 2024.2, such as the VCK190 part (xcvc1902-vsva2197-2MP-e-S).
and Tanh using piecewise degree-2 polynomials evaluated via Horner's method
on cascaded DSPFP32 FMA units. Two DSPs per PE, no BRAM, single-cycle
throughput (II=1).
SystemVerilog package (
pwpolyf_pkg.sv) containing afunc_cfg_tstruct,regenerated at build time for the node's K value by
generate_coeffs_pkg().InferPWPolyFLayerconverts both the explicitPWPolyFcustom op andstandard ONNX activations (
Gelu,Sigmoid,Tanh,Sigmoid+MulforSiLU,
Div+Erf+Add+Mul+Mulfor GELU with dynamo=True / opset < 20) toPWPolyF HW layers with default K=3.
SetFolding. Resource estimates: 2 DSPs + ~200 LUTsper PE, 0 BRAM.