AC Fixed Sample

This FPGA sample is structured as a tutorial that demonstrates how to use the Algorithmic C (AC) fixed-point data type ac_fixed and some best practices.

| Area | Description |
|---|---|
| What you will learn | How different methods of ac_fixed number construction affect hardware resource utilization. |
| | How to access and use the ac_fixed math library functions. |
| | How to use the recommended method for constructing ac_fixed numbers in your kernel. |
| | How to trade off accuracy of results for reduced resource usage on the FPGA. |
| Time to complete | 30 minutes |
| Category | Concepts and Functionality |

Purpose

This FPGA tutorial shows you how to use the ac_fixed type to perform fixed-point arithmetic and includes some simple examples.

You can use the fixed-point data type in place of native floating-point types to generate area-efficient, optimized designs for the FPGA. Operations that do not use the full dynamic range of the native types are good candidates for ac_fixed types; one example is multiplication by a number in the range (-1.0, 1.0).

This tutorial shows the recommended method for constructing an ac_fixed number along with some examples of using the fixed-point math library functions. Additionally, the tutorial shows how these methods can reduce the area of the hardware generated by the compiler by trading off the accuracy of the mathematical operations.

Prerequisites

| Optimized for | Description |
|---|---|
| OS | Ubuntu* 20.04 |
| | RHEL*/CentOS* 8 |
| | SUSE* 15 |
| | Windows* 10, 11 |
| | Windows Server* 2019 |
| Hardware | Intel® Agilex® 7, Agilex® 5, Arria® 10, Stratix® 10, and Cyclone® V FPGAs |
| Software | Intel® oneAPI DPC++/C++ Compiler |

Note: The Intel® oneAPI DPC++/C++ Compiler alone is enough to compile for emulation, generate reports, and generate RTL, but the simulation flow and FPGA hardware compiles have additional software requirements.

To use the simulator flow, Intel® Quartus® Prime Pro Edition (or Standard Edition when targeting Cyclone® V) and one of the following simulators must be installed and accessible through your PATH:

  • Questa*-Intel® FPGA Edition
  • Questa*-Intel® FPGA Starter Edition
  • ModelSim® SE

When using the hardware compile flow, Intel® Quartus® Prime Pro Edition (or Standard Edition when targeting Cyclone® V) must be installed and accessible through your PATH.

Warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation.

This sample is part of the FPGA code samples. It is categorized as a Tier 2 sample that demonstrates a compiler feature.

flowchart LR
   tier1("Tier 1: Get Started")
   tier2("Tier 2: Explore the Fundamentals")
   tier3("Tier 3: Explore the Advanced Techniques")
   tier4("Tier 4: Explore the Reference Designs")

   tier1 --> tier2 --> tier3 --> tier4

   style tier1 fill:#0071c1,stroke:#0071c1,stroke-width:1px,color:#fff
   style tier2 fill:#f96,stroke:#333,stroke-width:1px,color:#fff
   style tier3 fill:#0071c1,stroke:#0071c1,stroke-width:1px,color:#fff
   style tier4 fill:#0071c1,stroke:#0071c1,stroke-width:1px,color:#fff

Find more information about how to navigate this part of the code samples in the FPGA top-level README.md. You can also find more information about troubleshooting build errors, links to selected documentation, and more.

Key Implementation Details

The sample illustrates the following important concepts.

  • Constructing an ac_fixed from a float or double value is much more area intensive than constructing one from another ac_fixed.
  • The ac_fixed math library provides a set of functions for various math operations.
  • The functions can be used to trade off accuracy of results for reduced resource usage on the FPGA.
  • When using these functions, be mindful of the widths of the input and return types and follow the parameterization laid out in the header file for optimal results.

Simple Code Example

An ac_fixed number can be defined as follows:

ac_fixed<W, I, S> a;

Here W specifies the width in bits and S is a bool indicating if the number is signed. Signed numbers use the most significant bit (MSB) of the W bits to store the sign bit. The second parameter I is an integer that specifies the location of the fixed point relative to the MSB. Here are some examples of the range and quantum of ac_fixed numbers.

ac_fixed<4,4,true> x; // xxxx. : range = [-8, 7], quantum = 1
ac_fixed<4,0,true> x; // .xxxx : range = [-8/16, 7/16], quantum = 1/16
ac_fixed<4,0,false> x; // .xxxx : range = [0, 15/16], quantum = 1/16
ac_fixed<4,7,false> x; // xxxx000. : range = [0, 120], quantum = 8
ac_fixed<4,-3,true> x; // (.xxxx)2^(-3) : range=[-1/16, 7/128], quantum = 1/128

When creating an ac_fixed value, the created value may be less precise than the source value or the conversion may trigger an overflow. ac_fixed provides enums to select quantization and overflow modes, which determine how the source value will be converted.

  • The default quantization mode is AC_TRN, which drops bits to the right of LSB when quantization occurs.
  • The default overflow mode is AC_WRAP, which drops bits to the left of the MSB when overflow occurs.

Note: For details on selecting specific quantization and overflow modes, refer to section 2.1, Quantization and Overflow, in Algorithmic C (AC) Datatypes, Software Version v3.7, June 2016.

To use an ac_fixed type in your code, include the following header:

#include <sycl/ext/intel/ac_types/ac_fixed.hpp>

Important: You must pass the -qactypes option to the icpx command on Linux or the /Qactypes option to the icx-cl command on Windows when building your SYCL program in order to include ac_types header files on the include path and link against the AC type libraries. In this tutorial, the options are passed through the src/CMakeLists.txt file.

Recommended Method for Constructing ac_fixed Numbers

The compiler uses significant FPGA resources to convert floating point values to ac_fixed values. The kernel ConstructFromFloat in the function TestConstructFromFloat constructs an ac_fixed object from an accessor to a native float type.

In contrast, the kernel ConstructFromACFixed in the function TestConstructFromACFixed constructs an ac_fixed object from an accessor to another ac_fixed object. This consumes far less area than the previous kernel. See the Read the Reports section below to find this difference in the optimization reports.

Using the ac_fixed Math Functions

To use the ac_fixed math functions in your code, include the following header:

#include <sycl/ext/intel/ac_types/ac_fixed_math.hpp>

The functions TestCalculateWithFloat and TestCalculateWithACFixed in this tutorial design contain the kernels CalculateWithFloat and CalculateWithACFixed respectively. Both calculate the following simple expression for some input x:

square_root ( sine(x) * sine(x) + cosine(x) * cosine(x) )

The kernel CalculateWithFloat uses floating point values and the standard math library while CalculateWithACFixed uses ac_fixed values and the ac_fixed math library.

In the kernel CalculateWithACFixed, the sin_fixed and cos_fixed functions require the integer part's bit width to be 3 and the input value range to be within [-pi, pi], while the input width of the functions can be chosen depending on the accuracy requirement. For this tutorial, ac_fixed inputs are instantiated with the following parameters:

W = 10, I = 3, S = true

In this tutorial, the ac_fixed numbers are smaller than floating-point numbers, which reduces FPGA resource usage at the expense of accuracy. To see the trade-offs in accuracy, compare the numeric results of the operations. The area utilization differences are discussed in the Read the Reports section.

When you use the ac_fixed library, keep the following points in mind:

  • Input Bit Width and Input Value Range Limits

    The fixed-point math functions have bit width and input value range requirements. All bit width and input value range requirements are documented at the top of the ac_fixed_math.hpp file, which is located in ${ONEAPI_ROOT}/compiler/latest/linux/lib/oclfpga/include/sycl/ext/intel/ac_types on Linux or %ONEAPI_ROOT%\compiler\latest\windows\lib\oclfpga\include\sycl\ext\intel\ac_types on Windows.

  • Return Types

    For fixed-point functions, each function has a default return type. Assigning the result to a non-default return type triggers a type conversion and can cause an increase in logic use or a loss of accuracy in your results. All return types are documented at the top of the ac_fixed_math.hpp file. For example, for sin_fixed and cos_fixed, the input type is ac_fixed<W, 3, true>, and the output type is ac_fixed<W-1, 2, true>. You can also use the auto type to let the compiler use the default return type.

  • Accuracy

    • Floating point vs. Fixed point: The host program for this tutorial shows the accuracy differences between the result provided by floating point math library and the result provided by the ac_fixed math library functions, where the float version generates a more accurate result than the smaller-sized ac_fixed version.

      Note: The program is compiled with fp-model set to "precise" so that the accuracies of the floating-point math functions conform to the IEEE standard.

    • Emulation vs. FPGA Hardware for fixed point math operations: Due to the differences in the internal math implementations, the results from ac_fixed math functions in emulation and FPGA hardware might not always be bit-accurate. This tutorial shows how to build and run the sample for emulation and FPGA hardware so you can observe the difference.

Build the AC Fixed Tutorial

Note: When working with the command-line interface (CLI), you should configure the oneAPI toolkits using environment variables. Set up your CLI environment by sourcing the setvars script in the root of your oneAPI installation every time you open a new terminal window. This practice ensures that your compiler, libraries, and tools are ready for development.

Linux*:

  • For system wide installations: . /opt/intel/oneapi/setvars.sh
  • For private installations: . ~/intel/oneapi/setvars.sh
  • For non-POSIX shells, like csh, use the following command: bash -c 'source <install-dir>/setvars.sh ; exec csh'

Windows*:

  • C:\"Program Files (x86)"\Intel\oneAPI\setvars.bat
  • For Windows PowerShell*, use the following command: cmd.exe "/K" '"C:\Program Files (x86)\Intel\oneAPI\setvars.bat" && powershell'

For more information on configuring environment variables, see Use the setvars Script with Linux* or macOS* or Use the setvars Script with Windows*.

On Linux*

  1. Change to the sample directory.
  2. Build the program for the Intel® Agilex® 7 device family, which is the default.
    mkdir build
    cd build
    cmake ..
    

    Note: You can change the default target by using the command:

    cmake .. -DFPGA_DEVICE=<FPGA device family or FPGA part number>
    

    Alternatively, you can target an explicit FPGA board variant and BSP by using the following command:

    cmake .. -DFPGA_DEVICE=<board-support-package>:<board-variant>
    

Note: You can poll your system for available BSPs using the aoc -list-boards command. The board list that is printed out will be of the form

$> aoc -list-boards
Board list:
  <board-variant>
     Board Package: <path/to/board/package>/board-support-package
  <board-variant2>
     Board Package: <path/to/board/package>/board-support-package

You will only be able to run an executable on the FPGA if you specified a BSP.

  3. Compile the design. (The provided targets match the recommended development flow.)

    1. Compile for emulation (fast compile time, targets emulated FPGA device).
      make fpga_emu
      
    2. Generate the HTML optimization reports. (See Read the Reports below for information on finding and understanding the reports.)
      make report
      
    3. Compile for simulation (fast compile time, targets simulated FPGA device).
      make fpga_sim
      
    4. Compile for FPGA hardware (longer compile time, targets FPGA device).
      make fpga
      

On Windows*

  1. Change to the sample directory.
  2. Build the program for the Intel® Agilex® 7 device family, which is the default.
    mkdir build
    cd build
    cmake -G "NMake Makefiles" ..
    

    Note: You can change the default target by using the command:

    cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=<FPGA device family or FPGA part number>
    

    Alternatively, you can target an explicit FPGA board variant and BSP by using the following command:

    cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=<board-support-package>:<board-variant>
    

Note: You can poll your system for available BSPs using the aoc -list-boards command. The board list that is printed out will be of the form

$> aoc -list-boards
Board list:
  <board-variant>
     Board Package: <path/to/board/package>/board-support-package
  <board-variant2>
     Board Package: <path/to/board/package>/board-support-package

You will only be able to run an executable on the FPGA if you specified a BSP.

  3. Compile the design. (The provided targets match the recommended development flow.)

    1. Compile for emulation (fast compile time, targets emulated FPGA device).
      nmake fpga_emu
      
    2. Generate the optimization report. (See Read the Reports below for information on finding and understanding the reports.)
      nmake report
      
    3. Compile for simulation (fast compile time, targets simulated FPGA device, reduced problem size).
      nmake fpga_sim
      
    4. Compile for FPGA hardware (longer compile time, targets FPGA device).
      nmake fpga
      

Note: If you encounter any issues with long paths when compiling under Windows*, you may have to create your 'build' directory in a shorter path, for example c:\samples\build. You can then run cmake from that directory, and provide cmake with the full path to your sample directory, for example:

C:\samples\build> cmake -G "NMake Makefiles" C:\long\path\to\code\sample\CMakeLists.txt

Read the Reports

Locate the report.html file in either of the following folders, depending on which compile you ran.

  • Report-only compile: ac_fixed.report.prj
  • FPGA hardware compile: ac_fixed.prj

Scroll down on the Summary page of the report and expand the section titled Compile Estimated Kernel Resource Utilization Summary. Notice how the kernel ConstructFromACFixed consumes fewer resources than the kernel named ConstructFromFloat. Similarly, notice how the kernel named CalculateWithACFixed consumes fewer FPGA resources than CalculateWithFloat.

Run the AC Fixed Sample

On Linux

  1. Run the sample on the FPGA emulator (the kernel executes on the CPU).
    ./ac_fixed.fpga_emu
    
  2. Run the sample on the FPGA simulator (the kernel executes on the CPU).
    CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./ac_fixed.fpga_sim
    
  3. Run the sample on the FPGA device (only if you ran cmake with -DFPGA_DEVICE=<board-support-package>:<board-variant>).
    ./ac_fixed.fpga
    

On Windows

  1. Run the sample on the FPGA emulator (the kernel executes on the CPU).
    ac_fixed.fpga_emu.exe
  2. Run the sample on the FPGA simulator (the kernel executes on the CPU).
    set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1
    ac_fixed.fpga_sim.exe
    set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=
    

Note: Hardware runs are not supported on Windows.

Example Output

The following is example output for the emulator.

1. Testing Constructing ac_fixed from float or ac_fixed:
Constructed from float:         3.6416015625
Constructed from ac_fixed:      3.6416015625

2. Testing calculation with float or ac_fixed math functions:
MAX DIFF (quantum) for ac_fixed<10, 3, true>:   0.0078125
MAX DIFF for float:                             9.53674e-07

Input 0:                        -0.80799192
result(fixed point):            1
difference(fixed point):        0
result(float):                  1
difference(float):              0

Input 1:                        -2.099829
result(fixed point):            0.9921875
difference(fixed point):        0.0078125
result(float):                  0.99999994
difference(float):              5.9604645e-08

Input 2:                        -0.74206626
result(fixed point):            1
difference(fixed point):        0
result(float):                  1
difference(float):              0

Input 3:                        -2.3321707
result(fixed point):            1
difference(fixed point):        0
result(float):                  1
difference(float):              0

Input 4:                        1.1432415
result(fixed point):            0.9921875
difference(fixed point):        0.0078125
result(float):                  0.99999994
difference(float):              5.9604645e-08

PASSED: all kernel results are correct.

Understand the Results

You can obtain a smaller hardware footprint for your kernel by ensuring that the ac_fixed numbers are constructed from float or double numbers outside the kernel. Additionally, by using the ac_fixed types and math library functions, you can use an even smaller fixed point format to trade off even more accuracy for a more resource efficient design if your application requirements allow for it.

License

Code samples are licensed under the MIT license. See License.txt for details.

Third-party program Licenses can be found here: third-party-programs.txt.