Name	Name	Last commit message	Last commit date
parent directory ..
assets	assets
src	src
CMakeLists.txt	CMakeLists.txt
README.md	README.md
sample.json	sample.json

`annotated_ptr` Sample

This tutorial demonstrates how to use the annotated_ptr class to constrain memory accesses in your kernel code. This can help you build more efficient FPGA IP components with the Intel® oneAPI DPC++/C++ Compiler.

Optimized for	Description
OS	Ubuntu* 20.04 RHEL/CentOS 8 SUSE* 15 Windows* 10, 11 Windows Server* 2019
Hardware	Intel® Agilex® 7, Agilex® 5, Arria® 10, Stratix® 10, and Cyclone® V FPGAs
Software	Intel® oneAPI DPC++/C++ Compiler
What you will learn	Best practices for creating and managing a oneAPI FPGA project
Time to complete	15 minutes

Note: Even though the Intel DPC++/C++ oneAPI compiler is enough to compile for emulation, generating reports and generating RTL, there are extra software requirements for the simulation flow and FPGA compiles.

To use the simulator flow, Intel® Quartus® Prime Pro Edition (or Standard Edition when targeting Cyclone® V) and one of the following simulators must be installed and accessible through your PATH:

Questa*-Intel® FPGA Edition

Questa*-Intel® FPGA Starter Edition

ModelSim® SE

When using the hardware compile flow, Intel® Quartus® Prime Pro Edition (or Standard Edition when targeting Cyclone® V) must be installed and accessible through your PATH.

⚠️ Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation.

Prerequisites

This sample is part of the FPGA code samples. It is categorized as a Tier 3 sample that demonstrates an advanced code optimization.

flowchart LR
   tier1("Tier 1: Get Started")
   tier2("Tier 2: Explore the Fundamentals")
   tier3("Tier 3: Explore the Advanced Techniques")
   tier4("Tier 4: Explore the Reference Designs")

   tier1 --> tier2 --> tier3 --> tier4

   style tier1 fill:#0071c1,stroke:#0071c1,stroke-width:1px,color:#fff
	style tier2 fill:#0071c1,stroke:#0071c1,stroke-width:1px,color:#fff
	style tier3 fill:#f96,stroke:#333,stroke-width:1px,color:#fff
	style tier4 fill:#0071c1,stroke:#0071c1,stroke-width:1px,color:#fff

Find more information about how to navigate this part of the code samples in the FPGA top-level README.md. You can also find more information about troubleshooting build errors, links to selected documentation, etc.

Purpose

The hls_flow_interfaces/mmhost code sample demonstrates how to use the annotated_arg wrapper class to customize an Avalon memory-mapped interface for an FPGA IP component.

A useful design optimization in FPGA SYCL code is to design with multiple buffer locations, (either multiple memory channels in a full SYCL system, or multiple memory-mapped host interfaces in a SYCL HLS IP). If your code contains un-annotated pointers (e.g. float *), the compiler will not know at compile time which buffer location to assign to load/store units (LSUs) associated with that pointer. This tutorial shows how to use the annotated_ptr class to constrain memory accesses to a pointer variable inside the kernel, which reduces the number of LSUs used in the generated FPGA IP component.

An FPGA Component That Contains Memory Access to an Ambiguous Buffer Location

In the example, the device code defines a SYCL kernel functor that computes the dot product between a weight matrix (located in buffer location 1) and a vector (located in buffer location 2), and saves to the result vector (located in buffer location 1).

The input and output vectors are kernel arguments passed by the host and annotated with annotated_arg:

struct DotProductIP {
  annotated_arg<float *, decltype(properties{buffer_location<kBL2>})> in_vec;
  annotated_arg<float *, decltype(properties{buffer_location<kBL1>})> out_vec;
  ...
};

The address to each row of the weight matrix is transferred into the kernel via a host pipe. The kernel reads the row pointers of the weight matrix and then performs the dot product operation.

using Pipe2DotProductIP = ext::intel::experimental::pipe<class MyPipeName1, float *>;
...
float *p = MyPipe::read();

float sum = 0.0f;
#pragma unroll COLS
for (int j = 0; j < COLS; j++)
   sum += p[j] * in_vec[j];
out_vec[i] = sum;

The global memory access of the kernel is distributed as follows

p[j]: buffer location is ambiguous because p is simply a float * without any annotations telling the compiler which buffer location it should be assigned to. So the compiler will generate load units connected to buffer location 1 and load units connected to buffer location 2. This is illustrated in the FPGA report, see Read the Reports below for more details.
in_vec[j]: The compiler knows this accesses buffer location 2 because in_vec is an annotated kernel argument.
out_vec[j]: The compiler knows this accesses buffer location 1 because out_vec is an annotated kernel argument.

Use the `annotated_ptr` Class to Constrain the Memory Access

You can provide the buffer location information of p to the compiler by wrapping it in an annotated_ptr, and then use the annotated_ptr local variable in the dot-product computation

annotated_ptr<float, decltype(properties{buffer_location<1>})> mat{p};

float sum = 0.0f;
#pragma unroll COLS
for (int j = 0; j < COLS; j++)
   sum += mat[j] * in_vec[j];
out_vec[i] = sum;

Now all the global memory accesses are assigned to a specific buffer location, including p, which is located in buffer location 1. This removes half of the load units connected to buffer location 2, saving significant FPGA resources.

Warning

The buffer location that is passed to annotated_ptr must be one of the buffer locations already assigned to global memory kernel arguments (in this case, buffer location kBL1 and kBL2).

Warning

The annotated_ptr class does not currently support the alignment property. Therefore, the consecutive memory accesses via the pointer mat in the unrolled loop cannot be configured to generate a statically coalesced load unit.

Building the `annotated_ptr` Tutorial

Note: When working with the command-line interface (CLI), you should configure the oneAPI toolkits using environment variables. Set up your CLI environment by sourcing the setvars script located in the root of your oneAPI installation every time you open a new terminal window. This practice ensures that your compiler, libraries, and tools are ready for development.

Linux*:

For system wide installations: . /opt/intel/oneapi/setvars.sh

For private installations: . ~/intel/oneapi/setvars.sh

For non-POSIX shells, like csh, use the following command: bash -c 'source <install-dir>/setvars.sh ; exec csh'

Windows*:

C:\Program Files(x86)\Intel\oneAPI\setvars.bat

Windows PowerShell*, use the following command: cmd.exe "/K" '"C:\Program Files (x86)\Intel\oneAPI\setvars.bat" && powershell'

For more information on configuring environment variables, see Use the setvars Script with Linux* or macOS* or Use the setvars Script with Windows*.

Use these commands to run the design, depending on your OS.

On a Linux* System

This design uses CMake to generate a build script for GNU/make.

Change to the sample directory.
Configure the build system for the Agilex® 7 device family, which is the default.
```
mkdir build
cd build
cmake ..
```
Compile the design with the generated Makefile. The following build targets are provided, matching the recommended development flow:

Compilation Type Command

FPGA Emulator make fpga_emu

Optimization Report make report

FPGA Simulator make fpga_sim

FPGA Hardware make fpga

Compilation Type	Command
FPGA Emulator	`make fpga_emu`
Optimization Report	`make report`
FPGA Simulator	`make fpga_sim`
FPGA Hardware	`make fpga`

On a Windows* System

This design uses CMake to generate a build script for nmake.

Change to the sample directory.
Configure the build system for the Agilex® 7 device family, which is the default.
```
mkdir build
cd build
cmake -G "NMake Makefiles" ..
```
Compile the design with the generated Makefile. The following build targets are provided, matching the recommended development flow:

Compilation Type Command (Windows)

FPGA Emulator nmake fpga_emu

Optimization Report nmake report

FPGA Simulator nmake fpga_sim

FPGA Hardware nmake fpga

Compilation Type	Command (Windows)
FPGA Emulator	`nmake fpga_emu`
Optimization Report	`nmake report`
FPGA Simulator	`nmake fpga_sim`
FPGA Hardware	`nmake fpga`

Read the Reports

Build the report target and locate report.html in the annotated_ptr.report.prj/reports/ directory.

Navigate to System Viewer (Views > System Viewer) and click on the AnnotatedPtrIP kernel in the System hierarchy. Observe that the compiler generates a number of COLS LD nodes connected to global memory 1, which correspond to COLS times of read access over annotated_ptr mat in the inner loop of the computation.

By clicking on the DotProductIP kernel, you can verify that using the unannotated pointer p in the inner loop of dot product will result in an additional number of COLS LD nodes generated and connected to global memory 2, and thereby an increase in the Area Estimates tab.

Run the `annotated_ptr` Executable

On Linux

Run the sample on the FPGA emulator (the kernel executes on the CPU).
```
./annotated_ptr.fpga_emu
```

Run the sample on the FPGA simulator device.

CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./annotated_ptr.fpga_sim

On Windows

Run the sample on the FPGA emulator (the kernel executes on the CPU).
```
annotated_ptr.fpga_emu.exe
```

Run the sample on the FPGA simulator device.

set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1
annotated_ptr.fpga_sim.exe
set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=

Example Output

Running on device: Intel(R) FPGA Emulation Device
PASSED: The results are correct

License

Code samples are licensed under the MIT license. See License.txt for details.

Third party program Licenses can be found here: third-party-programs.txt.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

annotated_ptr

annotated_ptr

README.md

`annotated_ptr` Sample

Prerequisites

Purpose

An FPGA Component That Contains Memory Access to an Ambiguous Buffer Location

Use the `annotated_ptr` Class to Constrain the Memory Access

Building the `annotated_ptr` Tutorial

On a Linux* System

On a Windows* System

Read the Reports

Run the `annotated_ptr` Executable

On Linux

On Windows

Example Output

License

Files

annotated_ptr

Directory actions

More options

Directory actions

More options

Latest commit

History

annotated_ptr

Folders and files

parent directory

README.md

annotated_ptr Sample

Prerequisites

Purpose

An FPGA Component That Contains Memory Access to an Ambiguous Buffer Location

Use the annotated_ptr Class to Constrain the Memory Access

Building the annotated_ptr Tutorial

On a Linux* System

On a Windows* System

Read the Reports

Run the annotated_ptr Executable

On Linux

On Windows

Example Output

License

`annotated_ptr` Sample

Use the `annotated_ptr` Class to Constrain the Memory Access

Building the `annotated_ptr` Tutorial

Run the `annotated_ptr` Executable