This sample is an FPGA tutorial that demonstrates how to specify the kernel invocation interface and kernel argument interface for an FPGA IP produced with the Intel® oneAPI DPC++/C++ Compiler.
Area | Description |
---|---|
What you will learn | Basics of specifying kernel invocation interfaces and kernel argument interfaces |
Time to complete | 30 minutes |
Category | Concepts and Functionality |
The sample demonstrates the differences between streaming invocation interfaces that use a ready/valid handshake and register-mapped invocation interfaces that exist in the control/status register (CSR) of FPGA IP produced with the Intel® oneAPI DPC++/C++ Compiler.
Use the `get` kernel properties method to specify how the IP is started, and the `annotated_arg` wrapper to specify how arguments are passed to the IP.
Optimized for | Description |
---|---|
OS | Ubuntu* 20.04; RHEL*/CentOS* 8; SUSE* 15; Windows* 10, 11; Windows Server* 2019 |
Hardware | Intel® Agilex® 7, Agilex® 5, Arria® 10, Stratix® 10, and Cyclone® V FPGAs |
Software | Intel® oneAPI DPC++/C++ Compiler |
Note: Although the Intel® oneAPI DPC++/C++ Compiler alone is enough to compile for emulation, generate reports, and generate RTL, the simulation flow and FPGA compiles have additional software requirements.
To use the simulator flow, Intel® Quartus® Prime Pro Edition (or Standard Edition when targeting Cyclone® V) and one of the following simulators must be installed and accessible through your PATH:
- Questa*-Intel® FPGA Edition
- Questa*-Intel® FPGA Starter Edition
- ModelSim® SE
When using the hardware compile flow, Intel® Quartus® Prime Pro Edition (or Standard Edition when targeting Cyclone® V) must be installed and accessible through your PATH.
Warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation.
This sample is part of the FPGA code samples. It is categorized as a Tier 2 sample that demonstrates a compiler feature.
flowchart LR
tier1("Tier 1: Get Started")
tier2("Tier 2: Explore the Fundamentals")
tier3("Tier 3: Explore the Advanced Techniques")
tier4("Tier 4: Explore the Reference Designs")
tier1 --> tier2 --> tier3 --> tier4
style tier1 fill:#0071c1,stroke:#0071c1,stroke-width:1px,color:#fff
style tier2 fill:#f96,stroke:#333,stroke-width:1px,color:#fff
style tier3 fill:#0071c1,stroke:#0071c1,stroke-width:1px,color:#fff
style tier4 fill:#0071c1,stroke:#0071c1,stroke-width:1px,color:#fff
Find more information about how to navigate this part of the code samples in the FPGA top-level README.md. You can also find more information about troubleshooting build errors, links to selected documentation, and more.
The sample demonstrates in detail how to declare kernel invocation interfaces and kernel argument interfaces.
The kernel invocation interface (namely, the `start` and `done` signals) can be implemented in the kernel's CSR, or using a ready/valid handshake. Similarly, the kernel arguments can be passed through the CSR, or through dedicated conduits.
(Figures: Register-mapped Invocation with Register-mapped Arguments; Streaming Invocation with Conduit Arguments.)
The invocation interface and any argument interfaces are specified independently, so you may choose to implement the invocation interface with a ready/valid handshake, and implement the kernel arguments in the CSR. The following table lists valid kernel argument interface synchronizations.
Invocation Interface | Argument Interface | Argument Interface Synchronization |
---|---|---|
Streaming | Conduit | Consumed when `<kernel_name>_streaming_start`=1 and `<kernel_name>_streaming_ready_out`=1 |
Streaming | Register-mapped | Consumed if written one clock cycle before `<kernel_name>_streaming_start`=1 and `<kernel_name>_streaming_ready_out`=1 |
Register-mapped | Conduit | Consumed one clock cycle after writing to the start register |
Register-mapped | Register-mapped | Consumed if written any time before writing to the start register |
If you would like an argument to have its own dedicated ready/valid handshake, implement that argument using a streaming interface.
Note: The register-mapped and streaming interface features are only supported in the SYCL HLS flow. The SYCL HLS flow compiles SYCL* source code to IPs that can be deployed into your Intel® Quartus® Prime projects. Emulator and simulator executables are still generated to allow you to validate your IP. You can compile the generated RTL with Intel® Quartus® Prime to generate accurate fMAX and area estimates. However, the six `.fpga` executables generated in this tutorial are not designed to run on FPGA devices directly.
By default, your IP's `start` and `done` signals will appear in the IP's CSR. This is true whether you declare your kernel using the 'functor' or 'lambda' syntax.
// Functor syntax
struct MyIP {
  ...
  void operator()() const {
    ...
  }
};
...
q.single_task(MyIP{});

// Lambda syntax
void myIPFunction() {
  ...
}
...
q.single_task([=] {
  myIPFunction();
  ...
});
You can see concrete examples of kernels that use register-mapped invocation interfaces in `src/reg_map_functor.cpp` and `src/reg_map_lambda.cpp`.
You can force your IP's `start` and `done` signals to appear as signals on your IP boundary by adding the `streaming_interface` kernel property.
Using the property `sycl::ext::intel::experimental::streaming_interface<>` or `sycl::ext::intel::experimental::streaming_interface_accept_downstream_stall` configures a streaming invocation interface with a `ready_in` interface that allows downstream components to apply backpressure. You can remove the `ready_in` interface by using `sycl::ext::intel::experimental::streaming_interface<remove_downstream_stall>` or `sycl::ext::intel::experimental::streaming_interface_remove_downstream_stall` instead. If you omit the `streaming_interface` property, the compiler will configure your kernel with a register-mapped invocation interface. The syntax for declaring a kernel property depends on whether you use the functor syntax or the lambda syntax, but the `streaming_interface` property is the same.
If you declare your kernel using the functor syntax, you must declare your kernel properties in the `get()` function, as shown here:
struct MyIP {
...
auto get(sycl::ext::oneapi::experimental::properties_tag) {
return sycl::ext::oneapi::experimental::properties {
sycl::ext::intel::experimental::streaming_interface<>
};
}
void operator()() const {
...
}
};
...
q.single_task(MyIP{});
`src/stream_functor.cpp` and `src/stream_lambda.cpp` demonstrate two different kernels that use a streaming invocation interface. `src/stream_rm_stall.cpp` demonstrates a kernel that has a streaming invocation interface with the `ready_in` signal disabled.
If you declare your kernel using the lambda syntax, you must declare your kernel properties in an object that you pass to the `single_task()` function.
sycl::ext::oneapi::experimental::properties kernel_properties {
sycl::ext::intel::experimental::streaming_interface<>,
};
q.single_task(kernel_properties, [=] {
...
});
SYCL* task kernels are non-pipelined by default, meaning the next kernel invocation can only be started after the previous one has completed its execution. Kernels with a streaming kernel invocation interface can optionally be pipelined to increase the throughput of the kernel. A pipelined kernel is one that can be invoked while the previous kernel invocation is still executing, making full use of the entire hardware pipeline. The delay between successive invocations is called the initiation interval (II).
(Figures: Non-pipelined Invocation; Pipelined Invocation with II=1.)
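The difference in throughput can be illustrated with a small host-side timing model. This is only a sketch in plain C++, not SYCL device code and not part of the sample: the function names are hypothetical, `latency` is an assumed fixed pipeline depth in clock cycles, and the model assumes no backpressure from downstream.

```cpp
#include <cstdint>

// Idealized model (assumption): every invocation takes exactly `latency`
// clock cycles from start to done, and nothing ever stalls.

// Cycle on which invocation `i` starts when invocations are issued
// back-to-back with initiation interval `ii` (ii >= 1).
constexpr std::int64_t StartCycle(std::int64_t i, std::int64_t ii) {
  return i * ii;
}

// Cycle on which invocation `i` reports done.
constexpr std::int64_t DoneCycle(std::int64_t i, std::int64_t ii,
                                 std::int64_t latency) {
  return StartCycle(i, ii) + latency;
}

// Cycles from the first start to the last done for `count` pipelined
// invocations: later invocations overlap with earlier ones.
constexpr std::int64_t PipelinedCycles(std::int64_t count, std::int64_t ii,
                                       std::int64_t latency) {
  return count <= 0 ? 0 : (count - 1) * ii + latency;
}

// Non-pipelined: each invocation starts only after the previous one is done.
constexpr std::int64_t NonPipelinedCycles(std::int64_t count,
                                          std::int64_t latency) {
  return count <= 0 ? 0 : count * latency;
}
```

Under these assumptions, four invocations of a kernel with a latency of 10 cycles take `NonPipelinedCycles(4, 10)` = 40 cycles without pipelining and `PipelinedCycles(4, 1, 10)` = 13 cycles with II=1. Real kernels may stall or have variable latency, so treat the numbers as an upper bound on the benefit.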
The kernel property `sycl::ext::intel::experimental::pipelined` takes an optional template parameter that controls whether to pipeline the kernel. Valid parameters are:
- -1: Pipeline the kernel, and automatically infer lowest possible II at target fMAX.
- 0: Do not pipeline the kernel.
- N (N > 0): Pipeline the kernel, and force the II of the kernel to be N.
If a parameter is not specified, the default parameter of `-1` will be inferred, so the compiler will make its best effort to achieve the lowest kernel II.
Note: The `sycl::ext::intel::experimental::pipelined<>` property only supports kernels with a streaming invocation interface.
When you invoke a kernel with a pipelined streaming interface, you should only call the blocking `wait()` function after all kernel invocations have launched.
for (int i = 0; i < count; i++) {
q.single_task<StreamPipelined>(StreamPipelinedIP{&input[i], &functor_streaming_pipelined_out[i]});
}
q.wait();
Note: As per the SYCL language semantics, separate invocations of a kernel are independent. This means that you can't make assumptions about memory ordering or memory dependences between kernel invocations. In particular, a repeatedly-invoked kernel with a memory dependence will result in undefined behavior in SYCL and may not function as you expect. Make sure you use synchronization mechanisms such as the `.wait()` function or atomic operations to avoid race conditions.
If you want to guarantee sequential equivalence, you can also write your kernel with a `while(1)` loop in the kernel body instead of using a pipelined kernel.
For an example of a pipelined streaming kernel, see `src/stream_pipelined.cpp`.
You can use the `sycl::ext::oneapi::experimental::annotated_arg` wrapper type to customize whether a kernel argument should be passed to the IP component as a conduit, or through the CSR. If you do not specify an `annotated_arg` wrapper, the compiler will infer an argument interface based on your invocation interface.
Invocation Interface | Automatically Inferred Argument Interface | SYCL* Property |
---|---|---|
Register-mapped | Register-mapped | sycl::ext::intel::experimental::register_map |
Streaming | Conduit | sycl::ext::intel::experimental::conduit |
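The inference rule in the table above can be written down as a small lookup. This is a sketch in plain C++ for illustration only: the enum and function names are hypothetical and are not part of the SYCL or Intel extension APIs.

```cpp
// Hypothetical names that model the two tables' vocabulary.
enum class InvocationInterface { kRegisterMapped, kStreaming };
enum class ArgumentInterface { kRegisterMapped, kConduit };

// Argument interface assumed for an argument that has no annotated_arg
// wrapper: it follows the kernel's invocation interface.
constexpr ArgumentInterface InferredArgumentInterface(
    InvocationInterface invocation) {
  return invocation == InvocationInterface::kStreaming
             ? ArgumentInterface::kConduit
             : ArgumentInterface::kRegisterMapped;
}
```

An argument wrapped in `annotated_arg` with an explicit `conduit` or `register_map` property overrides this default, which is how a register-mapped kernel can still have a conduit argument.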
You can add a `conduit` property to an `annotated_arg` like this:
struct MyIP {
sycl::ext::oneapi::experimental::annotated_arg<
int, decltype(sycl::ext::oneapi::experimental::properties {
sycl::ext::intel::experimental::conduit})>
arg1;
void operator()() const {
...
// access arg1 here
}
};
Note: If you use a `struct` or `class` type as the payload of an `annotated_arg`, you must cast away the `annotated_arg` wrapper to access the members and methods of your `struct` or `class`. In particular, this applies to `ac_int` types. If you forget to cast away the `annotated_arg` wrapper and try to access members or methods anyway, you will see a compiler error.

using MyUInt5 = ac_int<5, false>;

struct FunctorRegMapIP {
  int *input;
  int *output;

  // A kernel with a register map invocation interface can also independently
  // have streaming kernel arguments, when annotated by 'conduit' property.
  sycl::ext::oneapi::experimental::annotated_arg<
      MyUInt5, decltype(sycl::ext::oneapi::experimental::properties{
                   sycl::ext::intel::experimental::conduit})>
      n;

  // Without kernel invocation interface annotation, a register-mapped invocation
  // interface will be inferred by the compiler.
  void operator()() const {
    // For annotated_arg of ac_int type, explicitly cast away the annotated_arg
    // to prevent compiler error when using methods or accessing members.
    for (MyUInt5 i = 0; i < ((MyUInt5)n).slc<5>(0); i++) {
      output[i] = input[i] * (input[i] + 1);
    }
  }
};

This is demonstrated in `src/reg_map_functor.cpp`, `src/reg_map_lambda.cpp`, and `src/stream_functor.cpp`.
This code sample contains six source files that together demonstrate the range of configuration options for IP component invocation interfaces.
- `src/reg_map_functor.cpp`
  - Register-mapped invocation interface
  - Functor coding style
  - Register-mapped argument (explicitly specified with `annotated_arg`)
  - Proper casting away of `annotated_arg` to access an `ac_int` method
- `src/reg_map_lambda.cpp`
  - Register-mapped invocation interface
  - Lambda coding style
  - Register-mapped argument (explicitly specified with `annotated_arg`)
  - Proper casting away of `annotated_arg` to access an `ac_int` method
- `src/stream_functor.cpp`
  - Streaming invocation interface (non-pipelined)
  - Functor coding style
  - Conduit argument (explicitly specified with `annotated_arg`)
  - Register-mapped argument (explicitly specified with `annotated_arg`)
  - Proper casting away of `annotated_arg` to access a `struct` member
- `src/stream_lambda.cpp`
  - Streaming invocation interface (non-pipelined)
  - Lambda coding style
  - Conduit argument (implied)
- `src/stream_pipelined.cpp`
  - Streaming invocation interface (pipelined)
  - Functor coding style
  - Conduit argument (implied)
- `src/stream_rm_stall.cpp`
  - Streaming invocation interface (non-pipelined, stall-free)
  - Functor coding style
  - Register-mapped argument (explicitly specified with `annotated_arg`)
  - Conduit argument (explicitly specified with `annotated_arg`)
Note: When working with the command-line interface (CLI), you should configure the oneAPI toolkits using environment variables. Set up your CLI environment by sourcing the `setvars` script in the root of your oneAPI installation every time you open a new terminal window. This practice ensures that your compiler, libraries, and tools are ready for development.

Linux*:
- For system wide installations:
. /opt/intel/oneapi/setvars.sh
- For private installations:
. ~/intel/oneapi/setvars.sh
- For non-POSIX shells, like csh, use the following command:
bash -c 'source <install-dir>/setvars.sh ; exec csh'

Windows*:
- For the Command Prompt:
C:\"Program Files (x86)"\Intel\oneAPI\setvars.bat
- For Windows PowerShell*, use the following command:
cmd.exe "/K" '"C:\Program Files (x86)\Intel\oneAPI\setvars.bat" && powershell'

For more information on configuring environment variables, see Use the setvars Script with Linux* or macOS* or Use the setvars Script with Windows*.
Linux*:
- Change to the sample directory.
- Build the program for the Intel® Agilex® 7 device family, which is the default.
mkdir build
cd build
cmake .. -DPART=<X>
where `<X>` is one of `REG_MAP_FUNCTOR`, `STREAM_FUNCTOR`, `STREAM_RM_STALL`, `REG_MAP_LAMBDA`, `STREAM_LAMBDA`, or `STREAM_PIPELINED`.
Note: You can change the default target by using the command:
cmake .. -DPART=<X> -DFPGA_DEVICE=<FPGA device family or FPGA part number>
Alternatively, you can target an explicit FPGA board variant and BSP by using the following command:
cmake .. -DPART=<X> -DFPGA_DEVICE=<board-support-package>:<board-variant>
Note: You can poll your system for available BSPs using the `aoc -list-boards` command. The board list that is printed out will be of the form:
$> aoc -list-boards
Board list:
  <board-variant>
    Board Package: <path/to/board/package>/board-support-package
  <board-variant2>
    Board Package: <path/to/board/package>/board-support-package
You will only be able to run an executable on the FPGA if you specified a BSP.
- Compile the design. (The provided targets match the recommended development flow.)
  - Compile and run for emulation (fast compile time, targets an emulated FPGA device).
  make fpga_emu
  - Generate the HTML optimization reports. (See Read the Reports below for information on finding and understanding the reports.)
  make report
  - Compile for simulation (:warning: This design compiles many source files, so the simulation compilation may take up to 3 hours depending on your computer).
  make fpga_sim
  - Run the generated HDL through Intel® Quartus® Prime to generate accurate fMAX and area estimates.
  Warning: The FPGA executables generated in this tutorial cannot be run directly on FPGA devices.
  make fpga
Windows*:
- Change to the sample directory.
- Build the program for the Intel® Agilex® 7 device family, which is the default.
mkdir build
cd build
cmake -G "NMake Makefiles" .. -DPART=<X>
where `<X>` is one of `REG_MAP_FUNCTOR`, `STREAM_FUNCTOR`, `STREAM_RM_STALL`, `REG_MAP_LAMBDA`, `STREAM_LAMBDA`, or `STREAM_PIPELINED`.
Note: You can change the default target by using the command:
cmake -G "NMake Makefiles" .. -DPART=<X> -DFPGA_DEVICE=<FPGA device family or FPGA part number>
Alternatively, you can target an explicit FPGA board variant and BSP by using the following command:
cmake -G "NMake Makefiles" .. -DPART=<X> -DFPGA_DEVICE=<board-support-package>:<board-variant>
Note: You can poll your system for available BSPs using the `aoc -list-boards` command. The board list that is printed out will be of the form:
$> aoc -list-boards
Board list:
  <board-variant>
    Board Package: <path/to/board/package>/board-support-package
  <board-variant2>
    Board Package: <path/to/board/package>/board-support-package
You will only be able to run an executable on the FPGA if you specified a BSP.
- Compile the design. (The provided targets match the recommended development flow.)
  - Compile for emulation (fast compile time, targets emulated FPGA device).
  nmake fpga_emu
  - Generate the optimization report. (See Read the Reports below for information on finding and understanding the reports.)
  nmake report
  - Compile for simulation (fast compile time, targets simulated FPGA device, reduced problem size).
  nmake fpga_sim
  - Run the generated HDL through Intel® Quartus® Prime to generate accurate fMAX and area estimates.
  Warning: The FPGA executables generated in this tutorial cannot be run directly on FPGA devices.
  nmake fpga
Note: If you encounter any issues with long paths when compiling under Windows*, you may have to create your 'build' directory in a shorter path, for example c:\samples\build. You can then run cmake from that directory, and provide cmake with the full path to your sample directory, for example:
C:\samples\build> cmake -G "NMake Makefiles" C:\long\path\to\code\sample\CMakeLists.txt
- Locate `report.html` in the corresponding `<source_file>.report.prj/reports/` directory.
- Open the Views menu and select System Viewer.
In the left-hand pane, select FunctorRegMap or LambdaRegMap under the System hierarchy for the kernels with a register-mapped invocation interface.
The main System Viewer pane shows the kernel invocation interfaces and kernel argument interfaces. The `start`, `busy`, and `done` kernel invocation interfaces are implemented in register map interfaces, and the `arg_input` and `arg_output` kernel arguments are implemented in register map interfaces. The `arg_n` kernel argument is implemented in a streaming interface in both FunctorRegMap and LambdaRegMap.
Similarly, in the left-hand pane, select FunctorStream, StreamRmStall, StreamPipelined or LambdaStream under the System hierarchy for the kernels with a streaming invocation interface.
The main System Viewer pane shows the kernel invocation interfaces and kernel argument interfaces. The `start`, `done`, `ready_in`, and `ready_out` kernel invocation interfaces are implemented in streaming interfaces. The `arg_input` kernel argument is implemented in a streaming interface. The `arg_n` kernel argument is implemented in a streaming interface, except in StreamPipelined, which does not have this argument. The `arg_output` kernel argument is implemented in a register map interface in FunctorStream and StreamRmStall, and in a streaming interface in StreamPipelined and LambdaStream.
Note: Kernel invocation interfaces `ready_in` and `ready_out` are shown as `stall_in` and `stall_out` respectively.
Note: The report for StreamRmStall shows the internals of the kernel. It therefore shows a `stall_in` signal, but that signal is tied to ground and does not appear at the device image boundary.
Linux*:
- Run the sample on the FPGA emulator (the kernel executes on the CPU).
./vector_add.fpga_emu
- Run the sample on the FPGA simulator.
CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./vector_add.fpga_sim

Windows*:
- Run the sample on the FPGA emulator (the kernel executes on the CPU).
vector_add.fpga_emu.exe
- Run the sample on the FPGA simulator.
set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1
vector_add.fpga_sim.exe
set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=
Running the kernel with register map invocation interface implemented in the functor programming model
Done
PASSED
Running the kernel with streaming invocation interface implemented in the functor programming model
Done
PASSED
Running the kernel with streaming invocation interface implemented in the functor programming model
Done
PASSED
Launching streaming pipelined kernels consecutively
Done
PASSED
Running the kernel with register map invocation interface implemented in the lambda programming model
Done
PASSED
Running the kernel with streaming invocation interface implemented in the lambda programming model
Done
PASSED
The diagram below shows the example waveform generated by the simulator that you will see for the kernels with a register-mapped invocation interface. The waveform shows that the register-mapped kernel arguments and kernel invocation handshaking signals are passed in through an Avalon agent interface, at the addresses specified in the agent memory map header files in the project directory.
The diagram below shows the example waveform generated by the simulator that you will see for the kernels with a streaming invocation interface. The waveform shows that the streaming kernel arguments and kernel invocation handshaking signals follow the Avalon-ST protocol. The register map arguments need to be written before the `start` signal is asserted. The streaming invocation interface consumes the streaming kernel arguments on the clock cycle that the `start` and `ready_out` signals are asserted, and the kernel invocation is finished on the clock cycle that the `done` and `ready_in` signals are asserted.
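The acceptance rule described above can be sketched as a tiny cycle-by-cycle model. This is plain C++ for illustration, not testbench code from the sample: the `Cycle` type and the function names are hypothetical, and each array element stands for the signal values sampled on one clock edge.

```cpp
#include <cstddef>

// Signal values sampled on one clock cycle (hypothetical type).
struct Cycle {
  bool start;      // driven by the upstream component
  bool ready_out;  // driven by the kernel
};

// An invocation is accepted, and its streaming arguments are consumed,
// only on a cycle where both start and ready_out are asserted.
constexpr bool InvocationAccepted(const Cycle& c) {
  return c.start && c.ready_out;
}

// Counts how many invocations a recorded trace launches.
template <std::size_t N>
constexpr int CountAccepted(const Cycle (&trace)[N]) {
  int accepted = 0;
  for (std::size_t i = 0; i < N; ++i) {
    if (InvocationAccepted(trace[i])) ++accepted;
  }
  return accepted;
}
```

For example, in a trace where `start` is held high for one cycle while `ready_out` is low and then for one cycle while `ready_out` is high, only the second cycle launches an invocation. The same rule applies on the output side with `done` and `ready_in`.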
The diagram below shows the example waveform generated by the simulator that you will see for the pipelined kernel design. The waveform shows that the kernel is always ready, and four consecutive kernel invocations are launched with the `start` signal being asserted for four consecutive clock cycles. When the four kernel executions finish, the `done` signal is asserted for four consecutive clock cycles.
Code samples are licensed under the MIT license. See License.txt for details.
Third party program Licenses can be found here: third-party-programs.txt.