Note
SYCL is used to write this kernel, which is not a common practice. Please also have a look at the OpenCL kernel examples like image_scale.
The same vector addition workload as the vadd example, but driven by an immediate command list and events instead of explicit command queues. This demonstrates fine-grained dependency tracking: memory copies signal events, and the kernel launch waits on those events before executing.
- Discovers a GPU device and prints its basic & compute properties
- Allocates host and device memory for two float32 vectors (256 MiB each)
- Fills both vectors with random values
- Loads a SPIR-V kernel (
vector_add) that computesa[i] += b[i]in parallel - Creates an event pool with 3 events to express data-flow dependencies
- Submits all work through a single immediate command list:
- Two H→D copies, each signaling its own event
- Kernel launch that waits on both copy events before executing
- D→H copy that waits on the kernel event
- Synchronizes via
HostSynchronizeon the immediate command list - Validates every element against the CPU reference
| Aspect | vadd |
vadd_event |
|---|---|---|
| Submission | 3 separate command lists executed on a command queue | 1 immediate command list |
| Synchronization | zeCommandQueueSynchronize |
zeCommandListHostSynchronize |
| Dependencies | Implicit via command list ordering + barriers | Explicit via events (wait lists) |
go run main.go=============== Device Basic Properties ===============
Running on device: ID = 32103 , Name = Intel(R) Graphics @ 0.00 GHz.
=============== Device Compute Properties ===============
Max Group Size (X, Y, Z): (1024, 1024, 1024)
Max Group Count (X, Y, Z): (4294967295, 4294967295, 4294967295)
Max Total Group Size: 1024
Max Shared Local Memory: 65536
Num Subgroup Sizes: 3
Subgroup Sizes: [8 16 32 0 0 0 0 0]
=============== Computation Configuration ===============
Group Size (X, Y, Z): (1024, 1, 1)
Group Count: 65536
Total Elements (N): 67108864
Buffer Size: 256 MiB
=============== Calculation Results ===============
GPU Execution Time: 51.768500 ms
GPU Throughput: 5.19 GiB/s
=============== Validation Results ===============
CPU Execution Time: 38.237400 ms
CPU Throughput: 7.02 GiB/s
Test Passed!!!