Releases: LostBeard/SpawnDev.ILGPU
SpawnDev.ILGPU v3.5.0
Half (f16) Support
- WebGPU f16 kernels: Float16 maps to native f16 in WGSL. Buffer alignment, constant emission, and Half ↔ float conversion intrinsics are all wired up. Capability-gated on device feature support.
- XMath.Min/Max/Clamp for Half: Added to XMath via float promotion.
- Group Scan/Reduce for Half: ExclusiveScan, InclusiveScan, AllReduce, and GroupReduce now support Half on WebGPU and CUDA.
- CUDA PTX Half warp shuffles: WarpShuffle, WarpShuffleDown, WarpShuffleUp, WarpShuffleXor (and SubWarp variants) for Half via b32 widening. Unlocks Half scan/reduce on CUDA.
- Lock-free AllReduce: Rewrote AllReduce in both IL and PTX backends to use per-warp shared-memory slots instead of atomic operations. Removes the Half atomics dependency entirely and is correct for all types.
- Half.One constant fix: Was 0x0001 (denormal ≈ 5.96e-8); corrected to 0x3C00 (IEEE-754 1.0).
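To see why the Half.One fix matters, the two bit patterns can be decoded as IEEE-754 binary16 values. This is a small Python check using only the standard library; the helper name `half_from_bits` is made up for illustration and is not part of SpawnDev.ILGPU.

```python
import struct

def half_from_bits(bits: int) -> float:
    """Interpret a 16-bit integer as an IEEE-754 binary16 (half) value."""
    # "<H" packs an unsigned 16-bit integer, "<e" unpacks a binary16 float.
    return struct.unpack("<e", struct.pack("<H", bits))[0]

# 0x0001: exponent bits all zero, mantissa 1 -> smallest positive subnormal, 2**-24.
old_one = half_from_bits(0x0001)

# 0x3C00: sign 0, biased exponent 15 (unbiased 0), mantissa 0 -> exactly 1.0.
new_one = half_from_bits(0x3C00)

print(old_one, new_one)
```

Any kernel that multiplied by the old constant would have scaled its input by roughly 6e-8 instead of leaving it unchanged.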
WebGPU RadixSort with double / long Keys
RadixSortPairs<double, …> and RadixSortPairs<long, …> now work on WebGPU. Multiple root causes fixed end-to-end:
- FloatAsInt / IntAsFloat casts for emulated f64 now correctly reconstruct the IEEE-754 64-bit pattern.
- Structs containing emulated 64-bit fields are flattened to array<u32> in WGSL ("packed structs") to match CPU memory layout.
- True element count is passed to the GPU via a dedicated _scalar_params slot, replacing the incorrect arrayLength() calculation for packed views.
- Sub-view element offset is now computed in u32 units (padding / 4) instead of logical CPU elements, fixing sort correctness for array sizes where the inner temp allocation doesn't start at a 256-byte boundary.
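The fixes above all come down to treating a 64-bit value as a pair of 32-bit words. The Python sketch below illustrates the idea under stated assumptions: the low word is stored first (little-endian order), and every function name is hypothetical rather than taken from the library.

```python
import struct

def double_to_u32_pair(value: float):
    """Split a double into (low, high) 32-bit words of its IEEE-754 bit pattern."""
    bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    return bits & 0xFFFFFFFF, bits >> 32

def u32_pair_to_double(lo: int, hi: int) -> float:
    """Rebuild the double from its two 32-bit words (assumed order: low word first)."""
    bits = (hi << 32) | lo
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def packed_element_count(u32_length: int, u32_per_element: int = 2) -> int:
    """A flat u32 array holds u32_per_element words per logical element."""
    return u32_length // u32_per_element

def byte_offset_to_u32_index(byte_offset: int) -> int:
    """Convert a byte offset into an index in u32 units (4 bytes per word)."""
    return byte_offset // 4

lo, hi = double_to_u32_pair(1.0)
print(hex(lo), hex(hi), u32_pair_to_double(lo, hi))
```

In WGSL, arrayLength() on an array<u32> counts u32 words, so a packed view of N doubles reports 2N, which is one reason a separately passed element count is useful.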
Canvas Rendering (ICanvasRenderer)
- ICanvasRenderer API: New interface for presenting ILGPU pixel buffers (MemoryBuffer2D<uint/int>, packed RGBA) directly to an HTML <canvas> element. Obtained via CanvasRendererFactory.Create(accelerator).
- WebGPU: Zero-copy path. A cached WGSL fullscreen-triangle pipeline reads the pixel buffer directly from a read-only-storage binding. No CPU readback. Blit to the visible canvas via drawImage. Pipeline and bind-group are built once; uniforms are only re-uploaded on resolution change.
- WebGL: Delegates to an offscreen FBO blit in the GL Web Worker. Result is transferred as ImageBitmap back to the main thread, preventing Blazor's render cycle from clearing the canvas between frames.
- CPU / Wasm: Fallback via putImageData. Browser-backed buffers use CopyToHostUint8ArrayAsync for a JS-side copy; pure CPU buffers fall back to synchronous CopyToCPU.
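As a rough illustration of what "packed RGBA" can mean for a uint pixel buffer, the Python sketch below packs four 8-bit channels into one 32-bit value. The channel order (R in the lowest byte) is an assumption, chosen because a little-endian uint then lays out in memory as R, G, B, A, which is the byte order canvas ImageData expects. The function names are hypothetical.

```python
import struct

def pack_rgba(r: int, g: int, b: int, a: int = 255) -> int:
    """Pack four 8-bit channels into one 32-bit pixel, R in the lowest byte (assumed)."""
    return (r & 0xFF) | ((g & 0xFF) << 8) | ((b & 0xFF) << 16) | ((a & 0xFF) << 24)

def to_image_bytes(pixels) -> bytes:
    """Serialize packed pixels as little-endian uint32 values."""
    return struct.pack(f"<{len(pixels)}I", *pixels)

opaque_red = pack_rgba(255, 0, 0)
print(hex(opaque_red), list(to_image_bytes([opaque_red])))
```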
WebGPU Warp Reduce without Subgroups
GenerateWarpReduce now emits a full shared-memory butterfly reduction when the subgroups feature is unavailable, replacing the previous no-op passthrough. Correct results on hardware/drivers that don't expose subgroup extensions.
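A butterfly reduction can be modeled on the CPU to show the access pattern. The Python sketch below is a generic illustration of the technique, not the WGSL the transpiler emits; it assumes a power-of-two lane count and an associative, commutative operator, and it models the workgroup barrier by snapshotting the shared array before each step.

```python
import operator

def butterfly_reduce(lanes, op=operator.add):
    """Simulate a shared-memory butterfly (XOR) reduction across lanes.

    After log2(n) steps every lane holds the reduction of all inputs.
    """
    n = len(lanes)
    if n == 0 or n & (n - 1):
        raise ValueError("lane count must be a non-zero power of two")
    shared = list(lanes)
    offset = n // 2
    while offset >= 1:
        # Barrier: every lane reads the previous step's values before any write.
        snapshot = list(shared)
        # Each lane combines its value with its partner lane (index XOR offset).
        shared = [op(snapshot[i], snapshot[i ^ offset]) for i in range(n)]
        offset //= 2
    return shared

print(butterfly_reduce([1, 2, 3, 4, 5, 6, 7, 8]))
```

Unlike a tree reduction, which leaves the result in lane 0 only, the butterfly pattern leaves the result in every lane, which matches what a warp-level reduce is expected to return.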
Algorithm Type Coverage
Added scan and reduce test/support variants for double, long, and uint:
| Operation | New Types |
|---|---|
| ExclusiveScan | double, uint |
| InclusiveScan | long, double, uint |
| AllReduce | double, long, uint |
| GroupReduce | float, long, double, uint, Half |
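For readers unfamiliar with the operations in the table, the Python sketch below gives the textbook definitions of inclusive scan, exclusive scan, and all-reduce on a plain list. It is a CPU reference only; the actual ILGPU group-level signatures are not shown here, and these function names are illustrative.

```python
from functools import reduce
from itertools import accumulate
import operator

def inclusive_scan(values, op=operator.add):
    """Element i is the reduction of values[0..i]."""
    return list(accumulate(values, op))

def exclusive_scan(values, identity=0, op=operator.add):
    """Element i is the reduction of values[0..i-1]; element 0 is the identity."""
    out = []
    running = identity
    for v in values:
        out.append(running)
        running = op(running, v)
    return out

def all_reduce(values, op=operator.add):
    """Every participant receives the reduction of all values."""
    total = reduce(op, values)
    return [total] * len(values)

data = [1, 2, 3, 4]
print(inclusive_scan(data), exclusive_scan(data), all_reduce(data))
```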
SpawnDev.ILGPU v3.3.0
Desktop & Browser
- WPF Demo Application: new desktop demo running the same shared kernels (Fractal Explorer, 3D Raymarching, GPU Boids) on CUDA, OpenCL, and CPU with live backend switching
- Shared Kernel Library: extracted SpawnDev.ILGPU.Demo.Shared so browser and desktop demos share identical kernel code
- Console Test Runner: added SpawnDev.ILGPU.ConsoleDemo for running the full unit test suite on desktop backends with process isolation for crash resilience
- OpenCL 3.0 Compatibility: relaxed the GenericAddressSpace requirement, enabling NVIDIA GPUs with OpenCL 3.0 drivers that were previously blocked
- Multi-platform support: updated SupportedPlatform to include Windows, Linux, and macOS
WebGL2 Backend — GPU-Resident Buffers
The WebGL2 backend has been refactored to eliminate unnecessary CPU↔GPU data transfers:
- GPU-resident buffers: buffers persist as textures in the GL worker; kernel dispatch sends buffer references, not data
- On-demand readback: CopyToHostAsync() is the only GPU→CPU transfer path
- New worker protocol: allocBuffer, uploadBuffer, readbackBuffer, freeBuffer messages manage buffer lifecycle
- Proper buffer disposal: buffers are freed in the worker when disposed on the C# side
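The buffer lifecycle described above can be pictured with a toy model of the worker side. Only the four message names come from the release notes; the payload fields (`type`, `bufferId`, `byteLength`, `data`) and the class itself are hypothetical, and a Python bytearray stands in for a GPU texture.

```python
class GLWorkerModel:
    """Toy model of a worker that owns buffers and is driven by messages."""

    def __init__(self):
        self.buffers = {}   # bufferId -> bytearray (stand-in for a texture)
        self.next_id = 1

    def handle(self, msg: dict) -> dict:
        kind = msg["type"]
        if kind == "allocBuffer":
            buffer_id = self.next_id
            self.next_id += 1
            self.buffers[buffer_id] = bytearray(msg["byteLength"])
            return {"bufferId": buffer_id}
        if kind == "uploadBuffer":
            data = msg["data"]
            # Overwrite the start of the resident buffer with the uploaded bytes.
            self.buffers[msg["bufferId"]][:len(data)] = data
            return {}
        if kind == "readbackBuffer":
            # The only path that copies buffer contents back to the caller.
            return {"data": bytes(self.buffers[msg["bufferId"]])}
        if kind == "freeBuffer":
            del self.buffers[msg["bufferId"]]
            return {}
        raise ValueError(f"unknown message type: {kind}")

worker = GLWorkerModel()
buf = worker.handle({"type": "allocBuffer", "byteLength": 4})["bufferId"]
worker.handle({"type": "uploadBuffer", "bufferId": buf, "data": b"\x01\x02"})
print(worker.handle({"type": "readbackBuffer", "bufferId": buf})["data"])
worker.handle({"type": "freeBuffer", "bufferId": buf})
```

The point of the design is that a kernel dispatch only needs to carry `bufferId` values, so data crosses the worker boundary on upload and on explicit readback, not on every dispatch.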
Wasm Backend Improvements
- Expanded API coverage including shared memory, barriers, dynamic shared memory, atomics, and broadcasting
- Single-worker fallback mode when SharedArrayBuffer is unavailable
Transpiler Fixes
- Break-PHI bug: fixed assignments before break in loops being dropped in WGSL and GLSL transpilers
- CopySign: corrected argument swap in the CopySign intrinsic
- 64-bit reduce: fixed signed/unsigned mismatch in MinUInt64 and emu_f64 buffer I/O for AddDouble/MaxDouble
- WebGL raymarching: fixed GLSL rendering issues
- BVH ray traversal: corrected WebGPU and WebGL backend issues for complex scene traversal
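The CopySign fix is easy to underestimate because a swapped call still returns a plausible number. The Python sketch below uses `math.copysign`, which has the same (magnitude, sign) argument order as .NET's MathF.CopySign, to show the effect of a swap. The wrapper names are illustrative only.

```python
import math

def copy_sign(magnitude: float, sign: float) -> float:
    """Correct order: magnitude of the first argument, sign of the second."""
    return math.copysign(magnitude, sign)

def copy_sign_swapped(magnitude: float, sign: float) -> float:
    """The buggy behavior: arguments reach the intrinsic in reverse order."""
    return math.copysign(sign, magnitude)

print(copy_sign(3.0, -1.0))          # magnitude 3, sign of -1
print(copy_sign_swapped(3.0, -1.0))  # magnitude 1, sign of 3
```

No exception or NaN is produced in the swapped case, which is why the bug surfaces only as silently wrong results.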
Upstream ILGPU Fixes
Six bugs from the original ILGPU repo have been fixed in our fork:
| Issue | Description | Severity |
|---|---|---|
| #1361 | MathF.CopySign argument order swapped: silent wrong results on all GPU backends | High |
| #1309 | uint to float cast routed through double: crashes on devices without fp64 | Medium |
| #1479 | Infinite compilation with large local arrays (new int[1_000_000]): 10+ min, 10+ GB RAM | High |
| #1538 | Internal Compiler Error with nested struct properties: wrong field slicing after type unification | Medium |
| #1539 | OpenCL produces wrong results for complex kernels: stale phi variables persisted across blocks | High |
| #1540 | H100/H200 not working; added SM_90, SM_100, SM_101, SM_120 architecture support | High |
See upstream-issues.md for detailed root cause analysis and fix descriptions.
Documentation
- Corrected synchronization semantics: Synchronize() = flush (non-blocking), SynchronizeAsync() = flush + wait, CopyToHostAsync() = only GPU→CPU path
- Updated test count to 640 tests across 8 suites
- Added WebGL GPU-resident buffer architecture documentation
- Reduced default logging verbosity across all backends
Demo Improvements
- Game of Life — fixed mouse interaction and added NavMenu icon
- Fractal Explorer — moved to shared kernel library, improved WebGL2 rendering pipeline
- Reduced console log noise for cleaner browser dev tools experience
Full Changelog: v3.2.0...v3.3.0
SpawnDev.ILGPU v3.2.0
Cross-platform GPU compute from a single codebase — browser and desktop.
What's New
🖥️ Desktop Support Verified
- SpawnDev.ILGPU now officially supports desktop/server environments (Console, WPF, ASP.NET) alongside Blazor WebAssembly
- Same NuGet package provides browser backends (WebGPU, WebGL, Wasm) and native backends (Cuda, OpenCL, CPU)
- SynchronizeAsync() and CopyToHostAsync() work everywhere: async in the browser, graceful sync fallback on desktop
- New SpawnDev.ILGPU.ConsoleDemo project included as a working reference
🎮 New Demos
- Game of Life — GPU-accelerated cellular automaton
- Boids 3D — Flocking simulation on all backends
- Compute 3D — 3D compute shader demo
🐛 Bug Fixes
- Fixed 3 transpiler bugs found during Game of Life development
- Fixed handling of Debug IL in WebGPU and WebGL transpilers
- Updated Wasm backend intrinsics
📚 Comprehensive Documentation
- New Docs/ folder with 8 markdown guides: Getting Started, Backends, Kernels, Memory & Buffers, Advanced Patterns (GPU intrinsics, device sharing, rendering), Limitations, and API Reference
- Covers both Blazor WASM and desktop usage
- Incorporates foundational ILGPU concepts adapted for the browser
Full Changelog
SpawnDev.ILGPU v3.0.0
What's New
🚀 Next-Generation GPU Computing in Blazor Wasm — v3.0.0 brings major performance improvements, streamlined architecture, and enhanced compatibility. Run C# ILGPU kernels on WebGPU, WebGL, and native WebAssembly with automatic backend selection.
Key Features
- Three Powerful Backends: WebGPU (modern GPU compute via WGSL), WebGL (universal GPU access via GLSL ES 3.0), and Wasm (native WebAssembly on Web Workers)
- CPU Backend: Standard ILGPU CPU accelerator included for debugging and performance comparison
- Universal GPU Access: WebGPU for cutting-edge browsers, WebGL for virtually every device
- Intelligent Auto-Selection: CreatePreferredAcceleratorAsync() automatically picks the best available backend (WebGPU → WebGL → Wasm)
- 64-bit Computing: Full double and long support via optimized emulation on both GPU backends
- Multi-Worker Dispatch: Wasm backend distributes work across all available CPU cores
- Zero-Copy Shared Memory: SharedArrayBuffer support for efficient data sharing
- Atomic Operations: Workgroup synchronization and atomic operations on WebGPU and Wasm backends
- Production Ready: Comprehensive test suite, stable APIs, and real-world optimization
Built For
- ✨ Blazor WebAssembly — Run compute-intensive C# kernels in the browser
- 🎮 Game Development — GPU-accelerated physics, graphics, and AI
- 📊 Data Processing — High-performance number crunching without native compilation
- 🔬 Scientific Computing — GPGPU capabilities in pure managed code
Resources
Full Changelog: v2.1.0...v3.0.0
SpawnDev.ILGPU v2.1.0
What's New
🖼️ New WebGL Backend — GPU-accelerated compute on virtually every modern browser and device. C# kernels are transpiled to GLSL ES 3.0 vertex shaders and executed via Transform Feedback, providing broad GPU access even where WebGPU isn't supported.
Highlights
- Five backends: WebGPU, WebGL, Wasm, Workers, and CPU
- Two GPU backends: WebGPU for cutting-edge browsers, WebGL for universal coverage
- Auto-selection: CreatePreferredAcceleratorAsync() picks the best available backend (WebGPU → WebGL → Wasm → Workers → CPU)
- 64-bit emulation on both GPU backends (double/long support via software emulation)
- Benchmarks page: New interactive benchmark suite comparing throughput across all backends
- Workers performance: Cached compiled functions and script bodies to reduce per-dispatch overhead
Links
Full Changelog: v2.0.0...v2.1.0
SpawnDev.ILGPU v2.0.0
SpawnDev.ILGPU v2.0.0 — First Stable Release
Run ILGPU kernels in the browser — on the GPU, across threads, or on the CPU.
SpawnDev.ILGPU v2.0.0 is the first stable release of this library, the successor to SpawnDev.ILGPU.WebGPU which only supported a single WebGPU backend. Version 2.0.0 brings four full compute backends, automatic device selection, and 360+ tests — all running entirely in the browser via Blazor WebAssembly.
What's New in 2.0.0
Four Compute Backends
| Backend | Executes on | Performance |
|---|---|---|
| WebGPU | GPU via WGSL transpilation | ⚡⚡⚡ Fastest |
| Wasm | Web Workers via native WebAssembly binary | ⚡⚡ Fast |
| Workers | Web Workers via JavaScript transpilation | ⚡ Moderate |
| CPU | Main thread via .NET runtime | 🐢 Fallback |
Automatic Backend Selection
Call CreatePreferredAcceleratorAsync() and the library picks the best available backend: WebGPU → Wasm → Workers → CPU.
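The selection rule amounts to a first-match walk over a fixed preference list. The Python sketch below models only that rule; how the library actually probes for each backend is not described in these notes, so the `available` set and the function name are assumptions.

```python
# Preference order as stated for v2.0.0.
PREFERENCE_ORDER = ("WebGPU", "Wasm", "Workers", "CPU")

def pick_backend(available, order=PREFERENCE_ORDER) -> str:
    """Return the first backend in the preference order that is available."""
    for name in order:
        if name in available:
            return name
    raise RuntimeError("no supported backend is available")

# A browser without WebGPU falls through to Wasm.
print(pick_backend({"Wasm", "Workers", "CPU"}))
```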
Key Features
- WGSL transpilation: C# ILGPU kernels compiled to WebGPU Shading Language for GPU execution
- Wasm compilation: Kernels compiled to native WebAssembly binary modules for near-native performance
- 64-bit emulation: Full double (f64) and long (i64) support via software emulation on WebGPU
- WebGPU extension auto-detection: Probes adapter for shader-f16, subgroups, timestamp-query and enables them automatically
- Subgroup operations: Group.Broadcast and Warp.Shuffle supported when the browser exposes the subgroups extension
- Multi-worker dispatch: Wasm and Workers backends distribute work across all available CPU cores
- Shared memory & atomics: Workgroup memory, barriers, and atomic operations across backends
- No native dependencies: Pure C#, powered by SpawnDev.BlazorJS
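To give a feel for what software 64-bit emulation involves, the Python sketch below adds two 64-bit integers using only 32-bit words and an explicit carry. This is a generic technique shown under the assumption of a (low, high) word pair; it is not claimed to be the code the WGSL transpiler generates, and the helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def split_u64(value: int):
    """Split a 64-bit value into (low, high) 32-bit words."""
    return value & MASK32, (value >> 32) & MASK32

def join_u64(lo: int, hi: int) -> int:
    """Rebuild a 64-bit value from its (low, high) words."""
    return (hi << 32) | lo

def add_u64(a, b):
    """Add two (low, high) pairs using only 32-bit arithmetic."""
    a_lo, a_hi = a
    b_lo, b_hi = b
    lo = (a_lo + b_lo) & MASK32
    # The low-word sum wrapped around exactly when it ended up smaller than an input.
    carry = 1 if lo < a_lo else 0
    hi = (a_hi + b_hi + carry) & MASK32
    return lo, hi

total = add_u64(split_u64(0xFFFFFFFF), split_u64(1))
print(hex(join_u64(*total)))
```

Because two's complement addition is identical for signed and unsigned operands, the same word-pair addition covers i64 as well as u64.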
360+ Tests
Comprehensive coverage across all backends: memory, indexing, arithmetic, bitwise, math functions, atomics, control flow, structs, type casting, 64-bit emulation, GPU patterns, shared memory, broadcast & subgroups, and more.
Interactive Demo
Try the live demo featuring a real-time Fractal Explorer that lets you switch between all four backends and compare performance.
Installation
dotnet add package SpawnDev.ILGPU
Breaking Changes from SpawnDev.ILGPU.WebGPU
This package replaces SpawnDev.ILGPU.WebGPU. Key differences:
- Namespace: SpawnDev.ILGPU (was SpawnDev.ILGPU.WebGPU)
- Multiple backends: WebGPU is no longer the only option; Wasm, Workers, and CPU backends are included
- Unified API: Context.CreateAsync() with builder pattern for all backends