Releases: LostBeard/SpawnDev.ILGPU

SpawnDev.ILGPU v3.5.0

06 Mar 16:14

Half (f16) Support

  • WebGPU f16 kernels: Float16 maps to native f16 in WGSL. Buffer alignment, constant emission, and Half ↔ float conversion intrinsics are all wired up. Capability-gated on device feature support.
  • XMath.Min/Max/Clamp for Half: Added to XMath via float promotion.
  • Group Scan/Reduce for Half: ExclusiveScan, InclusiveScan, AllReduce, and GroupReduce now support Half on WebGPU and CUDA.
  • CUDA PTX Half warp shuffles: WarpShuffle, WarpShuffleDown, WarpShuffleUp, WarpShuffleXor (and SubWarp variants) for Half via b32 widening. Unlocks Half scan/reduce on CUDA.
  • Lock-free AllReduce: Rewrote AllReduce in both IL and PTX backends to use per-warp shared-memory slots instead of atomic operations. Removes the Half atomics dependency entirely and is correct for all types.
  • Half.One constant fix: Was 0x0001 (denormal ≈5.96e-8); corrected to 0x3C00 (IEEE-754 1.0).
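
The two bit patterns named in the Half.One fix can be checked with the .NET base class library alone. The sketch below is illustrative and does not use the library's own Half type; the HalfBits class and its ToFloat helper are hypothetical names, and it assumes .NET 6 or later for BitConverter.UInt16BitsToHalf.

```csharp
using System;

public static class HalfBits
{
    // Reinterpret a raw 16-bit pattern as an IEEE-754 binary16 value,
    // then widen it to float so it is easy to compare and print.
    public static float ToFloat(ushort bits) =>
        (float)BitConverter.UInt16BitsToHalf(bits);

    public static void Main()
    {
        // 0x3C00: sign 0, biased exponent 15, mantissa 0, i.e. exactly 1.0.
        Console.WriteLine(ToFloat(0x3C00));

        // 0x0001: smallest positive subnormal, 2^-24 (about 5.96e-8).
        Console.WriteLine(ToFloat(0x0001));
    }
}
```

A constant that decodes to 2^-24 instead of 1.0 would silently turn multiplicative identities into near-zero values, which is why the correction matters for scans and reductions over Half.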

WebGPU RadixSort with double / long Keys

  • RadixSortPairs<double, …> and RadixSortPairs<long, …> now work on WebGPU. Multiple root causes fixed end-to-end:
    • FloatAsInt/IntAsFloat casts for emulated f64 now correctly reconstruct the IEEE-754 64-bit pattern.
    • Structs containing emulated 64-bit fields are flattened to array<u32> in WGSL ("packed structs") to match CPU memory layout.
    • True element count is passed to the GPU via a dedicated _scalar_params slot, replacing the incorrect arrayLength() calculation for packed views.
    • Sub-view element offset is now computed in u32 units (padding / 4) instead of logical CPU elements, fixing sort correctness for array sizes where the inner temp allocation doesn't start at a 256-byte boundary.
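
As a rough illustration of the packing idea above, the following sketch splits a double into two 32-bit words and reassembles it, and converts a byte padding into a u32 element offset. It is not the library's code: the PackedF64 class, its method names, and the low-word-first ordering are assumptions chosen to mirror a little-endian CPU layout.

```csharp
using System;

public static class PackedF64
{
    // Split the IEEE-754 bit pattern of a double into (low, high) u32 words.
    // Low word first is an assumption matching little-endian memory layout.
    public static (uint Lo, uint Hi) Split(double value)
    {
        ulong bits = (ulong)BitConverter.DoubleToInt64Bits(value);
        return ((uint)(bits & 0xFFFFFFFFu), (uint)(bits >> 32));
    }

    // Rebuild the original double from its two 32-bit words.
    public static double Join(uint lo, uint hi)
    {
        ulong bits = ((ulong)hi << 32) | lo;
        return BitConverter.Int64BitsToDouble((long)bits);
    }

    // A byte padding expressed in u32 units, as in "padding / 4".
    public static int WordOffset(int paddingBytes) => paddingBytes / 4;

    public static void Main()
    {
        var (lo, hi) = Split(1.0);
        Console.WriteLine($"{hi:X8} {lo:X8}");  // high word, then low word
        Console.WriteLine(Join(lo, hi));        // round-trips to the input
        Console.WriteLine(WordOffset(256));     // 256 bytes is 64 u32 words
    }
}
```

Because each 64-bit value occupies two u32 slots, an offset counted in logical elements is off by a factor of two from the offset the shader needs, which is consistent with the sub-view fix described above.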

Canvas Rendering (ICanvasRenderer)

  • ICanvasRenderer API — New interface for presenting ILGPU pixel buffers (MemoryBuffer2D<uint/int>, packed RGBA) directly to an HTML <canvas> element. Obtained via CanvasRendererFactory.Create(accelerator).
  • WebGPU — Zero-copy path: a cached WGSL fullscreen-triangle pipeline reads the pixel buffer directly from a read-only-storage binding. No CPU readback. Blit to the visible canvas via drawImage. Pipeline and bind-group are built once; uniforms only re-uploaded on resolution change.
  • WebGL — Delegates to an offscreen FBO blit in the GL Web Worker. Result is transferred as ImageBitmap back to the main thread, preventing Blazor's render cycle from clearing the canvas between frames.
  • CPU / Wasm — Fallback via putImageData. Browser-backed buffers use CopyToHostUint8ArrayAsync for a JS-side copy; pure CPU buffers fall back to synchronous CopyToCPU.

WebGPU Warp Reduce without Subgroups

  • GenerateWarpReduce now emits a full shared-memory butterfly reduction when the subgroups feature is unavailable, replacing the previous no-op passthrough. Correct results on hardware/drivers that don't expose subgroup extensions.
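
For readers unfamiliar with the term, a butterfly reduction can be simulated on the CPU as below. This is a sketch of the general technique, not the WGSL that GenerateWarpReduce emits; the ButterflyReduce class is hypothetical, the scratch arrays stand in for workgroup shared memory, and each loop iteration stands in for a barrier-separated round.

```csharp
using System;

public static class ButterflyReduce
{
    // Simulates an XOR-butterfly sum across the lanes of one warp.
    // After log2(width) rounds, every lane holds the sum of all lanes.
    public static int[] AllLanesSum(int[] laneValues)
    {
        int width = laneValues.Length;
        if (width == 0 || (width & (width - 1)) != 0)
            throw new ArgumentException("Lane count must be a power of two.");

        int[] shared = (int[])laneValues.Clone(); // stands in for shared memory
        int[] next = new int[width];

        for (int offset = width / 2; offset > 0; offset /= 2)
        {
            // Each lane combines its slot with its partner's slot (lane XOR offset).
            // On a GPU, a barrier would separate the read and write phases.
            for (int lane = 0; lane < width; lane++)
                next[lane] = shared[lane] + shared[lane ^ offset];

            (shared, next) = (next, shared);
        }
        return shared;
    }

    public static void Main()
    {
        int[] result = AllLanesSum(new[] { 1, 2, 3, 4 });
        Console.WriteLine(string.Join(", ", result)); // every lane holds 10
    }
}
```

The point of the butterfly shape is that no lane needs a subgroup shuffle instruction: partner values are exchanged through memory, so the result is the same whether or not the device exposes subgroups.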

Algorithm Type Coverage

Added scan and reduce test/support variants for double, long, and uint:

Operation     | New Types
ExclusiveScan | double, uint
InclusiveScan | long, double, uint
AllReduce     | double, long, uint
GroupReduce   | float, long, double, uint, Half

SpawnDev.ILGPU v3.3.0

22 Feb 06:41

SpawnDev.ILGPU v3.3.0 Release Notes

Desktop & Browser

  • WPF Demo Application — new desktop demo running the same shared kernels (Fractal Explorer, 3D Raymarching, GPU Boids) on CUDA, OpenCL, and CPU with live backend switching
  • Shared Kernel Library — extracted SpawnDev.ILGPU.Demo.Shared so browser and desktop demos share identical kernel code
  • Console Test Runner — added SpawnDev.ILGPU.ConsoleDemo for running the full unit test suite on desktop backends with process isolation for crash resilience
  • OpenCL 3.0 Compatibility — relaxed the GenericAddressSpace requirement, enabling NVIDIA GPUs with OpenCL 3.0 drivers that were previously blocked
  • Multi-platform support — updated SupportedPlatform to include Windows, Linux, and macOS

WebGL2 Backend — GPU-Resident Buffers

The WebGL2 backend has been refactored to eliminate unnecessary CPU↔GPU data transfers:

  • GPU-resident buffers: buffers persist as textures in the GL worker; kernel dispatch sends buffer references, not data
  • On-demand readback: CopyToHostAsync() is the only GPU→CPU transfer path
  • New worker protocol: allocBuffer, uploadBuffer, readbackBuffer, and freeBuffer messages manage the buffer lifecycle
  • Proper buffer disposal: buffers are freed in the worker when disposed on the C# side

Wasm Backend Improvements

  • Expanded API coverage including shared memory, barriers, dynamic shared memory, atomics, and broadcasting
  • Single-worker fallback mode when SharedArrayBuffer is unavailable

Transpiler Fixes

  • Break-PHI bug — fixed assignments before break in loops being dropped in WGSL and GLSL transpilers
  • CopySign — corrected argument swap in the CopySign intrinsic
  • 64-bit reduce — fixed signed/unsigned mismatch in MinUInt64 and emu_f64 buffer I/O for AddDouble/MaxDouble
  • WebGL raymarching — fixed GLSL rendering issues
  • BVH ray traversal — corrected WebGPU and WebGL backend issues for complex scene traversal
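
The Break-PHI item concerns loops of roughly the following shape, where a variable is assigned immediately before a break and read after the loop. The function below is a plain C# illustration of that pattern, not a kernel from the test suite; the BreakPhiPattern class and FirstIndexAtLeast method are hypothetical names.

```csharp
using System;

public static class BreakPhiPattern
{
    // Returns the index of the first element >= threshold, or -1 if none.
    public static int FirstIndexAtLeast(int[] data, int threshold)
    {
        int found = -1;
        for (int i = 0; i < data.Length; i++)
        {
            if (data[i] >= threshold)
            {
                found = i; // assignment directly before the break
                break;     // the value must survive the loop exit
            }
        }
        return found;
    }

    public static void Main()
    {
        Console.WriteLine(FirstIndexAtLeast(new[] { 1, 5, 9 }, 4)); // index of 5
    }
}
```

If a transpiler dropped the assignment on the break path, a function like this would return -1 even when a match exists, so a small kernel of this shape makes a convenient regression check.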

Upstream ILGPU Fixes

Six bugs from the original ILGPU repo have been fixed in our fork:

Issue | Description | Severity
#1361 | MathF.CopySign argument order swapped; silent wrong results on all GPU backends | High
#1309 | uint to float cast routed through double; crashes on devices without fp64 | Medium
#1479 | Infinite compilation with large local arrays (new int[1_000_000]); 10+ min, 10+ GB RAM | High
#1538 | Internal Compiler Error with nested struct properties; wrong field slicing after type unification | Medium
#1539 | OpenCL produces wrong results for complex kernels; stale phi variables persisted across blocks | High
#1540 | H100/H200 not working; added SM_90, SM_100, SM_101, SM_120 architecture support | High

See upstream-issues.md for detailed root cause analysis and fix descriptions.
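
To make the CopySign entry concrete, the sketch below contrasts the documented .NET behavior of MathF.CopySign (magnitude of the first argument, sign of the second) with a deliberately swapped call. It only demonstrates the CPU reference semantics; the CopySignCheck class and its method names are hypothetical.

```csharp
using System;

public static class CopySignCheck
{
    // Reference semantics: magnitude of the first argument, sign of the second.
    public static float Reference(float magnitude, float sign) =>
        MathF.CopySign(magnitude, sign);

    // The same call with its arguments swapped, to show how the result changes.
    public static float Swapped(float magnitude, float sign) =>
        MathF.CopySign(sign, magnitude);

    public static void Main()
    {
        Console.WriteLine(Reference(3f, -1f)); // magnitude 3, negative sign
        Console.WriteLine(Swapped(3f, -1f));   // magnitude 1, positive sign
    }
}
```

Both calls compile and return plausible-looking numbers, which is why a swap of this kind produces silently wrong results rather than an error.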

Documentation

  • Corrected synchronization semantics: Synchronize() = flush (non-blocking), SynchronizeAsync() = flush + wait, CopyToHostAsync() = only GPU→CPU path
  • Updated test count to 640 tests across 8 suites
  • Added WebGL GPU-resident buffer architecture documentation
  • Reduced default logging verbosity across all backends

Demo Improvements

  • Game of Life — fixed mouse interaction and added NavMenu icon
  • Fractal Explorer — moved to shared kernel library, improved WebGL2 rendering pipeline
  • Reduced console log noise for cleaner browser dev tools experience

Full Changelog: v3.2.0...v3.3.0

SpawnDev.ILGPU v3.2.0

21 Feb 14:14

Cross-platform GPU compute from a single codebase — browser and desktop.

What's New

🖥️ Desktop Support Verified

  • SpawnDev.ILGPU now officially supports desktop/server environments (Console, WPF, ASP.NET) alongside Blazor WebAssembly
  • Same NuGet package provides browser backends (WebGPU, WebGL, Wasm) and native backends (Cuda, OpenCL, CPU)
  • SynchronizeAsync() and CopyToHostAsync() work everywhere — async in the browser, graceful sync fallback on desktop
  • New SpawnDev.ILGPU.ConsoleDemo project included as a working reference

🎮 New Demos

  • Game of Life — GPU-accelerated cellular automaton
  • Boids 3D — Flocking simulation on all backends
  • Compute 3D — 3D compute shader demo

🐛 Bug Fixes

  • Fixed 3 transpiler bugs found during Game of Life development
  • Fixed handling of Debug IL in WebGPU and WebGL transpilers
  • Updated Wasm backend intrinsics

📚 Comprehensive Documentation

  • New Docs/ folder with 8 markdown guides: Getting Started, Backends, Kernels, Memory & Buffers, Advanced Patterns (GPU intrinsics, device sharing, rendering), Limitations, and API Reference
  • Covers both Blazor WASM and desktop usage
  • Incorporates foundational ILGPU concepts adapted for the browser

Full Changelog

See README.md and Docs/ for complete documentation.

SpawnDev.ILGPU v3.0.0

16 Feb 17:39

What's New

🚀 Next-Generation GPU Computing in Blazor Wasm — v3.0.0 brings major performance improvements, streamlined architecture, and enhanced compatibility. Run C# ILGPU kernels on WebGPU, WebGL, and native WebAssembly with automatic backend selection.

Key Features

  • Three Powerful Backends: WebGPU (modern GPU compute via WGSL), WebGL (universal GPU access via GLSL ES 3.0), and Wasm (native WebAssembly on Web Workers)
  • CPU Backend: Standard ILGPU CPU accelerator included for debugging and performance comparison
  • Universal GPU Access: WebGPU for cutting-edge browsers, WebGL for virtually every device
  • Intelligent Auto-Selection: CreatePreferredAcceleratorAsync() automatically picks the best available backend (WebGPU → WebGL → Wasm)
  • 64-bit Computing: Full double and long support via optimized emulation on both GPU backends
  • Multi-Worker Dispatch: Wasm backend distributes work across all available CPU cores
  • Zero-Copy Shared Memory: SharedArrayBuffer support for efficient data sharing
  • Atomic Operations: Workgroup synchronization and atomic operations on WebGPU and Wasm backends
  • Production Ready: Comprehensive test suite, stable APIs, and real-world optimization

Built For

  • Blazor WebAssembly — Run compute-intensive C# kernels in the browser
  • 🎮 Game Development — GPU-accelerated physics, graphics, and AI
  • 📊 Data Processing — High-performance number crunching without native compilation
  • 🔬 Scientific Computing — GPGPU capabilities in pure managed code

Resources

Full Changelog: v2.1.0...v3.0.0

SpawnDev.ILGPU v2.1.0

13 Feb 20:41

What's New

🖼️ New WebGL Backend — GPU-accelerated compute on virtually every modern browser and device. C# kernels are transpiled to GLSL ES 3.0 vertex shaders and executed via Transform Feedback, providing broad GPU access even where WebGPU isn't supported.

Highlights

  • Five backends: WebGPU, WebGL, Wasm, Workers, and CPU
  • Two GPU backends: WebGPU for cutting-edge browsers, WebGL for universal coverage
  • Auto-selection: CreatePreferredAcceleratorAsync() picks the best available backend (WebGPU → WebGL → Wasm → Workers → CPU)
  • 64-bit emulation on both GPU backends (double/long support via software emulation)
  • Benchmarks page: New interactive benchmark suite comparing throughput across all backends
  • Workers performance: Cached compiled functions and script bodies to reduce per-dispatch overhead

Links

Full Changelog: v2.0.0...v2.1.0

SpawnDev.ILGPU v2.0.0

09 Feb 23:23

SpawnDev.ILGPU v2.0.0 — First Stable Release

Run ILGPU kernels in the browser — on the GPU, across threads, or on the CPU.

SpawnDev.ILGPU v2.0.0 is the first stable release of this library and the successor to SpawnDev.ILGPU.WebGPU, which supported only a single WebGPU backend. Version 2.0.0 brings four full compute backends, automatic device selection, and 360+ tests, all running entirely in the browser via Blazor WebAssembly.

What's New in 2.0.0

Four Compute Backends

Backend | Executes on                                   | Performance
WebGPU  | GPU via WGSL transpilation                    | ⚡⚡⚡ Fastest
Wasm    | Web Workers via native WebAssembly binary     | ⚡⚡ Fast
Workers | Web Workers via JavaScript transpilation      | ⚡ Moderate
CPU     | Main thread via .NET runtime                  | 🐢 Fallback

Automatic Backend Selection

Call CreatePreferredAcceleratorAsync() and the library picks the best available backend: WebGPU → Wasm → Workers → CPU.

Key Features

  • WGSL transpilation: C# ILGPU kernels compiled to WebGPU Shading Language for GPU execution
  • Wasm compilation: Kernels compiled to native WebAssembly binary modules for near-native performance
  • 64-bit emulation: Full double (f64) and long (i64) support via software emulation on WebGPU
  • WebGPU extension auto-detection: Probes the adapter for shader-f16, subgroups, and timestamp-query and enables them automatically
  • Subgroup operations: Group.Broadcast and Warp.Shuffle supported when the browser exposes the subgroups extension
  • Multi-worker dispatch: Wasm and Workers backends distribute work across all available CPU cores
  • Shared memory & atomics: Workgroup memory, barriers, and atomic operations across backends
  • No native dependencies: Pure C#, powered by SpawnDev.BlazorJS

360+ Tests

Comprehensive coverage across all backends: memory, indexing, arithmetic, bitwise, math functions, atomics, control flow, structs, type casting, 64-bit emulation, GPU patterns, shared memory, broadcast & subgroups, and more.

Interactive Demo

Try the live demo featuring a real-time Fractal Explorer that lets you switch between all four backends and compare performance.

Installation

dotnet add package SpawnDev.ILGPU

Breaking Changes from SpawnDev.ILGPU.WebGPU

This package replaces SpawnDev.ILGPU.WebGPU. Key differences:

  • Namespace: SpawnDev.ILGPU (was SpawnDev.ILGPU.WebGPU)
  • Multiple backends: WebGPU is no longer the only option — Wasm, Workers, and CPU backends are included
  • Unified API: Context.CreateAsync() with builder pattern for all backends