06 Mar 16:14

LostBeard

d87f699

SpawnDev.ILGPU v3.5.0 Latest


SpawnDev.ILGPU 3.5.0

Half (f16) Support

WebGPU f16 kernels — Float16 maps to native f16 in WGSL. Buffer alignment, constant emission, and Half ↔ float conversion intrinsics all wired up. Capability-gated on device feature support.
XMath.Min/Max/Clamp for Half — Added to XMath via float promotion.
Group Scan/Reduce for Half — ExclusiveScan, InclusiveScan, AllReduce, and GroupReduce now support Half on WebGPU and CUDA.
CUDA PTX Half warp shuffles — WarpShuffle, WarpShuffleDown, WarpShuffleUp, WarpShuffleXor (and SubWarp variants) for Half via b32 widening. Unlocks Half scan/reduce on CUDA.
Lock-free AllReduce — Rewrote AllReduce in both IL and PTX backends to use per-warp shared-memory slots instead of atomic operations. Removes the Half atomics dependency entirely and is correct for all types.
Half.One constant fix — Was 0x0001 (denormal ≈5.96e-8); corrected to 0x3C00 (IEEE-754 1.0).
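The bit patterns in the Half.One fix can be checked on the CPU with the .NET base class library alone. The following is a minimal sketch, not the library's code: the `HalfBits` class and its method names are hypothetical. The `Widen`/`Narrow` pair shows one plausible reading of "b32 widening" (zero-extending the 16-bit pattern into a 32-bit word). That reading is an assumption, since the notes do not spell out the exact scheme.

```csharp
using System;

// Hypothetical helper for illustration only; not part of SpawnDev.ILGPU.
public static class HalfBits
{
    // Reinterpret a raw 16-bit pattern as an IEEE-754 binary16 value.
    public static Half FromBits(ushort bits) => BitConverter.UInt16BitsToHalf(bits);

    // Assumption: widening places the Half's bits in the low 16 bits of a 32-bit word.
    public static uint Widen(Half value) => BitConverter.HalfToUInt16Bits(value);

    // Narrow back by keeping only the low 16 bits of the word.
    public static Half Narrow(uint word) =>
        BitConverter.UInt16BitsToHalf((ushort)(word & 0xFFFFu));
}
```

Under these assumptions `(float)HalfBits.FromBits(0x3C00)` evaluates to 1.0, while `0x0001` decodes to 2^-24 (about 5.96e-8), the smallest positive subnormal.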

WebGPU RadixSort with `double` / `long` Keys

RadixSortPairs<double, …> and RadixSortPairs<long, …> now work on WebGPU. Multiple root causes fixed end-to-end:
- FloatAsInt/IntAsFloat casts for emulated f64 now correctly reconstruct the IEEE-754 64-bit pattern.
- Structs containing emulated 64-bit fields are flattened to array<u32> in WGSL ("packed structs") to match CPU memory layout.
- True element count is passed to the GPU via a dedicated _scalar_params slot, replacing the incorrect arrayLength() calculation for packed views.
- Sub-view element offset is now computed in u32 units (padding / 4) instead of logical CPU elements, fixing sort correctness for array sizes where the inner temp allocation doesn't start at a 256-byte boundary.
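The first two fixes come down to treating a 64-bit value as a pair of 32-bit words. As a rough CPU-side sketch, assuming the low word is stored first (the notes do not state the word order) and using a hypothetical `PackedF64` helper, a `double` can be split into two `uint` values and rebuilt without losing any bits:

```csharp
using System;

// Hypothetical helper for illustration only; not part of SpawnDev.ILGPU.
public static class PackedF64
{
    // Split a double into the (low, high) 32-bit words of its IEEE-754 bit pattern.
    public static (uint Lo, uint Hi) Split(double value)
    {
        ulong bits = unchecked((ulong)BitConverter.DoubleToInt64Bits(value));
        return ((uint)(bits & 0xFFFFFFFFUL), (uint)(bits >> 32));
    }

    // Rebuild the double from its two 32-bit words.
    public static double Join(uint lo, uint hi)
    {
        ulong bits = ((ulong)hi << 32) | lo;
        return BitConverter.Int64BitsToDouble(unchecked((long)bits));
    }
}
```

For example, `PackedF64.Split(1.0)` yields `(0x00000000, 0x3FF00000)`, and `Join` applied to those words returns exactly 1.0.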

Canvas Rendering (`ICanvasRenderer`)

ICanvasRenderer API: New interface for presenting ILGPU pixel buffers (MemoryBuffer2D<uint/int>, packed RGBA) directly to an HTML <canvas> element. Obtained via CanvasRendererFactory.Create(accelerator).
WebGPU: Zero-copy path. A cached WGSL fullscreen-triangle pipeline reads the pixel buffer directly from a read-only-storage binding, with no CPU readback, and the result is blitted to the visible canvas via drawImage. The pipeline and bind group are built once; uniforms are re-uploaded only on resolution change.
WebGL: Delegates to an offscreen FBO blit in the GL Web Worker. The result is transferred back to the main thread as an ImageBitmap, which prevents Blazor's render cycle from clearing the canvas between frames.
CPU / Wasm: Fallback via putImageData. Browser-backed buffers use CopyToHostUint8ArrayAsync for a JS-side copy; pure CPU buffers fall back to synchronous CopyToCPU.
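A kernel that feeds such a buffer writes one packed RGBA value per pixel. The sketch below shows one way to pack and unpack a pixel. It assumes R occupies the lowest byte and A the highest, which matches the RGBA byte order of canvas ImageData on little-endian hardware. The notes do not confirm this layout, and `PackedPixel` is a hypothetical name.

```csharp
// Hypothetical helper for illustration only; the channel order is an assumption.
public static class PackedPixel
{
    // Pack four 8-bit channels into one 32-bit value: R in bits 0-7, A in bits 24-31.
    public static uint Pack(byte r, byte g, byte b, byte a) =>
        (uint)r | ((uint)g << 8) | ((uint)b << 16) | ((uint)a << 24);

    // Recover the four channels from a packed value.
    public static (byte R, byte G, byte B, byte A) Unpack(uint p) =>
        ((byte)(p & 0xFFu), (byte)((p >> 8) & 0xFFu), (byte)((p >> 16) & 0xFFu), (byte)(p >> 24));
}
```

With this layout, opaque red is `PackedPixel.Pack(255, 0, 0, 255)`, which equals `0xFF0000FF`.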

WebGPU Warp Reduce without Subgroups

GenerateWarpReduce now emits a full shared-memory butterfly reduction when the subgroups feature is unavailable, replacing the previous no-op passthrough. This produces correct results on hardware and drivers that don't expose subgroup extensions.
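A butterfly reduction can be simulated on the CPU to see why every lane ends up with the full result. The following is a sketch of the general technique, not the WGSL that GenerateWarpReduce emits. The `ButterflyReduce` class is hypothetical, an array stands in for shared memory, and swapping two buffers stands in for the barrier between steps.

```csharp
using System;

// Hypothetical CPU simulation for illustration only.
public static class ButterflyReduce
{
    // Returns an array in which every lane holds the sum of all inputs.
    public static int[] Sum(int[] lanes)
    {
        int n = lanes.Length;
        if (n == 0 || (n & (n - 1)) != 0)
            throw new ArgumentException("Lane count must be a power of two.", nameof(lanes));

        int[] shared = (int[])lanes.Clone(); // stands in for workgroup shared memory
        int[] next = new int[n];
        for (int offset = n / 2; offset > 0; offset >>= 1)
        {
            // Each lane combines its value with that of its XOR partner.
            for (int lane = 0; lane < n; lane++)
                next[lane] = shared[lane] + shared[lane ^ offset];

            // Swapping buffers plays the role of the barrier between steps.
            (shared, next) = (next, shared);
        }
        return shared;
    }
}
```

For the input `{1, 2, 3, 4}` the first step produces `{4, 6, 4, 6}` and the second `{10, 10, 10, 10}`, so after log2(n) steps each lane holds the total.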

Algorithm Type Coverage

Added scan and reduce test/support variants for double, long, and uint:

| Operation | New Types |
|---|---|
| `ExclusiveScan` | `double`, `uint` |
| `InclusiveScan` | `long`, `double`, `uint` |
| `AllReduce` | `double`, `long`, `uint` |
| `GroupReduce` | `float`, `long`, `double`, `uint`, `Half` |


22 Feb 06:41

LostBeard

v3.3.0

c04e7d0

SpawnDev.ILGPU v3.3.0

SpawnDev.ILGPU v3.3.0 Release Notes

Desktop & Browser

WPF Demo Application — new desktop demo running the same shared kernels (Fractal Explorer, 3D Raymarching, GPU Boids) on CUDA, OpenCL, and CPU with live backend switching
Shared Kernel Library — extracted SpawnDev.ILGPU.Demo.Shared so browser and desktop demos share identical kernel code
Console Test Runner — added SpawnDev.ILGPU.ConsoleDemo for running the full unit test suite on desktop backends with process isolation for crash resilience
OpenCL 3.0 Compatibility — relaxed the GenericAddressSpace requirement, enabling NVIDIA GPUs with OpenCL 3.0 drivers that were previously blocked
Multi-platform support — updated SupportedPlatform to include Windows, Linux, and macOS

WebGL2 Backend — GPU-Resident Buffers

The WebGL2 backend has been refactored to eliminate unnecessary CPU↔GPU data transfers:

GPU-resident buffers — buffers persist as textures in the GL worker; kernel dispatch sends buffer references, not data
On-demand readback — CopyToHostAsync() is the only GPU→CPU transfer path
New worker protocol — allocBuffer, uploadBuffer, readbackBuffer, freeBuffer messages manage buffer lifecycle
Proper buffer disposal — buffers are freed in the worker when disposed on the C# side

Wasm Backend Improvements

Expanded API coverage including shared memory, barriers, dynamic shared memory, atomics, and broadcasting
Single-worker fallback mode when SharedArrayBuffer is unavailable

Transpiler Fixes

Break-PHI bug — fixed assignments before break in loops being dropped in WGSL and GLSL transpilers
CopySign — corrected argument swap in the CopySign intrinsic
64-bit reduce — fixed signed/unsigned mismatch in MinUInt64 and emu_f64 buffer I/O for AddDouble/MaxDouble
WebGL raymarching — fixed GLSL rendering issues
BVH ray traversal — corrected WebGPU and WebGL backend issues for complex scene traversal
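For context on the Break-PHI fix, the loop below has the shape the note describes: a variable is assigned immediately before `break`. It is a hypothetical plain C# function, not a kernel taken from the library or its tests. If the assignment before `break` were dropped, the function would return -1 even when a match exists.

```csharp
// Hypothetical example of the loop shape; not taken from the SpawnDev.ILGPU test suite.
public static class BreakPhiExample
{
    // Returns the index of the first element >= threshold, or -1 if there is none.
    public static int FirstAtLeast(int[] data, int threshold)
    {
        int found = -1;
        for (int i = 0; i < data.Length; i++)
        {
            if (data[i] >= threshold)
            {
                found = i; // assignment immediately before break
                break;
            }
        }
        return found;
    }
}
```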

Upstream ILGPU Fixes

Six bugs from the original ILGPU repo have been fixed in our fork:

| Issue | Description | Severity |
|---|---|---|
| #1361 | `MathF.CopySign` argument order swapped; silent wrong results on all GPU backends | High |
| #1309 | `uint` to `float` cast routed through `double`; crashes on devices without fp64 | Medium |
| #1479 | Infinite compilation with large local arrays (`new int[1_000_000]`); 10+ min, 10+ GB RAM | High |
| #1538 | Internal Compiler Error with nested struct properties; wrong field slicing after type unification | Medium |
| #1539 | OpenCL produces wrong results for complex kernels; stale phi variables persisted across blocks | High |
| #1540 | H100/H200 not working; added SM_90, SM_100, SM_101, SM_120 architecture support | High |

See upstream-issues.md for detailed root cause analysis and fix descriptions.
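To see why the CopySign swap (#1361) gives silently wrong results, compare the reference .NET behavior with a swapped call. This is a CPU-only sketch: `CopySignDemo` is a hypothetical class, and `Swapped` only imitates the effect of reversed arguments rather than reproducing the original compiler code.

```csharp
using System;

// Hypothetical demonstration; not code from ILGPU or SpawnDev.ILGPU.
public static class CopySignDemo
{
    // Reference semantics: the magnitude of x combined with the sign of y.
    public static float Correct(float x, float y) => MathF.CopySign(x, y);

    // What a lowering with swapped arguments would compute instead.
    public static float Swapped(float x, float y) => MathF.CopySign(y, x);
}
```

`CopySignDemo.Correct(3f, -1f)` is -3, whereas `CopySignDemo.Swapped(3f, -1f)` is 1. Both are valid floats, so nothing fails loudly.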

Documentation

Corrected the documented synchronization semantics: Synchronize() flushes without blocking, SynchronizeAsync() flushes and waits, and CopyToHostAsync() is the only GPU→CPU transfer path
Updated test count to 640 tests across 8 suites
Added WebGL GPU-resident buffer architecture documentation
Reduced default logging verbosity across all backends

Demo Improvements

Game of Life — fixed mouse interaction and added NavMenu icon
Fractal Explorer — moved to shared kernel library, improved WebGL2 rendering pipeline
Reduced console log noise for cleaner browser dev tools experience

Full Changelog: v3.2.0...v3.3.0


21 Feb 14:14

LostBeard

v3.2.0

bdf22cb

SpawnDev.ILGPU v3.2.0

Cross-platform GPU compute from a single codebase — browser and desktop.

What's New

🖥️ Desktop Support Verified

SpawnDev.ILGPU now officially supports desktop/server environments (Console, WPF, ASP.NET) alongside Blazor WebAssembly
Same NuGet package provides browser backends (WebGPU, WebGL, Wasm) and native backends (Cuda, OpenCL, CPU)
SynchronizeAsync() and CopyToHostAsync() work everywhere — async in the browser, graceful sync fallback on desktop
New SpawnDev.ILGPU.ConsoleDemo project included as a working reference

🎮 New Demos

Game of Life — GPU-accelerated cellular automaton
Boids 3D — Flocking simulation on all backends
Compute 3D — 3D compute shader demo

🐛 Bug Fixes

Fixed 3 transpiler bugs found during Game of Life development
Fixed handling of Debug IL in WebGPU and WebGL transpilers
Updated Wasm backend intrinsics

📚 Comprehensive Documentation

New Docs/ folder with 8 markdown guides: Getting Started, Backends, Kernels, Memory & Buffers, Advanced Patterns (GPU intrinsics, device sharing, rendering), Limitations, and API Reference
Covers both Blazor WASM and desktop usage
Incorporates foundational ILGPU concepts adapted for the browser

Full Changelog

See README.md and Docs/ for complete documentation.


16 Feb 17:39

LostBeard

v3.0.0

6c5a0f8

SpawnDev.ILGPU v3.0.0

What's New

🚀 Next-Generation GPU Computing in Blazor Wasm — v3.0.0 brings major performance improvements, streamlined architecture, and enhanced compatibility. Run C# ILGPU kernels on WebGPU, WebGL, and native WebAssembly with automatic backend selection.

Key Features

Three Powerful Backends — WebGPU (modern GPU compute via WGSL), WebGL (universal GPU access via GLSL ES 3.0), and Wasm (native WebAssembly on Web Workers)
CPU Backend — Standard ILGPU CPU accelerator included for debugging and performance comparison
Universal GPU Access — WebGPU for cutting-edge browsers, WebGL for virtually every device
Intelligent Auto-Selection — CreatePreferredAcceleratorAsync() automatically picks the best available backend (WebGPU → WebGL → Wasm)
64-bit Computing — Full double and long support via optimized emulation on both GPU backends
Multi-Worker Dispatch — Wasm backend distributes work across all available CPU cores
Zero-Copy Shared Memory — SharedArrayBuffer support for efficient data sharing
Atomic Operations — Workgroup synchronization and atomic operations on WebGPU and Wasm backends
Production Ready — Comprehensive test suite, stable APIs, and real-world optimization

Built For

✨ Blazor WebAssembly — Run compute-intensive C# kernels in the browser
🎮 Game Development — GPU-accelerated physics, graphics, and AI
📊 Data Processing — High-performance number crunching without native compilation
🔬 Scientific Computing — GPGPU capabilities in pure managed code

Resources

Full Changelog: v2.1.0...v3.0.0


13 Feb 20:41

LostBeard

v2.1.0

4e1e8eb

SpawnDev.ILGPU v2.1.0

What's New

🖼️ New WebGL Backend — GPU-accelerated compute on virtually every modern browser and device. C# kernels are transpiled to GLSL ES 3.0 vertex shaders and executed via Transform Feedback, providing broad GPU access even where WebGPU isn't supported.

Highlights

Five backends — WebGPU, WebGL, Wasm, Workers, and CPU
Two GPU backends — WebGPU for cutting-edge browsers, WebGL for universal coverage
Auto-selection — CreatePreferredAcceleratorAsync() picks the best available backend (WebGPU → WebGL → Wasm → Workers → CPU)
64-bit emulation on both GPU backends (double/long support via software emulation)
Benchmarks page — New interactive benchmark suite comparing throughput across all backends
Workers performance — Cached compiled functions and script bodies to reduce per-dispatch overhead

Links

Full Changelog: v2.0.0...v2.1.0


09 Feb 23:23

LostBeard

v2.0.0

3e793df

SpawnDev.ILGPU v2.0.0

SpawnDev.ILGPU v2.0.0 — First Stable Release

Run ILGPU kernels in the browser — on the GPU, across threads, or on the CPU.

SpawnDev.ILGPU v2.0.0 is the first stable release of this library and the successor to SpawnDev.ILGPU.WebGPU, which supported only a single WebGPU backend. Version 2.0.0 brings four full compute backends, automatic device selection, and 360+ tests, all running entirely in the browser via Blazor WebAssembly.

What's New in 2.0.0

Four Compute Backends

| Backend | Executes on | Performance |
|---|---|---|
| WebGPU | GPU via WGSL transpilation | ⚡⚡⚡ Fastest |
| Wasm | Web Workers via native WebAssembly binary | ⚡⚡ Fast |
| Workers | Web Workers via JavaScript transpilation | ⚡ Moderate |
| CPU | Main thread via .NET runtime | 🐢 Fallback |

Automatic Backend Selection

Call CreatePreferredAcceleratorAsync() and the library picks the best available backend: WebGPU → Wasm → Workers → CPU.

Key Features

WGSL transpilation — C# ILGPU kernels compiled to WebGPU Shading Language for GPU execution
Wasm compilation — Kernels compiled to native WebAssembly binary modules for near-native performance
64-bit emulation — Full double (f64) and long (i64) support via software emulation on WebGPU
WebGPU extension auto-detection — Probes adapter for shader-f16, subgroups, timestamp-query and enables them automatically
Subgroup operations — Group.Broadcast and Warp.Shuffle supported when the browser exposes the subgroups extension
Multi-worker dispatch — Wasm and Workers backends distribute work across all available CPU cores
Shared memory & atomics — Workgroup memory, barriers, and atomic operations across backends
No native dependencies — Pure C#, powered by SpawnDev.BlazorJS
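As background on what 64-bit emulation involves, a 64-bit integer can be carried as two 32-bit words, with addition propagating a carry from the low word to the high word. The sketch below shows that general technique on the CPU. It is an assumption about the approach, not the library's emulation code, and `EmulatedU64` is a hypothetical name.

```csharp
// Hypothetical illustration of two-word arithmetic; not the library's implementation.
public static class EmulatedU64
{
    // Add two 64-bit values, each represented as (low, high) 32-bit words.
    public static (uint Lo, uint Hi) Add((uint Lo, uint Hi) a, (uint Lo, uint Hi) b)
    {
        uint lo = unchecked(a.Lo + b.Lo);
        uint carry = lo < a.Lo ? 1u : 0u; // the low word wrapped around, so carry into the high word
        uint hi = unchecked(a.Hi + b.Hi + carry);
        return (lo, hi);
    }

    public static (uint Lo, uint Hi) FromUInt64(ulong v) =>
        ((uint)(v & 0xFFFFFFFFUL), (uint)(v >> 32));

    public static ulong ToUInt64((uint Lo, uint Hi) v) => ((ulong)v.Hi << 32) | v.Lo;
}
```

Adding `0xFFFFFFFF` and `1` this way gives the words `(0, 1)`, which is `0x100000000`. Because two's-complement addition is bit-identical for signed and unsigned values, the same routine also covers `long`.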

360+ Tests

Comprehensive coverage across all backends: memory, indexing, arithmetic, bitwise, math functions, atomics, control flow, structs, type casting, 64-bit emulation, GPU patterns, shared memory, broadcast & subgroups, and more.

Interactive Demo

Try the live demo featuring a real-time Fractal Explorer that lets you switch between all four backends and compare performance.

Installation

dotnet add package SpawnDev.ILGPU

Breaking Changes from SpawnDev.ILGPU.WebGPU

This package replaces SpawnDev.ILGPU.WebGPU. Key differences:

Namespace: SpawnDev.ILGPU (was SpawnDev.ILGPU.WebGPU)
Multiple backends: WebGPU is no longer the only option — Wasm, Workers, and CPU backends are included
Unified API: Context.CreateAsync() with builder pattern for all backends

