# Neural Demo - Using neural.slang

This demo showcases how to use Slang's `neural.slang` standard module to build a neural network for image reconstruction. The network learns to map UV coordinates to RGB colors, reconstructing a reference image through gradient-based optimization.
It is a re-creation of the texture example from the https://github.com/shader-slang/neural-shading-s25 course.

## Overview

The demo uses an MLP (Multi-Layer Perceptron) with the following architecture:
- **Input**: 4 latent features sampled from a learnable texture
- **Layer 0**: 4 → 32 neurons + LeakyReLU
- **Layer 1**: 32 → 32 neurons + LeakyReLU
- **Layer 2**: 32 → 3 neurons + Exp (for positive RGB output)

## neural.slang Types Used

| Type | Description |
|------|-------------|
| `InlineVector<T, N>` | Fixed-size vector type with compile-time `.Size` constant |
| `StructuredBufferStorage<T>` | GPU buffer storage implementing the `IStorage<T>` interface |
| `FFLayer<T, InVec, OutVec, Storage, Activation, HasBias>` | Feed-forward neural network layer |
| `IdentityActivation<T>` | Pass-through activation (no transformation) |
| `NoParam()` | Empty parameter for activations that don't need configuration |

## Before/After Comparison

### Vector Types

| Before (Manual) | After (neural.slang) |
|-----------------|---------------------|
| `float[4]` / `float4` | `InlineVector<float, 4>` |
| `float[32]` | `InlineVector<float, 32>` |
| `float[3]` / `float3` | `InlineVector<float, 3>` |
| Manual size tracking | `Vec4.Size` compile-time constant |

**Before:**
```slang
static const int INPUT_SIZE = 4;
static const int HIDDEN_SIZE = 32;
static const int OUTPUT_SIZE = 3;

float[32] hidden;
```

**After:**
```slang
typealias Vec4 = InlineVector<float, 4>;
typealias Vec32 = InlineVector<float, 32>;
typealias Vec3 = InlineVector<float, 3>;

static const int INPUT_SIZE = Vec4.Size;   // 4
static const int HIDDEN_SIZE = Vec32.Size; // 32
static const int OUTPUT_SIZE = Vec3.Size;  // 3

Vec32 hidden;
```

### Parameter Storage

| Before (Manual) | After (neural.slang) |
|-----------------|---------------------|
| Separate weight/bias buffers | `StructuredBufferStorage<T>` wrapper |
| Manual offset calculation | `Storage.getOffset()` method |
| Manual parameter count | `FFLayer.ParameterCount` constant |

**Before:**
```slang
struct Layer
{
    RWStructuredBuffer<float> weights; // [out * in]
    RWStructuredBuffer<float> biases;  // [out]

    static const int PARAM_COUNT = 32 * 4 + 32; // Manual calculation
}
```

**After:**
```slang
typealias Storage = StructuredBufferStorage<float>;
typealias Layer0Type = FFLayer<float, Vec4, Vec32, Storage, Act, true>;

// Parameter count computed automatically from layer dimensions
static const int LAYER0_PARAMS = Layer0Type.ParameterCount; // 4*32 + 32 = 160

struct MLPNetwork
{
    // Single buffer per layer: [weights row-major, biases]
    RWStructuredBuffer<float> layer0_params;
}
```
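The same constant scales to the other layers. A quick sketch of the per-layer counts, using the `Layer1Type` and `Layer2Type` aliases defined under Network Definition below:

```slang
// Per-layer parameter counts (weights + biases), derived entirely from
// the layer type aliases; the arithmetic on the right is just a check.
static const int LAYER0_PARAMS = Layer0Type.ParameterCount; // 4*32  + 32 = 160
static const int LAYER1_PARAMS = Layer1Type.ParameterCount; // 32*32 + 32 = 1056
static const int LAYER2_PARAMS = Layer2Type.ParameterCount; // 32*3  + 3  = 99
```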

### Layer Forward Pass

| Before (Manual) | After (neural.slang) |
|-----------------|---------------------|
| Manual matrix multiply | `FFLayer.eval()` using `linearTransform` |
| Explicit loops | Optimized internal implementation |
| Manual bias addition | Handled by `FFLayer` |

**Before:**
```slang
[Differentiable]
float[32] layer_forward(float[4] input)
{
    float[32] output;
    for (int row = 0; row < 32; ++row)
    {
        float sum = biases[row];
        for (int col = 0; col < 4; ++col)
            sum += weights[row * 4 + col] * input[col];
        output[row] = sum;
    }
    return output;
}
```

**After:**
```slang
Vec3 forward(Vec4 input)
{
    // Create storage wrapper around buffer
    let storage0 = Storage(layer0_params);

    // Create FFLayer instance
    // FFLayer(storage, weightAddress, biasAddress)
    let ff0 = Layer0Type(storage0, 0u, INPUT_SIZE * HIDDEN_SIZE);

    // Forward pass: y = W*x + b (linearTransform inside eval)
    Vec32 h0 = ff0.eval(NoParam(), input);

    // Apply activation...
}
```
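Because the layer aliases use `IdentityActivation`, the demo applies activations manually after `eval()`. Below is a minimal sketch of a LeakyReLU helper for the forward-only rendering path; the helper is an illustration, not part of `neural.slang`, and it indexes the vector directly, so keep it out of differentiable code (see the InlineVector subscript limitation below):

```slang
// Hypothetical elementwise LeakyReLU for the forward-only rendering path.
// NOTE: uses InlineVector's subscript, which has no backward derivative,
// so the training path must route through the array converters instead.
Vec32 leakyReLU(Vec32 x)
{
    Vec32 y;
    [ForceUnroll]
    for (int i = 0; i < 32; ++i)
        y[i] = x[i] > 0.0f ? x[i] : 0.01f * x[i];
    return y;
}
```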

### Network Definition

| Before (Manual) | After (neural.slang) |
|-----------------|---------------------|
| Custom struct with manual layout | Type aliases for layers |
| Hardcoded dimensions | Dimensions from vector types |
| Manual weight indexing | Automatic address calculation |

**Before:**
```slang
struct Network
{
    RWStructuredBuffer<float> layer0_weights; // 4*32 floats
    RWStructuredBuffer<float> layer0_biases;  // 32 floats
    RWStructuredBuffer<float> layer1_weights; // 32*32 floats
    RWStructuredBuffer<float> layer1_biases;  // 32 floats
    RWStructuredBuffer<float> layer2_weights; // 32*3 floats
    RWStructuredBuffer<float> layer2_biases;  // 3 floats

    [Differentiable]
    float3 forward(float4 input) { /* manual implementation */ }
}
```

**After:**
```slang
import neural;

// Type definitions using neural.slang
typealias Vec4 = InlineVector<float, 4>;
typealias Vec32 = InlineVector<float, 32>;
typealias Vec3 = InlineVector<float, 3>;
typealias Storage = StructuredBufferStorage<float>;
typealias Act = IdentityActivation<float>;

typealias Layer0Type = FFLayer<float, Vec4, Vec32, Storage, Act, true>;
typealias Layer1Type = FFLayer<float, Vec32, Vec32, Storage, Act, true>;
typealias Layer2Type = FFLayer<float, Vec32, Vec3, Storage, Act, true>;

struct MLPNetwork
{
    // One buffer per layer: [weights, biases] contiguous
    RWStructuredBuffer<float> layer0_params;
    RWStructuredBuffer<float> layer1_params;
    RWStructuredBuffer<float> layer2_params;

    Vec3 forward(Vec4 input)
    {
        let storage0 = Storage(layer0_params);
        let ff0 = Layer0Type(storage0, 0u, INPUT_SIZE * HIDDEN_SIZE);
        Vec32 h0 = ff0.eval(NoParam(), input);
        // ...
    }
}
```
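For completeness, here is one way the elided body could continue, chaining all three layers and the final `Exp`. This is a sketch under the assumptions above (each layer's bias address equals its weight count, and the hypothetical `leakyReLU` helper from earlier), not the demo's verbatim code:

```slang
// Sketch: full three-layer forward pass for the rendering network.
Vec3 forward(Vec4 input)
{
    let ff0 = Layer0Type(Storage(layer0_params), 0u, INPUT_SIZE * HIDDEN_SIZE);
    let ff1 = Layer1Type(Storage(layer1_params), 0u, HIDDEN_SIZE * HIDDEN_SIZE);
    let ff2 = Layer2Type(Storage(layer2_params), 0u, HIDDEN_SIZE * OUTPUT_SIZE);

    Vec32 h0 = leakyReLU(ff0.eval(NoParam(), input)); // 4 -> 32
    Vec32 h1 = leakyReLU(ff1.eval(NoParam(), h0));    // 32 -> 32
    Vec3 o = ff2.eval(NoParam(), h1);                 // 32 -> 3

    // Exp activation keeps the RGB output positive (see Overview).
    Vec3 rgb;
    [ForceUnroll]
    for (int i = 0; i < 3; ++i)
        rgb[i] = exp(o[i]);
    return rgb;
}
```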

## Python-Side Parameter Management

| Before (Manual) | After (neural.slang) |
|-----------------|---------------------|
| Separate numpy arrays | `FFLayerParams` class matching FFLayer layout |
| Manual buffer creation | Automatic `[weights, biases]` concatenation |
| Manual gradient tracking | Linked `TrainableLayerParams` |

**Before:**
```python
class Layer:
    def __init__(self, inputs, outputs):
        self.weights = np.random.randn(outputs, inputs)
        self.biases = np.zeros(outputs)
        self.weights_buffer = create_buffer(self.weights)
        self.biases_buffer = create_buffer(self.biases)
```

**After:**
```python
class FFLayerParams:
    """Parameters matching FFLayer's expected buffer layout."""

    def __init__(self, inputs: int, outputs: int):
        # Xavier initialization
        scale = np.sqrt(6.0 / (inputs + outputs))
        self.weights_np = np.random.uniform(-scale, scale, (outputs, inputs))
        self.biases_np = np.zeros(outputs)

        # Create single buffer: [weights row-major, biases]
        params = np.concatenate([self.weights_np.flatten(), self.biases_np])
        self.buffer = create_buffer(params)
```
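Usage then mirrors the Slang type aliases; a short sketch:

```python
# One FFLayerParams per layer, matching the 4 -> 32 -> 32 -> 3 architecture.
layer0 = FFLayerParams(4, 32)   # 160 floats
layer1 = FFLayerParams(32, 32)  # 1056 floats
layer2 = FFLayerParams(32, 3)   # 99 floats
```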

## Architecture Notes

### Why Two Network Types?

The demo uses two network structs:

1. **`MLPNetwork`** (FFLayer-based) - For rendering
   - Uses `FFLayer.eval()` with `StructuredBufferStorage`
   - Forward-only evaluation (non-differentiable struct)
   - Fast inference using optimized `linearTransform`

2. **`TrainableMLPNetwork`** (Tensor-based) - For training
   - Uses explicit weight/bias Tensors with gradient accumulation
   - Implements the same `W*x + b` computation
   - Gradients accumulate via `AtomicTensor`

This separation is needed because `FFLayer.eval()` has `[NoDiffThis]`: gradients flow through the storage via `atomicAdd`, which requires a specific differential storage setup that is complex to wire up from Python. The Tensor-based approach gives us explicit control over gradient flow.

After each optimization step, weights are synced from `TrainableLayerParams` back to `FFLayerParams.buffer`, so the FFLayer-based `MLPNetwork` renders with the updated weights. A sketch of this sync step follows.
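A minimal sketch of that sync, assuming `TrainableLayerParams` exposes updated `weights_np`/`biases_np` arrays and the buffer supports `copy_from_numpy`; the names here are illustrative, not the demo's exact API:

```python
def sync_to_fflayer(trainable, ff_params):
    # Rebuild the [weights row-major, biases] layout FFLayer expects
    # and upload it to the single per-layer parameter buffer.
    packed = np.concatenate([
        trainable.weights_np.flatten(),
        trainable.biases_np,
    ]).astype(np.float32)
    ff_params.buffer.copy_from_numpy(packed)
```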

### InlineVector Subscript Limitation

The `InlineVector<T, N>` subscript operator (`operator[]`) doesn't have a backward derivative in the current implementation. This means:

```slang
// This would break gradient flow:
Vec32 v;
float x = v[0]; // No backward derivative for subscript!
```

**Workaround**: Convert between `InlineVector` and arrays using custom converters with an explicit `[BackwardDerivative]`:

```slang
[BackwardDerivative(vec32ToArrBwd)]
float[32] vec32ToArr(Vec32 v)
{
    float[32] a;
    [ForceUnroll] for (int i = 0; i < 32; ++i) a[i] = v[i];
    return a;
}

void vec32ToArrBwd(inout DifferentialPair<Vec32> dv, float[32] da)
{
    Vec32 d;
    [ForceUnroll] for (int i = 0; i < 32; ++i) d[i] = da[i];
    dv = diffPair(dv.p, d);
}
```
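The reverse direction follows the same pattern; a sketch:

```slang
// Sketch: array -> InlineVector converter with an explicit backward pass.
[BackwardDerivative(arrToVec32Bwd)]
Vec32 arrToVec32(float[32] a)
{
    Vec32 v;
    [ForceUnroll] for (int i = 0; i < 32; ++i) v[i] = a[i];
    return v;
}

void arrToVec32Bwd(inout DifferentialPair<float[32]> da, Vec32 dv)
{
    float[32] d;
    [ForceUnroll] for (int i = 0; i < 32; ++i) d[i] = dv[i];
    da = diffPair(da.p, d);
}
```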

## Running the Demo

```bash
cd slangpy-samples/examples/neural-demo
python neural-demo.py
```

The demo displays three panels:
1. **Reference image** - Target to reconstruct
2. **Network output** - Current reconstruction using the FFLayer-based network
3. **Loss visualization** - Per-pixel error

Loss values are printed to the console and should decrease over time as the network learns.

## Key Files

- `neural-demo.slang` - Shader code with FFLayer types and network definitions
- `neural-demo.py` - Python host code with parameter management and training loop
- `slangstars.png` - Reference image to reconstruct

## Dependencies

Requires the `neural` module to compile:
```slang
import neural; // Required! Demo won't compile without this
```

This import provides `InlineVector`, `StructuredBufferStorage`, `FFLayer`, `IdentityActivation`, `NoParam`, and other neural network primitives.