Example applications for GPU-accelerated MLP inference and training with DirectX 12 LinAlg Matrix.
- Example 01: Texture Inference
- Example 02: Texture Training
- Example 03: Texture Compression with Input Encoding
- MLP Model Format
Loads a pre-trained MLP binary and reconstructs a texture on the GPU.
Workflow: Train & export (Python) → Load binary → GPU inference → Save PPM image
pip install torch numpy matplotlib
cd scripts/reference
python texture_training.py # defaults: 4 hidden layers, 64 neurons, leaky_relu

Training options

| Option | Default | Description |
|---|---|---|
| --backbone-layers | 4 | Number of hidden layers |
| --hidden-dim | 64 | Neurons per hidden layer |
| --activation | leaky_relu | identity, sigmoid, tanh, relu, leaky_relu |
| --texture-pattern | checkerboard | gradient, checkerboard, stripes, circle, perlin |
| --epochs | 30 | Training iterations |
| --learning-rate | 0.0025 | Optimizer learning rate |
| --optimizer | lion | sgd, adam, lion |
| --dtype | float | float, half |
| --samples | 200000 | Number of training samples |
| --batch-size | 2000 | Training batch size |
| --no-display | false | Skip matplotlib display and exit automatically |
| --seed | 987654321 | Random seed |
| --texture-width | 2048 | Texture width |
| --texture-height | 2048 | Texture height |
cd build/example
Release/01-texture-inference.exe ../../scripts/reference/texture-mlp-data.bin output.ppm

| Argument | Required | Default | Description |
|---|---|---|---|
| mlp-binary | Yes | — | Path to MLP binary file |
| output | No | mlp-inference-output.ppm | Output PPM image |
| --texture-width | No | 4096 | Image width |
| --texture-height | No | 4096 | Image height |
| --cpu | No | false | Use CPU reference ML operations instead of GPU |
| --cpp-fallback | No | false | Use C++ fallback (mlp.hlsl compiled as C++) |
| --software-linalg | No | false | Use software linear algebra instead of LinAlg Matrix |
| --debug | No | false | Verbose output |
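
As a rough illustration of what the inference step computes per pixel, here is a minimal CPU-side sketch in plain Python. It is not the code in mlp.hlsl. The function names and the leaky ReLU slope of 0.01 are assumptions made for this sketch; the activation codes and the Sigmoid output layer follow the MLP Model Format description in this document.

```python
import math


def activate(kind, v, leaky_slope=0.01):
    """Apply an activation by its numeric code (0..4).

    leaky_slope is an assumed value; the slope used by the examples
    is not stated in this README.
    """
    if kind == 0:  # Identity
        return v
    if kind == 1:  # Sigmoid, written to avoid overflow for large |v|
        if v >= 0.0:
            return 1.0 / (1.0 + math.exp(-v))
        e = math.exp(v)
        return e / (1.0 + e)
    if kind == 2:  # Tanh
        return math.tanh(v)
    if kind == 3:  # ReLU
        return max(0.0, v)
    if kind == 4:  # LeakyReLU
        return v if v > 0.0 else leaky_slope * v
    raise ValueError(f"unknown activation code: {kind}")


def dense(weights, bias, x):
    """y = W x + b, with W stored row-major as outputDim x inputDim."""
    in_dim = len(x)
    return [
        sum(weights[o * in_dim + i] * x[i] for i in range(in_dim)) + bias[o]
        for o in range(len(bias))
    ]


def forward(layers, hidden_activation, x):
    """Run the MLP on one input vector.

    layers is a list of (weights, bias) pairs. Hidden layers use
    hidden_activation; the last layer always uses Sigmoid (code 1).
    """
    last = len(layers) - 1
    for idx, (weights, bias) in enumerate(layers):
        kind = 1 if idx == last else hidden_activation
        x = [activate(kind, v) for v in dense(weights, bias, x)]
    return x
```

To reconstruct an image, such a function would be evaluated once per pixel with that pixel's normalized (u, v) coordinates as input.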
Trains an MLP entirely on the GPU to learn a 2D texture pattern, then reconstructs the texture using the trained model.
Workflow: Init weights → GPU mini-batch training (SGD/Adam/Lion) → Reconstruct texture → Save PPM image
cd build/example
Release/02-texture-training.exe --output-image training-result.ppm

| Option | Default | Description |
|---|---|---|
| --backbone-layers | 4 | Number of hidden layers |
| --hidden-dim | 64 | Neurons per hidden layer |
| --activation | leaky_relu | identity, sigmoid, tanh, relu, leaky_relu |
| --bias / --no-bias | true | Enable/disable bias in MLP layers |
| --epochs | 30 | Training epochs |
| --batch-size | 2000 | Mini-batch size |
| --learning-rate | 0.0025 | Optimizer learning rate |
| --optimizer | lion | sgd, adam, lion |
| --samples | 200000 | Number of training samples |
| --texture-width | 2048 | Texture width |
| --texture-height | 2048 | Texture height |
| --texture-pattern | checkerboard | gradient, checkerboard, stripes, circle, perlin |
| --output-image | mlp-training-output.ppm | Output PPM image |
| --cpu | false | Use CPU reference ML operations instead of GPU |
| --cpp-fallback | false | Use C++ fallback (mlp.hlsl compiled as C++) |
| --software-linalg | false | Use software linear algebra instead of LinAlg Matrix |
| --debug | false | Enable debug mode for detailed output |
| --seed | 987654321 | Random seed |
The training compute shaders (example/kernel/02_texture_training.comp) demonstrate:
- Using mininn::TrainingLayerDataRef to bind weight, bias, gradient, and logits cache buffers
- Calling mininn::forward() followed by mininn::backward() for a full training step
- GPU-side optimizer kernels (SGD, Adam, Lion) that read gradients and update weights directly on the GPU
- Shared optimizer implementations in kernel/optimizer.hlsl that work on packed byte buffers
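
For readers unfamiliar with Lion (the default optimizer here), the sketch below shows the update rule as it is usually published, written in plain Python over flat lists. It is not a transcription of kernel/optimizer.hlsl. The function name, the beta values (0.9 and 0.99, the commonly cited defaults), and the weight decay parameter are assumptions; only the learning rate default comes from the option table.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def lion_step(weights, grads, momentum, lr=0.0025,
              beta1=0.9, beta2=0.99, weight_decay=0.0):
    """Apply one Lion update in place and return (weights, momentum).

    weights, grads, and momentum are flat lists of equal length.
    beta1, beta2, and weight_decay are assumed values for this sketch.
    """
    for i, g in enumerate(grads):
        # The step direction is only the sign of an interpolation
        # between the momentum and the current gradient.
        direction = sign(beta1 * momentum[i] + (1.0 - beta1) * g)
        weights[i] -= lr * (direction + weight_decay * weights[i])
        # The momentum is updated after the weights, with a second decay rate.
        momentum[i] = beta2 * momentum[i] + (1.0 - beta2) * g
    return weights, momentum
```

Because every weight moves by exactly lr (or not at all) per step, Lion needs only one state value per weight, compared with two for Adam.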
Trains an MLP on the GPU to map normalized UV coordinates (u, v) to RGB pixel values, optionally using positional or grid input encoding for higher-quality texture compression.
See 03_texture_compression_with_input_encoding/README.md for the full option list, input-encoding modes, and example recipes.
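
To give a sense of what a positional input encoding does, the sketch below expands (u, v) into sine and cosine features at power-of-two frequencies, which is one common formulation. The exact encoding used by Example 03 is described in its own README and may differ; the function name, the number of frequencies, and the option to keep the raw inputs are all assumptions here.

```python
import math


def positional_encode(u, v, num_frequencies=4, include_input=True):
    """Expand normalized (u, v) into a longer feature vector.

    For each frequency 2^k * pi (k = 0 .. num_frequencies - 1), append
    sin and cos of each coordinate. This is a generic sketch, not the
    encoding implemented by Example 03.
    """
    features = [u, v] if include_input else []
    for k in range(num_frequencies):
        freq = (2.0 ** k) * math.pi
        for coord in (u, v):
            features.append(math.sin(freq * coord))
            features.append(math.cos(freq * coord))
    return features
```

With the defaults, the 2 inputs become 2 + 4 × 4 = 18 features, so the first MLP layer would take 18 inputs instead of 2. The higher-frequency terms make fine texture detail easier for a small network to fit.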
Binary format used by Example 01 (produced by the Python training script):
Header (12 bytes):
  int32 numHiddenLayers
  int32 hiddenLayerDim
  int32 activationType (0=Identity, 1=Sigmoid, 2=Tanh, 3=ReLU, 4=LeakyReLU)
Per layer (numHiddenLayers + 1 layers):
  float32[outputDim × inputDim] Weight matrix (row-major)
  float32[outputDim] Bias vector
Layer ordering: Input(2)→Hidden₁→…→Hiddenₙ→Output(2). Hidden layers use the header's activation; the output layer always uses Sigmoid.
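
The layout above can be parsed with a few lines of Python, sketched below. The function name and the returned dictionary keys are hypothetical, and little-endian byte order is an assumption (the format description does not state it). The input and output dimensions are not stored in the header, so the caller supplies them; the defaults follow the layer ordering given above.

```python
import struct


def parse_mlp(data, in_dim=2, out_dim=2):
    """Parse an MLP binary blob into header fields and per-layer arrays.

    Assumes little-endian encoding. in_dim and out_dim are not stored
    in the file and must match the model that produced it.
    """
    num_hidden, hidden_dim, activation = struct.unpack_from("<3i", data, 0)
    offset = 12  # header size in bytes

    # Layer sizes: input -> hidden x numHiddenLayers -> output
    dims = [in_dim] + [hidden_dim] * num_hidden + [out_dim]

    layers = []
    for n_in, n_out in zip(dims, dims[1:]):
        count = n_out * n_in
        weights = struct.unpack_from(f"<{count}f", data, offset)
        offset += 4 * count
        bias = struct.unpack_from(f"<{n_out}f", data, offset)
        offset += 4 * n_out
        layers.append((list(weights), list(bias)))

    # Leftover bytes usually mean in_dim or out_dim is wrong.
    if offset != len(data):
        raise ValueError(f"expected {offset} bytes, got {len(data)}")

    return {
        "num_hidden_layers": num_hidden,
        "hidden_layer_dim": hidden_dim,
        "activation_type": activation,
        "layers": layers,
    }
```

A file on disk can be parsed with `parse_mlp(open(path, "rb").read())`. Weights stay in the file's row-major order, so element (o, i) of a layer's matrix is `weights[o * n_in + i]`.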
For HLSL API details, see MLP HLSL API Reference.
MIT License — see LICENSE.
Copyright (c) 2026 Advanced Micro Devices, Inc. All rights reserved.