# php-cuda-ext

Native PHP extension for GPU computing using NVIDIA CUDA

**Under active development**

- APIs are unstable and may change
- Not recommended for production environments
php-cuda-ext is a native PHP extension that enables GPU-accelerated numerical computing, machine learning, and data science workloads directly from PHP using NVIDIA CUDA.
The extension gives PHP developers first-class access to GPU computing, allowing applications written in PHP to operate on large-scale tensors, execute parallel numerical algorithms, and scale computational workloads beyond CPU limitations.
With php-cuda-ext, PHP is no longer restricted to orchestration or I/O-bound tasks — it becomes a viable environment for:
- Tensor-based computation
- Data science pipelines
- Machine learning primitives
- High-throughput numerical processing
- GPU-accelerated experimentation and research
All computations are executed natively on the GPU, without relying on external runtimes or language bridges.
- No Python dependency
- No bindings to TensorFlow, PyTorch, or similar frameworks
- Native PHP syntax and semantics
- Explicit control over GPU execution
- Emphasis on performance and transparency
Rather than prescribing a fixed machine learning abstraction, php-cuda-ext focuses on providing the fundamental building blocks required to implement ML and data science systems directly in PHP.
This approach favors flexibility, performance, and transparency over opinionated high-level APIs.
## Requirements

- NVIDIA GPU with CUDA capability
- NVIDIA Driver compatible with CUDA Toolkit
- CUDA Toolkit 11.x+ (12.x recommended)
- PHP 8.0+
- Linux (tested on Ubuntu / Debian-based systems)
- Build tools: `gcc`, `g++`, `make`, `autoconf`, `phpize`
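On Ubuntu/Debian (the tested platforms), the build tools above can typically be installed as follows. The package names are an assumption based on standard Debian packaging; `phpize` ships with the PHP development package:

```shell
# Compiler toolchain and autoconf
sudo apt-get update
sudo apt-get install -y gcc g++ make autoconf

# phpize is provided by the PHP development package
sudo apt-get install -y php-dev
```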
## Installation

Clone the repository:

```bash
git clone https://github.com/lcmialichi/php-cuda-ext.git
cd php-cuda-ext
```

Compile and install:

```bash
./compile.sh
```

The script runs:

```bash
phpize
./configure
make
make install
```

Note: the script registers the `cuda` extension automatically.

Verify the installation:

```bash
php -m | grep cuda
```

## CudaArray

`CudaArray` represents an n-dimensional array stored entirely in GPU memory.
- No implicit CPU ↔ GPU transfers
- Contiguous memory layout
- Supports broadcasting and element-wise operations
- Designed for chained expressions
```php
use Cuda\CudaArray;

$a = CudaArray::ones([3, 3], dtype: 'float32');
$b = CudaArray::full([3, 3], 2.0); // default dtype = float32

$result = ($a * 2.0 + $b) ** 2;
```

All operations above are executed on the GPU.
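The feature list above states that broadcasting is supported. A minimal sketch, assuming NumPy-style broadcasting rules (the exact semantics are not documented here, and running this requires the extension and a CUDA-capable GPU):

```php
use Cuda\CudaArray;

$matrix = CudaArray::ones([3, 4]);   // shape [3, 4]
$row    = CudaArray::full([4], 2.0); // shape [4]

// Assuming the [4] operand broadcasts across axis 0
$scaled = $matrix * $row;            // shape [3, 4]
```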
```php
use Cuda\CudaArray;

// CPU → GPU
$ca = new CudaArray([[1, 2], [3, 4]]);

// GPU-only allocation
$ones = CudaArray::ones([1024, 1024]);
$zeros = CudaArray::zeros([512]);

// GPU → CPU (PHP array)
$data = $ca->toArray();

// GPU → contiguous host list
$host = $ca->toHost();

// Contiguous list → GPU memory
$host->toGpu();

// Contiguous list → PHP array
$host->toArray();

// Save to file (PHP serialization)
file_put_contents('/data/array.ser', serialize($host));

// Load from file
$restored = unserialize(file_get_contents('/data/array.ser')); // Cuda\ContiguousArray

// Convert back to GPU when needed
$gpu_restored = $restored->toGpu(); // Cuda\CudaArray
```

## Operations

- add, subtract, multiply, divide, power
- exp, log, sqrt, abs
- sin, cos, tan
- sum(axis)
- min(axis)
- max(axis)
- prod(axis)
- argMax(axis)
- argMin(axis)
- reshape(shape)
- flatten()
- transpose(axes)
- concat(tensors, axis)
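A sketch of how the operations above might compose in a chained expression. The method names follow the list (`sum`, `reshape`, `transpose`), but the exact signatures are assumptions, and running this requires the extension and a CUDA-capable GPU:

```php
use Cuda\CudaArray;

$m = CudaArray::ones([2, 1000]);

// Reduce along axis 1 → shape [2]
$rowSums = $m->sum(1);

// Reshape and transpose stay on the GPU
$t = $m->reshape([1000, 2])->transpose([1, 0]);

// Bring the small reduction result back to the CPU
$totals = $rowSums->toArray();
```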
## Data types

- float32, float64
- uint8, uint16, uint32, uint64
- int8, int16, int32, int64
- bool
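A hedged sketch of selecting a dtype at construction time. The `dtype:` named argument appears in the earlier `CudaArray::ones` example; its availability on the other constructors is an assumption:

```php
use Cuda\CudaArray;

// 64-bit floats for precision-sensitive work
$weights = CudaArray::zeros([1024], dtype: 'float64');

// Compact 8-bit unsigned integers, e.g. image data
$pixels = CudaArray::full([480, 640], 255, dtype: 'uint8');

// Boolean mask
$mask = CudaArray::ones([1024], dtype: 'bool');
```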
## Custom kernels

Custom kernels are defined using PHP 8 attributes and compiled to PTX at runtime.
```php
use Cuda\Attr as Attr;

class Kernels
{
    #[Attr\Kernel(name: 'v_add')]
    public function vectorAdd(
        #[Attr\TensorType] array $a,
        #[Attr\TensorType] array $b,
        #[Attr\TensorType] array &$c,
        #[Attr\IntType] int $n
    ): void {
        $idx = $cuda->globalIdx();
        if ($idx < $n) {
            $c[$idx] = $a[$idx] + $b[$idx];
        }
    }
}
```

Compile and launch the kernel:

```php
$compiler = new Cuda\Compiler();
$compiler->kernel([new Kernels(), 'vectorAdd']);
$module = $compiler->compile();
$module->initialize();

$n = 1_048_576;
$a = CudaArray::ones([$n]);
$b = CudaArray::full([$n], 5.0);
$c = CudaArray::zeros([$n]);

$module->launch(
    'v_add',
    args: [$a, $b, $c, $n],
    config: [
        'block' => [256, 1, 1],
        'grid' => [(int)ceil($n / 256), 1, 1]
    ]
);
```

Asynchronous execution:

```php
$id = $module->launchAsync('v_add', args: [...]);
$module->sync();
```

Multiple kernels can be queued and synchronized explicitly.
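Since `launchAsync` returns an identifier and `sync()` blocks until completion, several launches can be queued before a single synchronization point. A sketch, assuming `launchAsync` accepts the same `config` parameter as `launch` (`$module`, `$a`, `$b`, `$n` are the values from the launch example above; `$c` and `$d` are hypothetical output buffers):

```php
$c = CudaArray::zeros([$n]);
$d = CudaArray::zeros([$n]);

$cfg = [
    'block' => [256, 1, 1],
    'grid'  => [(int)ceil($n / 256), 1, 1],
];

// Queue two independent launches back to back
$id1 = $module->launchAsync('v_add', args: [$a, $b, $c, $n], config: $cfg);
$id2 = $module->launchAsync('v_add', args: [$a, $b, $d, $n], config: $cfg);

// Block until all queued kernels have completed
$module->sync();
```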
## Examples

Documented examples are available in the `/examples` directory:
- Tensor creation and basic operations
- Broadcasting and shape manipulation
- Reductions
- Custom JIT kernels
- Asynchronous execution
## Use cases

- Numerical computing
- Image and signal processing
- Scientific simulations
- Experimental machine learning pipelines
- GPU-accelerated data processing in PHP
## License

This project is licensed under the MIT License - see the LICENSE file for details.