Open-source compute driver for RTX 40 series GPUs on macOS - Pure AI/ML power
About | Quick Start | Features | Architecture | Roadmap | Contributing | License
NVDAAL (NVIDIA Ada Lovelace) is an open-source compute-only driver for NVIDIA RTX 40 series GPUs on macOS Hackintosh. This driver focuses exclusively on AI/ML workloads, leveraging the full compute power of Ada Lovelace architecture without display overhead.
This project is experimental and in early development. It requires:
- Deep understanding of GPU architecture
- Hackintosh environment with proper configuration
- GSP (GPU System Processor) firmware for full functionality
| Aspect | Benefit |
|---|---|
| Simplicity | No framebuffer, display engine, or video output code |
| Focus | 100% of GPU power dedicated to compute workloads |
| Viability | Based on the proven TinyGPU implementation |
| Performance | Direct access to CUDA cores and Tensor cores |
| GPU | Device ID | CUDA Cores | Tensor Cores | Status |
|---|---|---|---|---|
| RTX 4090 | 0x2684 | 16,384 | 512 | 🚧 Development |
| RTX 4090 D | 0x2685 | 14,592 | 456 | ⏳ Planned |
| RTX 4080 Super | 0x2702 | 10,240 | 320 | ⏳ Planned |
| RTX 4080 | 0x2704 | 9,728 | 304 | ⏳ Planned |
| RTX 4070 Ti Super | 0x2705 | 8,448 | 264 | ⏳ Planned |
| RTX 4070 Ti | 0x2782 | 7,680 | 240 | ⏳ Planned |
| RTX 4070 Super | 0x2860 | 7,168 | 224 | ⏳ Planned |
| RTX 4070 | 0x2786 | 5,888 | 184 | ⏳ Planned |
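The core counts in the table follow a fixed per-SM ratio on Ada Lovelace (128 CUDA cores and 4 Tensor cores per SM), which matches the "128 SMs / 16384 CUDA Cores / 512 Tensor Cores" figures quoted for the RTX 4090 elsewhere in this README. The sketch below shows one way a device-ID lookup could be expressed. The names `GpuInfo`, `SUPPORTED_GPUS`, `lookup_gpu`, and `sm_count` are hypothetical and not part of NVDAAL; the IDs and counts are copied from the table as-is.

```python
from typing import NamedTuple, Optional


class GpuInfo(NamedTuple):
    name: str
    cuda_cores: int
    tensor_cores: int


# Device IDs and core counts copied verbatim from the table above.
SUPPORTED_GPUS = {
    0x2684: GpuInfo("RTX 4090", 16384, 512),
    0x2685: GpuInfo("RTX 4090 D", 14592, 456),
    0x2702: GpuInfo("RTX 4080 Super", 10240, 320),
    0x2704: GpuInfo("RTX 4080", 9728, 304),
    0x2705: GpuInfo("RTX 4070 Ti Super", 8448, 264),
    0x2782: GpuInfo("RTX 4070 Ti", 7680, 240),
    0x2860: GpuInfo("RTX 4070 Super", 7168, 224),
    0x2786: GpuInfo("RTX 4070", 5888, 184),
}

CUDA_CORES_PER_SM = 128   # Ada Lovelace SM layout
TENSOR_CORES_PER_SM = 4


def lookup_gpu(device_id: int) -> Optional[GpuInfo]:
    """Return the table entry for a PCI device ID, or None if unsupported."""
    return SUPPORTED_GPUS.get(device_id)


def sm_count(info: GpuInfo) -> int:
    """Derive the number of SMs from the CUDA core count."""
    return info.cuda_cores // CUDA_CORES_PER_SM
```

For example, `sm_count(lookup_gpu(0x2684))` yields the 128 SMs shown in the architecture diagram.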
- macOS Tahoe 26+ (via OpenCore 1.0.7+)
- Xcode Command Line Tools
- NVIDIA RTX 40 series GPU
- Boot args: `kext-dev-mode=1` or `amfi_get_out_of_my_way=0x1`
# Download latest release from GitHub Releases
curl -LO https://github.com/gabrielmaialva33/NVDAAL-Driver/releases/latest/download/NVDAAL-Release-x86_64.zip
# Extract
unzip NVDAAL-Release-x86_64.zip
# Install kext
sudo cp -R NVDAAL.kext /Library/Extensions/
sudo kextutil /Library/Extensions/NVDAAL.kext

# Clone the repository
git clone https://github.com/gabrielmaialva33/NVDAAL-Driver.git
cd NVDAAL-Driver
# Download GSP firmware
make download-firmware
# Build the kext + tools + library
make clean && make
# Validate structure
make test
# Load temporarily (for testing)
make load
# Check logs
make logs

# Full boot with all firmwares (recommended)
nvdaal-cli boot Firmware/
# Legacy single-file load
nvdaal-cli load Firmware/gsp-570.144.bin

The boot command expects these files in the firmware directory:
| File | Required | Purpose |
|---|---|---|
| `gsp-570.144.bin` (or `gsp.bin`) | Yes | GSP-RM firmware |
| `booter_load-ad102-570.144.bin` | No | SEC2 booter (Heavy-Secure) |
| `AD102.rom` | No | VBIOS for FWSEC-FRTS |
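Before running `nvdaal-cli boot`, it can help to confirm that the directory holds the files listed above. The following is a minimal sketch of such a check, not the logic `nvdaal-cli` itself uses; `FIRMWARE_FILES` and `check_firmware_dir` are hypothetical names, and only the file names and required/optional flags come from the table.

```python
from pathlib import Path
from typing import Dict, Optional

# role -> (accepted file names in order of preference, required?)
# File names and flags are taken from the table above.
FIRMWARE_FILES = {
    "gsp": (["gsp-570.144.bin", "gsp.bin"], True),
    "booter_load": (["booter_load-ad102-570.144.bin"], False),
    "vbios": (["AD102.rom"], False),
}


def check_firmware_dir(directory) -> Dict[str, Optional[Path]]:
    """Map each firmware role to the file found for it (None if absent).

    Raises FileNotFoundError when a required file is missing.
    """
    root = Path(directory)
    found: Dict[str, Optional[Path]] = {}
    missing = []
    for role, (names, required) in FIRMWARE_FILES.items():
        candidates = (root / name for name in names)
        path = next((p for p in candidates if p.is_file()), None)
        found[role] = path
        if path is None and required:
            missing.append(" or ".join(names))
    if missing:
        raise FileNotFoundError(
            f"missing required firmware in {root}: {', '.join(missing)}"
        )
    return found
```

A result with `vbios` set to `None` corresponds to the "No VBIOS" branch of the boot sequence, where the driver checks whether WPR2 was already configured.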
# Install to /Library/Extensions
make install
# Reboot required
sudo reboot

- ✅ PCI device detection and enumeration
- ✅ BAR0/BAR1 memory mapping (MMIO + VRAM)
- ✅ Chip identification (Ada Lovelace architecture)
- ✅ GSP Controller Implementation
  - ELF Firmware Parser (non-contiguous 63MB support)
  - Radix3 Page Table Builder (per-page physical addressing)
  - WPR2 Metadata Configuration
  - Complete VBIOS Parsing:
    - BAR0 VBIOS reading (direct from GPU at 0x300000)
    - ROM image scanning (0x55AA signatures)
    - PCIR header parsing & FWSEC image detection (type 0xE0)
    - BIT (BIOS Information Table) header scanning
    - Ada Lovelace Token 0x50 PMU table path (with Token 0x70 fallback)
    - PMU Lookup Table & Falcon Ucode Descriptor extraction
    - FalconUcodeDescV3Nvidia parsing (pkcDataOffset, signatureCount, signatureVersions)
  - Real FWSEC-FRTS Execution (matching NVIDIA open-gpu-kernel-modules):
    - Falcon IMEM/DMEM ucode loading
    - Fuse version reading (`readUcodeFuseVersion()`)
    - RSA-3K signature patching (`patchFwsecSignature()`)
    - FRTS command buffer patching (`patchFrtsCmdBuffer()`)
    - DMEMMAPPER interface patching (FRTS command 0x15)
    - GSP Falcon boot with timeout monitoring
  - Enhanced Boot Sequence:
    - SEC2 FALCON reset
    - FWSEC-FRTS execution (WPR2 setup)
    - booter_load on SEC2 (HS mode)
    - RISC-V core start (correct 0x118000 base for Ada)
    - Detailed error stages (`bootEx()`)
  - Debug Mode: Continues boot even on FWSEC/booter failures
  - Register Scanning: Auto-detect RISC-V base address
- ✅ Full RPC Engine (rmAlloc, rmControl)
- ✅ Interrupt Driven Architecture
  - MSI (Message Signaled Interrupts) support
  - Reactive status queue processing
- ✅ Memory Management (MMU)
  - Virtual Address Space (VASpace)
  - Page Directory/Table management
- ✅ Compute Engine
  - GPFIFO Channel creation
  - User Doorbell mapping
  - Command Submission
- ✅ User-Space Interface
  - IOUserClient for secure firmware upload
  - Zero-copy memory mapping
  - libNVDAAL shared library
  - Detailed error codes from kernel
- ✅ CLI Tool (nvdaal-cli)
  - `boot` command for full sequence
  - `fwsec` command for WPR2 configuration
  - `status` command for GPU register status
  - `load` command for legacy loading
- ✅ Multi-Architecture Build
  - arm64 (Apple Silicon)
  - x86_64 (Intel)
- 🚧 WPR2 Configuration (FWSEC-FRTS with proper signature patching)
- 🚧 Compute Class (ADA_COMPUTE_A) Context
- 🚧 Semaphore Synchronization
- ⏳ tinygrad/PyTorch integration
- ⏳ CUDA-like compute API
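The "ROM image scanning" and "PCIR header parsing" items in the list above can be illustrated with the standard PCI expansion ROM layout: each image starts with the bytes `0x55 0xAA`, holds a 16-bit pointer to its `PCIR` data structure at offset `0x18`, and the `PCIR` structure carries the image length (in 512-byte units), the code type, and a last-image flag. The sketch below walks a ROM dump using only those standard fields. It is not the driver's parser: `RomImage` and `scan_rom_images` are hypothetical names, and any vendor-specific headers in a real NVIDIA VBIOS are ignored here.

```python
import struct
from typing import List, NamedTuple

ROM_SIGNATURE = b"\x55\xAA"
PCIR_SIGNATURE = b"PCIR"
FWSEC_CODE_TYPE = 0xE0  # code type quoted in the feature list above


class RomImage(NamedTuple):
    offset: int      # byte offset of the image inside the ROM dump
    length: int      # image length in bytes
    code_type: int   # PCIR code type (0xE0 marks an FWSEC image)
    last: bool       # True if the PCIR indicator marks the last image


def scan_rom_images(rom: bytes) -> List[RomImage]:
    """Walk a ROM dump in 512-byte steps and collect valid PCI ROM images."""
    images = []
    offset = 0
    while offset + 0x1A <= len(rom):
        if rom[offset:offset + 2] != ROM_SIGNATURE:
            offset += 512
            continue
        # 16-bit little-endian pointer to the PCI Data Structure.
        (pcir_ptr,) = struct.unpack_from("<H", rom, offset + 0x18)
        pcir = offset + pcir_ptr
        if pcir + 0x16 > len(rom) or rom[pcir:pcir + 4] != PCIR_SIGNATURE:
            offset += 512
            continue
        (units,) = struct.unpack_from("<H", rom, pcir + 0x10)
        code_type = rom[pcir + 0x14]
        indicator = rom[pcir + 0x15]
        length = units * 512
        images.append(RomImage(offset, length, code_type, bool(indicator & 0x80)))
        # Skip past this image; always advance to avoid looping on length 0.
        offset += max(length, 512)
    return images
```

Filtering the result with `[img for img in scan_rom_images(rom) if img.code_type == FWSEC_CODE_TYPE]` would then select candidate FWSEC images for the later BIT and PMU table parsing steps.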
As of v0.6.1, NVDAAL is one of the first open-source efforts to bring Ada Lovelace compute to macOS. Key architectural decisions:
- Lock-Free GSP RPC: Using synchronous memory barriers and stack-allocated buffers to minimize kernel latency during GPU resource management.
- Hardware-Native GPFIFO: Fully compliant with the 128-bit entry format required by AD10x chips, enabling direct hardware work submission.
- Dynamic MMU: Implements a real-time Bump Allocator for GPU Virtual Address Space, ensuring memory isolation and proper page alignment for Tensor core workloads.
- Complete Boot Pipeline: Full SEC2 + FWSEC + GSP-RM boot sequence matching NVIDIA's reference implementation, with detailed error stage reporting for debugging.
- Native VBIOS Parsing: Complete VBIOS ROM parsing including PCIR headers, BIT tables, PMU lookup, and Falcon ucode extraction for real FWSEC-FRTS execution.
- Non-Contiguous Memory: Handles 63MB GSP-RM firmware without requiring physically contiguous allocation, using per-page Radix3 table entries.
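To make the non-contiguous firmware point concrete, here is a rough sketch of how a three-level (Radix3) table can describe scattered firmware pages. It assumes 4 KiB pages and 8-byte little-endian entries, where each level-2 entry holds the physical address of one firmware page, level 1 points at the level-2 pages, and level 0 points at level 1. Those parameters and the function names are assumptions for illustration, not values read from the NVDAAL sources.

```python
import struct
from typing import List

PAGE_SIZE = 4096    # assumed page size
ENTRY_SIZE = 8      # assumed size of one 64-bit table entry


def pages_needed(n_bytes: int) -> int:
    """Number of whole pages required to hold n_bytes."""
    return -(-n_bytes // PAGE_SIZE)


def radix3_level_pages(fw_size: int) -> List[int]:
    """Return the page count of levels 0, 1 and 2 for a firmware image."""
    data_pages = pages_needed(fw_size)
    lvl2 = pages_needed(data_pages * ENTRY_SIZE)  # one entry per firmware page
    lvl1 = pages_needed(lvl2 * ENTRY_SIZE)        # one entry per level-2 page
    lvl0 = pages_needed(lvl1 * ENTRY_SIZE)        # one entry per level-1 page
    return [lvl0, lvl1, lvl2]


def build_level(page_addrs: List[int]) -> bytes:
    """Pack one entry per physical page address, zero-padded to whole pages."""
    raw = struct.pack(f"<{len(page_addrs)}Q", *page_addrs)
    return raw.ljust(pages_needed(len(raw)) * PAGE_SIZE, b"\x00")
```

Under these assumptions a 63 MiB image spans 16,128 pages, so `radix3_level_pages(63 * 1024 * 1024)` gives `[1, 1, 32]`: the table overhead is 34 pages, and none of the firmware pages need to be physically adjacent.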
| Component | Status | Optimization |
|---|---|---|
| RPC Latency | Low | Stack-based buffers |
| Memory Alloc | High | Bump Allocator (Linear) |
| Submission | High | Direct Doorbell (UserD) |
| Boot Diagnostics | High | Error stage codes |
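The "Bump Allocator (Linear)" entry refers to a simple allocation strategy: keep a single cursor, round it up to the requested alignment, hand out the range, and advance. The sketch below shows the idea in isolation. The class name, the 4 KiB default alignment, and the example addresses are illustrative assumptions; the driver's own allocator lives in the kext and may differ.

```python
class BumpAllocator:
    """Linear allocator over the address range [base, base + size)."""

    def __init__(self, base: int, size: int, default_align: int = 4096):
        self.base = base
        self._limit = base + size
        self._next = base
        self.default_align = default_align

    def alloc(self, size: int, align: int = 0) -> int:
        """Return the start address of a new aligned block."""
        align = align or self.default_align
        if size <= 0:
            raise ValueError("size must be positive")
        if align & (align - 1):
            raise ValueError("alignment must be a power of two")
        # Round the cursor up to the next multiple of `align`.
        start = (self._next + align - 1) & ~(align - 1)
        end = start + size
        if end > self._limit:
            raise MemoryError("address space exhausted")
        self._next = end
        return start

    def reset(self) -> None:
        """Free everything at once; individual frees are not supported."""
        self._next = self.base
```

Allocation is constant-time, which is the appeal for a driver hot path. The trade-off is that individual blocks cannot be freed, only the whole space via `reset()`.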
graph TB
subgraph "User Space"
CLI[nvdaal-cli]
PY[Python Scripts]
ML[tinygrad / PyTorch]
LIB[libNVDAAL.dylib]
end
subgraph "Kernel Space (NVDAAL.kext)"
UC[NVDAALUserClient]
DEV[NVDAAL IOService]
MEM[NVDAALMemory]
QUEUE[NVDAALQueue]
GSP[NVDAALGsp]
DISP[NVDAALDisplay]
end
subgraph "Hardware (RTX 4090)"
RISCV[GSP RISC-V Core]
SM[128 SMs / 16384 CUDA Cores]
TENSOR[512 Tensor Cores]
VRAM[24GB GDDR6X]
end
CLI --> LIB
PY --> LIB
ML --> LIB
LIB --> UC
UC --> DEV
DEV --> MEM
DEV --> QUEUE
DEV --> GSP
DEV --> DISP
GSP -->|RPC Protocol| RISCV
QUEUE -->|Compute Commands| SM
SM --> TENSOR
MEM -->|BAR1 Mapping| VRAM
sequenceDiagram
participant User as nvdaal-cli
participant Lib as libNVDAAL
participant Drv as NVDAAL.kext
participant GSP as NVDAALGsp
participant SEC2 as SEC2 Falcon
participant HW as GSP RISC-V
User->>Lib: boot(firmware_dir)
Note over Lib: Load VBIOS, booter_load, GSP-RM
Lib->>Drv: loadVbios(AD102.rom)
Lib->>Drv: loadBooterLoad(booter_load.bin)
Lib->>Drv: loadFirmware(gsp.bin)
Drv->>GSP: Initialize GSP
GSP->>GSP: Parse ELF (63MB, non-contiguous)
GSP->>GSP: Build Radix3 page tables
GSP->>HW: Reset GSP Falcon
GSP->>SEC2: Reset SEC2 Falcon
alt VBIOS loaded
GSP->>SEC2: Execute FWSEC-FRTS
SEC2-->>GSP: WPR2 region configured
else No VBIOS
GSP->>GSP: Check WPR2 (EFI may have set it)
end
GSP->>GSP: Setup WPR metadata
alt booter_load available
GSP->>SEC2: Execute booter_load (HS mode)
SEC2->>SEC2: Authenticate GSP-RM
SEC2-->>GSP: Boot handoff ready
end
GSP->>HW: Start RISC-V core
HW-->>GSP: GSP_INIT_DONE event
GSP->>GSP: Setup RPC queues
GSP-->>Drv: Ready (or error stage)
Drv-->>Lib: Success / Error code
Lib-->>User: Boot complete
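The branching in the sequence diagram above, together with the "detailed error stages" and "debug mode" features, can be summarized as a small staged state machine. The sketch below is a simplified model under stated assumptions: it omits the WPR metadata and RPC queue setup steps, and the `BootStage` values, `BootError`, `plan_boot`, and `run_boot` are hypothetical names rather than the kext's `bootEx()` interface.

```python
from enum import IntEnum
from typing import Callable, Dict, List


class BootStage(IntEnum):
    RESET = 1
    FWSEC_FRTS = 2
    WPR2_CHECK = 3
    BOOTER_LOAD = 4
    RISCV_START = 5
    INIT_DONE = 6


class BootError(Exception):
    """Raised when a stage fails; `stage` records where the boot stopped."""

    def __init__(self, stage: BootStage):
        super().__init__(f"boot failed at stage {stage.name}")
        self.stage = stage


def plan_boot(has_vbios: bool, has_booter: bool) -> List[BootStage]:
    """Pick the stage order from the firmware that was supplied."""
    stages = [BootStage.RESET]
    # With a VBIOS, FWSEC-FRTS sets up WPR2; otherwise only check it.
    stages.append(BootStage.FWSEC_FRTS if has_vbios else BootStage.WPR2_CHECK)
    if has_booter:
        stages.append(BootStage.BOOTER_LOAD)
    stages += [BootStage.RISCV_START, BootStage.INIT_DONE]
    return stages


def run_boot(stages: List[BootStage],
             handlers: Dict[BootStage, Callable[[], bool]],
             debug: bool = False) -> List[BootStage]:
    """Run each stage handler in order and return the stages that were skipped.

    In debug mode, FWSEC and booter failures are recorded and the boot
    continues; any other failure raises BootError.
    """
    tolerated = (BootStage.FWSEC_FRTS, BootStage.BOOTER_LOAD)
    failed = []
    for stage in stages:
        if handlers[stage]():
            continue
        if debug and stage in tolerated:
            failed.append(stage)
            continue
        raise BootError(stage)
    return failed
```

Carrying the failing stage in the exception is one way to surface the "error stage" back to user space as a single integer code.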
graph LR
subgraph "BAR0 - MMIO (16MB)"
PMC[PMC Registers]
FALCON[GSP Falcon]
SEC2[SEC2 Falcon]
RISCV_CTRL[RISC-V Control]
GSP_QUEUE[GSP Queues]
end
subgraph "BAR1 - VRAM (24GB)"
USER_MEM[User Memory]
GSP_HEAP[GSP Heap<br/>129MB]
WPR2[WPR2 Region<br/>Protected by FWSEC]
FRTS[FRTS Scratch<br/>1MB]
end
subgraph "System RAM (DMA)"
CMD_Q[Command Queue<br/>256KB]
STAT_Q[Status Queue<br/>256KB]
FW_BUF[GSP-RM Firmware<br/>~63MB non-contiguous]
BOOTER[booter_load<br/>~1MB]
VBIOS[VBIOS/FWSEC<br/>~4MB]
RADIX3[Radix3 Page Tables]
end
PMC -.->|Control| USER_MEM
SEC2 -.->|Execute| BOOTER
FALCON -.->|Load| FW_BUF
GSP_QUEUE -.->|RPC| GSP_HEAP
graph TD
subgraph "NVDAAL.kext Components"
A[NVDAAL<br/>Main IOService] --> B[NVDAALGsp<br/>GSP Controller]
A --> C[NVDAALMemory<br/>VRAM Allocator]
A --> D[NVDAALQueue<br/>Command Queue]
A --> E[NVDAALDisplay<br/>Fake Display]
A --> F[NVDAALUserClient<br/>User Interface]
B --> |"parseElfFirmware()"| B1[ELF Parser]
B --> |"buildRadix3PageTable()"| B2[Page Tables]
B --> |"boot()"| B3[Boot Sequence]
B --> |"sendRpc()"| B4[RPC Protocol]
C --> |"allocVram()"| C1[Linear Allocator]
D --> |"push() / kick()"| D1[Ring Buffer]
end
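The `push() / kick()` pair on the ring buffer in the component diagram above suggests the usual producer pattern: write entries into a fixed-size ring, then notify the consumer by publishing the new put index. The sketch below models that pattern in plain Python. `CommandRing`, its `consume()` helper, and the doorbell callback are hypothetical stand-ins; the real queue writes to GPU-visible memory and a hardware doorbell register.

```python
from typing import Callable, List, Optional


class CommandRing:
    """Fixed-size ring: push() stages entries, kick() publishes the put index."""

    def __init__(self, capacity: int, doorbell: Callable[[int], None]):
        if capacity < 2:
            raise ValueError("capacity must be at least 2")
        self._slots: List[Optional[bytes]] = [None] * capacity
        self._put = 0   # next slot the producer writes
        self._get = 0   # next slot the consumer reads
        self._doorbell = doorbell

    def free_slots(self) -> int:
        # One slot stays empty so that put == get always means "empty".
        return (self._get - self._put - 1) % len(self._slots)

    def push(self, entry: bytes) -> None:
        if self.free_slots() == 0:
            raise BufferError("ring full")
        self._slots[self._put] = entry
        self._put = (self._put + 1) % len(self._slots)

    def kick(self) -> None:
        """Tell the consumer how far it may read."""
        self._doorbell(self._put)

    def consume(self) -> Optional[bytes]:
        """Consumer side, included here only to make the sketch testable."""
        if self._get == self._put:
            return None
        entry = self._slots[self._get]
        self._get = (self._get + 1) % len(self._slots)
        return entry
```

Separating `push()` from `kick()` lets several entries be staged before a single doorbell write.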
| Phase | Description | Status |
|---|---|---|
| 1. Foundation | PCI detection, BAR mapping, chip ID | ✅ Complete |
| 2. GSP Init | Firmware loading, RPC setup, boot sequence | ✅ Complete |
| 3. User API | libNVDAAL, IOUserClient, CLI tool | ✅ Complete |
| 4. Enhanced Boot | SEC2/FWSEC/WPR2, booter_load, error diagnostics | ✅ Complete |
| 5. Memory | VRAM allocation, DMA buffers, virtual memory | 🚧 In Progress |
| 6. Compute | Queue management, command submission, sync | ⏳ Planned |
| 7. Integration | tinygrad, PyTorch backends | ⏳ Planned |
NVDAAL-Driver/
├── Sources/                      # Kernel extension source
│   ├── NVDAAL.{h,cpp}            # Main IOService driver
│   ├── NVDAALGsp.{h,cpp}         # GSP controller & RPC
│   ├── NVDAALUserClient.{h,cpp}  # User-space interface
│   ├── NVDAALMemory.{h,cpp}      # VRAM allocator
│   ├── NVDAALQueue.{h,cpp}       # Command queue
│   ├── NVDAALDisplay.{h,cpp}     # Fake display engine
│   └── NVDAALRegs.h              # Register definitions
├── Library/                      # User-space SDK
│   ├── libNVDAAL.{h,cpp}         # C++ API wrapper
│   └── nvdaal_c_api.cpp          # C FFI bindings
├── Tools/
│   ├── nvdaal-cli/               # CLI firmware loader
│   ├── extract_vbios.py          # VBIOS extraction
│   └── test_driver.py            # Python test harness
├── Docs/                         # Technical documentation
│   ├── ARCHITECTURE.md           # Component details
│   ├── GSP_INIT.md               # GSP boot guide
│   └── TODO.md                   # Development checklist
├── Firmware/                     # User-provided firmware
├── Info.plist                    # Kext configuration
├── Makefile                      # Build system
└── README.md
Contributions are welcome! Please read our Contributing Guide before submitting a PR.
make clean # Clean build artifacts
make # Build kext + tools + library
make rebuild # Clean + build
make test # Validate kext structure
make load # Load kext temporarily
make unload # Unload kext
make logs # View driver logs (last 5 min)
make logs-live # Stream logs in real-time
make status # Check kext and PCI status
make download-firmware # Download GSP firmware

- TinyGPU/tinygrad - Primary reference for GSP
- NVIDIA open-gpu-kernel-modules - Official open-source drivers
- Nouveau Project - Linux open-source NVIDIA driver
- envytools - NVIDIA GPU documentation
This project is for educational and research purposes only. There is no guarantee of functionality. Use of proprietary firmware may violate NVIDIA's license terms. Use at your own risk.
This project is licensed under the MIT License - see the LICENSE file for details.
Made with love by Gabriel Maia