NVDAAL - NVIDIA Ada Lovelace Compute Driver

Open-source compute driver for RTX 40 series GPUs on macOS - Pure AI/ML power

🔖 About

NVDAAL (NVIDIA Ada Lovelace) is an open-source, compute-only driver for NVIDIA RTX 40 series GPUs on Hackintosh systems running macOS. The driver focuses exclusively on AI/ML workloads and uses the full compute power of the Ada Lovelace architecture without display overhead.

⚠️ Important Notice

This project is experimental and in early development. It requires:

Deep understanding of GPU architecture
Hackintosh environment with proper configuration
GSP (GPU System Processor) firmware for full functionality

⚡ Why Compute-Only?

Aspect	Benefit
Simplicity	No framebuffer, display engine, or video output code
Focus	100% of GPU power dedicated to compute workloads
Viability	Based on the proven TinyGPU implementation
Performance	Direct access to CUDA cores and Tensor cores

🖥️ Supported Hardware

GPU	Device ID	CUDA Cores	Tensor Cores	Status
RTX 4090	`0x2684`	16,384	512	🚧 Development
RTX 4090 D	`0x2685`	14,592	456	⌛ Planned
RTX 4080 Super	`0x2702`	10,240	320	⌛ Planned
RTX 4080	`0x2704`	9,728	304	⌛ Planned
RTX 4070 Ti Super	`0x2705`	8,448	264	⌛ Planned
RTX 4070 Ti	`0x2782`	7,680	240	⌛ Planned
RTX 4070 Super	`0x2783`	7,168	224	⌛ Planned
RTX 4070	`0x2786`	5,888	184	⌛ Planned
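As a minimal sketch of how the device IDs above might be used to recognise a supported card, the snippet below matches a PCI vendor/device pair against a few rows of the table. `SUPPORTED_GPUS` and `identify_gpu` are hypothetical names for illustration and are not taken from the driver source; `0x10DE` is NVIDIA's PCI vendor ID.

```python
# Hypothetical lookup built from a few rows of the table above.
NVIDIA_VENDOR_ID = 0x10DE  # NVIDIA's PCI vendor ID

# device ID -> (name, CUDA cores, Tensor cores)
SUPPORTED_GPUS = {
    0x2684: ("RTX 4090", 16384, 512),
    0x2704: ("RTX 4080", 9728, 304),
    0x2786: ("RTX 4070", 5888, 184),
}


def identify_gpu(vendor_id, device_id):
    """Return (name, cuda_cores, tensor_cores), or None if the device is unknown."""
    if vendor_id != NVIDIA_VENDOR_ID:
        return None
    return SUPPORTED_GPUS.get(device_id)
```

A caller would pass the IDs read from PCI configuration space and treat `None` as "not supported".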

🚀 Quick Start

✔️ Prerequisites

macOS Tahoe 26+ (via OpenCore 1.0.7+)
Xcode Command Line Tools
NVIDIA RTX 40 series GPU
Boot args: kext-dev-mode=1 or amfi_get_out_of_my_way=0x1

⬇️ Installation

Option 1: Download Pre-built Release

# Download latest release from GitHub Releases
curl -LO https://github.com/gabrielmaialva33/NVDAAL-Driver/releases/latest/download/NVDAAL-Release-x86_64.zip

# Extract
unzip NVDAAL-Release-x86_64.zip

# Install kext
sudo cp -R NVDAAL.kext /Library/Extensions/
sudo kextutil /Library/Extensions/NVDAAL.kext

Option 2: Build from Source

# Clone the repository
git clone https://github.com/gabrielmaialva33/NVDAAL-Driver.git
cd NVDAAL-Driver

# Download GSP firmware
make download-firmware

# Build the kext + tools + library
make clean && make

# Validate structure
make test

# Load temporarily (for testing)
make load

# Check logs
make logs

⚡ Boot Sequence

# Full boot with all firmware files (recommended)
nvdaal-cli boot Firmware/

# Legacy single-file load
nvdaal-cli load Firmware/gsp-570.144.bin

The boot command expects these files in the firmware directory:

File	Required	Purpose
`gsp-570.144.bin` (or `gsp.bin`)	Yes	GSP-RM firmware
`booter_load-ad102-570.144.bin`	No	SEC2 booter (Heavy-Secure)
`AD102.rom`	No	VBIOS for FWSEC-FRTS
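Before running the boot command it can help to confirm that the directory contains what the table lists. The following is only a sketch: the file names come from the table, while `check_firmware_dir` is a hypothetical helper and not part of nvdaal-cli or libNVDAAL.

```python
from pathlib import Path

# File names are taken from the table above; the helper itself is hypothetical.
GSP_CANDIDATES = ("gsp-570.144.bin", "gsp.bin")  # required, either name
OPTIONAL_FILES = ("booter_load-ad102-570.144.bin", "AD102.rom")


def check_firmware_dir(directory):
    """Return (gsp_path, optional_presence) or raise if GSP-RM firmware is missing."""
    root = Path(directory)
    gsp = next((root / n for n in GSP_CANDIDATES if (root / n).is_file()), None)
    if gsp is None:
        raise FileNotFoundError(f"no GSP-RM firmware found in {root}")
    # Optional files only change which boot stages can run.
    optional = {n: (root / n).is_file() for n in OPTIONAL_FILES}
    return gsp, optional
```

For example, `check_firmware_dir("Firmware/")` returns the path of the GSP-RM image plus a dictionary showing which optional files are present.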

📦 Permanent Installation

# Install to /Library/Extensions
make install

# Reboot required
sudo reboot

🔧 Features

Current (v0.6.1-dev - RSA Signature Patching & WPR2 Configuration)

✅ PCI device detection and enumeration
✅ BAR0/BAR1 memory mapping (MMIO + VRAM)
✅ Chip identification (Ada Lovelace architecture)
✅ GSP Controller Implementation
- ELF Firmware Parser (non-contiguous 63MB support)
- Radix3 Page Table Builder (per-page physical addressing)
- WPR2 Metadata Configuration
- Complete VBIOS Parsing:
  - BAR0 VBIOS reading (direct from GPU at 0x300000)
  - ROM image scanning (0x55AA signatures)
  - PCIR header parsing & FWSEC image detection (type 0xE0)
  - BIT (BIOS Information Table) header scanning
  - Ada Lovelace Token 0x50 PMU table path (with Token 0x70 fallback)
  - PMU Lookup Table & Falcon Ucode Descriptor extraction
  - FalconUcodeDescV3Nvidia parsing (pkcDataOffset, signatureCount, signatureVersions)
- Real FWSEC-FRTS Execution (matching NVIDIA open-gpu-kernel-modules):
  - Falcon IMEM/DMEM ucode loading
  - Fuse version reading (readUcodeFuseVersion())
  - RSA-3K signature patching (patchFwsecSignature())
  - FRTS command buffer patching (patchFrtsCmdBuffer())
  - DMEMMAPPER interface patching (FRTS command 0x15)
  - GSP Falcon boot with timeout monitoring
- Enhanced Boot Sequence:
  - SEC2 Falcon reset
  - FWSEC-FRTS execution (WPR2 setup)
  - booter_load on SEC2 (HS mode)
  - RISC-V core start (correct 0x118000 base for Ada)
- Detailed error stages (bootEx())
- Debug Mode: Continues boot even on FWSEC/booter failures
- Register Scanning: Auto-detect RISC-V base address
✅ Full RPC Engine (rmAlloc, rmControl)
✅ Interrupt Driven Architecture
- MSI (Message Signaled Interrupts) support
- Reactive status queue processing
✅ Memory Management (MMU)
- Virtual Address Space (VASpace)
- Page Directory/Table management
✅ Compute Engine
- GPFIFO Channel creation
- User Doorbell mapping
- Command Submission
✅ User-Space Interface
- IOUserClient for secure firmware upload
- Zero-copy memory mapping
- libNVDAAL shared library
- Detailed error codes from kernel
✅ CLI Tool (nvdaal-cli)
- boot command for full sequence
- fwsec command for WPR2 configuration
- status command for GPU register status
- load command for legacy loading
✅ Multi-Architecture Build
- arm64 (Apple Silicon)
- x86_64 (Intel)
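The ROM image scanning and PCIR parsing listed under VBIOS parsing above can be illustrated with a short sketch. It walks the chained images of a standard PCI expansion ROM (`0x55AA` signature, PCIR pointer at offset `0x18`) and reports each image's code type, so that a FWSEC image (type `0xE0`, per the list above) can be located. This is not the driver's parser: `scan_rom_images` is a hypothetical name, and real NVIDIA ROMs may carry vendor-specific extensions that this sketch does not handle.

```python
import struct

ROM_SIGNATURE = b"\x55\xAA"  # first two bytes of every PCI expansion ROM image
FWSEC_CODE_TYPE = 0xE0       # code type the feature list gives for FWSEC images


def scan_rom_images(rom):
    """Walk chained ROM images; return a list of (offset, code_type, length)."""
    images = []
    offset = 0
    while offset + 0x1A <= len(rom):
        if rom[offset:offset + 2] != ROM_SIGNATURE:
            break
        # Offset 0x18 of the ROM header points at the PCIR data structure.
        (pcir_ptr,) = struct.unpack_from("<H", rom, offset + 0x18)
        pcir = offset + pcir_ptr
        if pcir + 0x18 > len(rom) or rom[pcir:pcir + 4] != b"PCIR":
            break
        # PCIR fields: image length in 512-byte units at 0x10, code type at
        # 0x14, indicator at 0x15 (bit 7 set marks the last image).
        (length_units,) = struct.unpack_from("<H", rom, pcir + 0x10)
        code_type = rom[pcir + 0x14]
        indicator = rom[pcir + 0x15]
        length = length_units * 512
        images.append((offset, code_type, length))
        if indicator & 0x80 or length == 0:
            break
        offset += length
    return images
```

Filtering the result with `code_type == FWSEC_CODE_TYPE` gives the candidate FWSEC images; the BIT and PMU table lookups would then start from there.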

In Development

🚧 WPR2 Configuration (FWSEC-FRTS with proper signature patching)
🚧 Compute Class (ADA_COMPUTE_A) Context
🚧 Semaphore Synchronization

Planned

⌛ tinygrad/PyTorch integration
⌛ CUDA-like compute API

⭐ Pioneer Insights

As of v0.6.1, NVDAAL is one of the first open-source efforts to bring Ada Lovelace compute to macOS. Key architectural decisions:

Lock-Free GSP RPC: Uses synchronous memory barriers and stack-allocated buffers to minimize kernel latency during GPU resource management.
Hardware-Native GPFIFO: Fully compliant with the 128-bit entry format required by AD10x chips, enabling direct hardware work submission.
Dynamic MMU: Implements a bump (linear) allocator for the GPU virtual address space that returns non-overlapping, page-aligned ranges for Tensor core workloads.
Complete Boot Pipeline: Full SEC2 + FWSEC + GSP-RM boot sequence matching NVIDIA's reference implementation, with detailed error stage reporting for debugging.
Native VBIOS Parsing: Complete VBIOS ROM parsing including PCIR headers, BIT tables, PMU lookup, and Falcon ucode extraction for real FWSEC-FRTS execution.
Non-Contiguous Memory: Handles 63MB GSP-RM firmware without requiring physically contiguous allocation, using per-page Radix3 table entries.
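To make the non-contiguous firmware point concrete, here is a sketch of a three-level (Radix3) table built over scattered firmware pages. It assumes 4 KiB pages and 64-bit entries, with level 0 pointing to level 1, level 1 pointing to level 2 pages, and level 2 holding the physical address of each firmware page. That layout reflects my understanding of the scheme and should be checked against the reference implementation; `build_radix3` and `alloc_page` are hypothetical names.

```python
PAGE_SIZE = 4096                   # assumed page size
ENTRIES_PER_PAGE = PAGE_SIZE // 8  # one 64-bit physical address per entry


def radix3_level_pages(firmware_bytes):
    """Return the number of table pages at (level 0, level 1, level 2)."""
    fw_pages = -(-firmware_bytes // PAGE_SIZE)   # ceiling division
    level2 = -(-fw_pages // ENTRIES_PER_PAGE)
    level1 = -(-level2 // ENTRIES_PER_PAGE)
    return 1, level1, level2


def build_radix3(fw_page_addrs, alloc_page):
    """Build the tables; return (level0_addr, {table_page_addr: entries})."""
    if not fw_page_addrs:
        raise ValueError("firmware has no pages")
    tables = {}
    entries = list(fw_page_addrs)
    for _ in range(3):  # level 2 first, then level 1, then level 0
        parents = []
        for i in range(0, len(entries), ENTRIES_PER_PAGE):
            addr = alloc_page()  # caller supplies physical table pages
            tables[addr] = entries[i:i + ENTRIES_PER_PAGE]
            parents.append(addr)
        entries = parents
    if len(entries) != 1:
        raise ValueError("firmware too large for a single level-0 page")
    return entries[0], tables
```

For a 63 MiB image this works out to 16,128 firmware pages, 32 level-2 pages, and one page each for levels 1 and 0, which is why no physically contiguous 63 MB allocation is needed.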

📈 Performance Status

Component	Status	Optimization
RPC Latency	🔅 Low	Stack-based buffers
Memory Alloc	🔆 High	Bump Allocator (Linear)
Submission	🔆 High	Direct Doorbell (UserD)
Boot Diagnostics	🔆 High	Error stage codes
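The bump (linear) allocator named in the table is simple enough to sketch in a few lines. This is a generic illustration of the technique, not the driver's code: the class name, the 4 KiB default alignment, and the absence of a free operation are all assumptions.

```python
class BumpAllocator:
    """Linear allocator: hands out aligned ranges and never frees them."""

    def __init__(self, base, size):
        self.base = base
        self.limit = base + size
        self.next = base  # first unallocated address

    def alloc(self, size, align=4096):
        # Alignment must be a power of two for the mask trick below.
        if size <= 0 or align <= 0 or align & (align - 1):
            raise ValueError("size must be positive and align a power of two")
        start = (self.next + align - 1) & ~(align - 1)  # round up to alignment
        if start + size > self.limit:
            raise MemoryError("address space exhausted")
        self.next = start + size
        return start
```

Allocation is a single add and compare, which is where the speed comes from; the trade-off is that individual ranges cannot be returned, only the whole space reset.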

⚙️ Architecture

System Overview

graph TB
    subgraph "User Space"
        CLI[nvdaal-cli]
        PY[Python Scripts]
        ML[tinygrad / PyTorch]
        LIB[libNVDAAL.dylib]
    end

    subgraph "Kernel Space (NVDAAL.kext)"
        UC[NVDAALUserClient]
        DEV[NVDAAL IOService]
        MEM[NVDAALMemory]
        QUEUE[NVDAALQueue]
        GSP[NVDAALGsp]
        DISP[NVDAALDisplay]
    end

    subgraph "Hardware (RTX 4090)"
        RISCV[GSP RISC-V Core]
        SM[128 SMs / 16384 CUDA Cores]
        TENSOR[512 Tensor Cores]
        VRAM[24GB GDDR6X]
    end

    CLI --> LIB
    PY --> LIB
    ML --> LIB
    LIB --> UC
    UC --> DEV
    DEV --> MEM
    DEV --> QUEUE
    DEV --> GSP
    DEV --> DISP
    GSP -->|RPC Protocol| RISCV
    QUEUE -->|Compute Commands| SM
    SM --> TENSOR
    MEM -->|BAR1 Mapping| VRAM

GSP Boot Sequence

sequenceDiagram
    participant User as nvdaal-cli
    participant Lib as libNVDAAL
    participant Drv as NVDAAL.kext
    participant GSP as NVDAALGsp
    participant SEC2 as SEC2 Falcon
    participant HW as GSP RISC-V

    User->>Lib: boot(firmware_dir)
    Note over Lib: Load VBIOS, booter_load, GSP-RM

    Lib->>Drv: loadVbios(AD102.rom)
    Lib->>Drv: loadBooterLoad(booter_load.bin)
    Lib->>Drv: loadFirmware(gsp.bin)

    Drv->>GSP: Initialize GSP
    GSP->>GSP: Parse ELF (63MB, non-contiguous)
    GSP->>GSP: Build Radix3 page tables

    GSP->>HW: Reset GSP Falcon
    GSP->>SEC2: Reset SEC2 Falcon

    alt VBIOS loaded
        GSP->>SEC2: Execute FWSEC-FRTS
        SEC2-->>GSP: WPR2 region configured
    else No VBIOS
        GSP->>GSP: Check WPR2 (EFI may have set it)
    end

    GSP->>GSP: Setup WPR metadata

    alt booter_load available
        GSP->>SEC2: Execute booter_load (HS mode)
        SEC2->>SEC2: Authenticate GSP-RM
        SEC2-->>GSP: Boot handoff ready
    end

    GSP->>HW: Start RISC-V core
    HW-->>GSP: GSP_INIT_DONE event
    GSP->>GSP: Setup RPC queues
    GSP-->>Drv: Ready (or error stage)
    Drv-->>Lib: Success / Error code
    Lib-->>User: Boot complete
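The ordering in the diagram above, including its two optional branches, can be summarised as a small sketch. The stage names are hypothetical labels chosen for this example and do not correspond to the driver's actual error stage codes.

```python
def boot_plan(has_vbios, has_booter):
    """Return the boot stages in the order the sequence diagram shows them."""
    steps = ["parse_elf", "build_radix3", "reset_gsp_falcon", "reset_sec2_falcon"]
    # With a VBIOS, FWSEC-FRTS sets up WPR2; otherwise check whether EFI did.
    steps.append("fwsec_frts" if has_vbios else "check_existing_wpr2")
    steps.append("setup_wpr_metadata")
    if has_booter:
        steps.append("booter_load")  # runs on SEC2 in HS mode
    steps += ["start_riscv", "wait_gsp_init_done", "setup_rpc_queues"]
    return steps


def run_boot(steps, handlers):
    """Run each stage; return None on success or the name of the failing stage."""
    for name in steps:
        if not handlers[name]():
            return name
    return None
```

Returning the name of the first failing stage mirrors the idea of error stage reporting: the caller learns where the boot stopped, not only that it failed.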

Memory Layout

graph LR
    subgraph "BAR0 - MMIO (16MB)"
        PMC[PMC Registers]
        FALCON[GSP Falcon]
        SEC2[SEC2 Falcon]
        RISCV_CTRL[RISC-V Control]
        GSP_QUEUE[GSP Queues]
    end

    subgraph "BAR1 - VRAM (24GB)"
        USER_MEM[User Memory]
        GSP_HEAP[GSP Heap<br/>129MB]
        WPR2[WPR2 Region<br/>Protected by FWSEC]
        FRTS[FRTS Scratch<br/>1MB]
    end

    subgraph "System RAM (DMA)"
        CMD_Q[Command Queue<br/>256KB]
        STAT_Q[Status Queue<br/>256KB]
        FW_BUF[GSP-RM Firmware<br/>~63MB non-contiguous]
        BOOTER[booter_load<br/>~1MB]
        VBIOS[VBIOS/FWSEC<br/>~4MB]
        RADIX3[Radix3 Page Tables]
    end

    PMC -.->|Control| USER_MEM
    SEC2 -.->|Execute| BOOTER
    FALCON -.->|Load| FW_BUF
    GSP_QUEUE -.->|RPC| GSP_HEAP

Component Interaction

graph TD
    subgraph "NVDAAL.kext Components"
        A[NVDAAL<br/>Main IOService] --> B[NVDAALGsp<br/>GSP Controller]
        A --> C[NVDAALMemory<br/>VRAM Allocator]
        A --> D[NVDAALQueue<br/>Command Queue]
        A --> E[NVDAALDisplay<br/>Fake Display]
        A --> F[NVDAALUserClient<br/>User Interface]

        B --> |"parseElfFirmware()"| B1[ELF Parser]
        B --> |"buildRadix3PageTable()"| B2[Page Tables]
        B --> |"boot()"| B3[Boot Sequence]
        B --> |"sendRpc()"| B4[RPC Protocol]

        C --> |"allocVram()"| C1[Linear Allocator]
        D --> |"push() / kick()"| D1[Ring Buffer]
    end
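The `push() / kick()` pair on the ring buffer in the diagram above follows a common producer pattern: write entries into a circular buffer, then notify the consumer by publishing the new put index. The sketch below illustrates that pattern only. `CommandRing` is a hypothetical class, and the `doorbell` callable stands in for a hardware register write.

```python
class CommandRing:
    """Single-producer ring buffer with an explicit doorbell notification."""

    def __init__(self, capacity, doorbell):
        self.slots = [None] * capacity
        self.put = 0  # next slot the producer writes
        self.get = 0  # next slot the consumer reads
        self.doorbell = doorbell  # stand-in for a doorbell register write

    def push(self, entry):
        nxt = (self.put + 1) % len(self.slots)
        if nxt == self.get:
            return False  # full: one slot stays empty to tell full from empty
        self.slots[self.put] = entry
        self.put = nxt
        return True

    def kick(self):
        # Publish the put index so the consumer knows new work is queued.
        self.doorbell(self.put)
```

Separating `push` from `kick` lets a caller queue several entries and ring the doorbell once.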

📝 Roadmap

Phase	Description	Status
1. Foundation	PCI detection, BAR mapping, chip ID	✅ Complete
2. GSP Init	Firmware loading, RPC setup, boot sequence	✅ Complete
3. User API	libNVDAAL, IOUserClient, CLI tool	✅ Complete
4. Enhanced Boot	SEC2/FWSEC/WPR2, booter_load, error diagnostics	✅ Complete
5. Memory	VRAM allocation, DMA buffers, virtual memory	🚧 In Progress
6. Compute	Queue management, command submission, sync	⌛ Planned
7. Integration	tinygrad, PyTorch backends	⌛ Planned

📂 Project Structure

NVDAAL-Driver/
├── Sources/                  # Kernel extension source
│   ├── NVDAAL.{h,cpp}       # Main IOService driver
│   ├── NVDAALGsp.{h,cpp}    # GSP controller & RPC
│   ├── NVDAALUserClient.{h,cpp}  # User-space interface
│   ├── NVDAALMemory.{h,cpp} # VRAM allocator
│   ├── NVDAALQueue.{h,cpp}  # Command queue
│   ├── NVDAALDisplay.{h,cpp}# Fake display engine
│   └── NVDAALRegs.h         # Register definitions
├── Library/                  # User-space SDK
│   ├── libNVDAAL.{h,cpp}    # C++ API wrapper
│   └── nvdaal_c_api.cpp     # C FFI bindings
├── Tools/
│   ├── nvdaal-cli/          # CLI firmware loader
│   ├── extract_vbios.py     # VBIOS extraction
│   └── test_driver.py       # Python test harness
├── Docs/                     # Technical documentation
│   ├── ARCHITECTURE.md      # Component details
│   ├── GSP_INIT.md          # GSP boot guide
│   └── TODO.md              # Development checklist
├── Firmware/                 # User-provided firmware
├── Info.plist               # Kext configuration
├── Makefile                 # Build system
└── README.md

🤝 Contributing

Contributions are welcome! Please read our Contributing Guide before submitting a PR.

Development Commands

make clean           # Clean build artifacts
make                 # Build kext + tools + library
make rebuild         # Clean + build
make test            # Validate kext structure
make load            # Load kext temporarily
make unload          # Unload kext
make logs            # View driver logs (last 5 min)
make logs-live       # Stream logs in real-time
make status          # Check kext and PCI status
make download-firmware  # Download GSP firmware

📚 Resources

TinyGPU/tinygrad - Primary reference for GSP
NVIDIA open-gpu-kernel-modules - Official open-source drivers
Nouveau Project - Linux open-source NVIDIA driver
envytools - NVIDIA GPU documentation

⚠️ Disclaimer

This project is for educational and research purposes only. There is no guarantee of functionality. Use of proprietary firmware may violate NVIDIA's license terms. Use at your own risk.

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

Made with 💜 by Gabriel Maia
