Open-source compute driver for RTX 40 series GPUs on macOS - Pure AI/ML power

gabrielmaialva33/NVDAAL-Driver

NVDAAL - NVIDIA Ada Lovelace Compute Driver


About | Quick Start | Features | Architecture | Roadmap | Contributing | License


🔖 About

NVDAAL (NVIDIA Ada Lovelace) is an open-source, compute-only driver for NVIDIA RTX 40 series GPUs on Hackintosh systems running macOS. It targets AI/ML workloads exclusively, using the compute resources of the Ada Lovelace architecture without display overhead.

⚠️ Important Notice

This project is experimental and in early development. It requires:

  • Deep understanding of GPU architecture
  • Hackintosh environment with proper configuration
  • GSP (GPU System Processor) firmware for full functionality

⚡ Why Compute-Only?

| Aspect | Benefit |
|---|---|
| Simplicity | No framebuffer, display engine, or video output code |
| Focus | 100% of GPU power dedicated to compute workloads |
| Viability | Based on proven TinyGPU implementation |
| Performance | Direct access to CUDA cores and Tensor cores |

🖥️ Supported Hardware

| GPU | Device ID | CUDA Cores | Tensor Cores | Status |
|---|---|---|---|---|
| RTX 4090 | 0x2684 | 16,384 | 512 | 🚧 Development |
| RTX 4090 D | 0x2685 | 14,592 | 456 | ⌛ Planned |
| RTX 4080 Super | 0x2702 | 10,240 | 320 | ⌛ Planned |
| RTX 4080 | 0x2704 | 9,728 | 304 | ⌛ Planned |
| RTX 4070 Ti Super | 0x2705 | 8,448 | 264 | ⌛ Planned |
| RTX 4070 Ti | 0x2782 | 7,680 | 240 | ⌛ Planned |
| RTX 4070 Super | 0x2860 | 7,168 | 224 | ⌛ Planned |
| RTX 4070 | 0x2786 | 5,888 | 184 | ⌛ Planned |

🚀 Quick Start

✔️ Prerequisites

  • macOS Tahoe 26+ (via OpenCore 1.0.7+)
  • Xcode Command Line Tools
  • NVIDIA RTX 40 series GPU
  • Boot args: kext-dev-mode=1 or amfi_get_out_of_my_way=0x1

⬇️ Installation

Option 1: Download Pre-built Release

# Download latest release from GitHub Releases
curl -LO https://github.com/gabrielmaialva33/NVDAAL-Driver/releases/latest/download/NVDAAL-Release-x86_64.zip

# Extract
unzip NVDAAL-Release-x86_64.zip

# Install kext
sudo cp -R NVDAAL.kext /Library/Extensions/
sudo kextutil /Library/Extensions/NVDAAL.kext

Option 2: Build from Source

# Clone the repository
git clone https://github.com/gabrielmaialva33/NVDAAL-Driver.git
cd NVDAAL-Driver

# Download GSP firmware
make download-firmware

# Build the kext + tools + library
make clean && make

# Validate structure
make test

# Load temporarily (for testing)
make load

# Check logs
make logs

⚡ Boot Sequence

# Full boot with all firmwares (recommended)
nvdaal-cli boot Firmware/

# Legacy single-file load
nvdaal-cli load Firmware/gsp-570.144.bin

The boot command expects these files in the firmware directory:

| File | Required | Purpose |
|---|---|---|
| gsp-570.144.bin (or gsp.bin) | Yes | GSP-RM firmware |
| booter_load-ad102-570.144.bin | No | SEC2 booter (Heavy-Secure) |
| AD102.rom | No | VBIOS for FWSEC-FRTS |
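
Before running the boot command, it can help to check which of these files are actually present. The Python sketch below is only an illustration of the table above: `classify_firmware` and `check_firmware_dir` are hypothetical helpers, not part of nvdaal-cli or libNVDAAL, and the preference for the versioned GSP file name over `gsp.bin` is an assumption.

```python
from pathlib import Path

# File names come from the table above; everything else here is illustrative.
GSP_CANDIDATES = ("gsp-570.144.bin", "gsp.bin")   # assumed lookup order
OPTIONAL_FILES = ("booter_load-ad102-570.144.bin", "AD102.rom")


def classify_firmware(names):
    """Return (gsp_file, missing_optional_files) for a collection of file names.

    Raises FileNotFoundError when no GSP-RM image is present, since that is
    the only file the table marks as required.
    """
    present = set(names)
    gsp = next((name for name in GSP_CANDIDATES if name in present), None)
    if gsp is None:
        raise FileNotFoundError("no GSP-RM firmware (gsp-570.144.bin or gsp.bin)")
    missing = [name for name in OPTIONAL_FILES if name not in present]
    return gsp, missing


def check_firmware_dir(directory):
    """Apply classify_firmware to the regular files found in a directory."""
    files = (p.name for p in Path(directory).iterdir() if p.is_file())
    return classify_firmware(files)
```

Calling `check_firmware_dir("Firmware/")` would report the GSP image that was found and list any optional files that are absent. The table marks those files as optional, so a missing booter or VBIOS is reported rather than treated as an error.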

📦 Permanent Installation

# Install to /Library/Extensions
make install

# Reboot required
sudo reboot

🔧 Features

Current (v0.6.1-dev - RSA Signature Patching & WPR2 Configuration)

  • ✅ PCI device detection and enumeration
  • ✅ BAR0/BAR1 memory mapping (MMIO + VRAM)
  • ✅ Chip identification (Ada Lovelace architecture)
  • ✅ GSP Controller Implementation
    • ELF Firmware Parser (non-contiguous 63MB support)
    • Radix3 Page Table Builder (per-page physical addressing)
    • WPR2 Metadata Configuration
    • Complete VBIOS Parsing:
      • BAR0 VBIOS reading (direct from GPU at 0x300000)
      • ROM image scanning (0x55AA signatures)
      • PCIR header parsing & FWSEC image detection (type 0xE0)
      • BIT (BIOS Information Table) header scanning
      • Ada Lovelace Token 0x50 PMU table path (with Token 0x70 fallback)
      • PMU Lookup Table & Falcon Ucode Descriptor extraction
      • FalconUcodeDescV3Nvidia parsing (pkcDataOffset, signatureCount, signatureVersions)
    • Real FWSEC-FRTS Execution (matching NVIDIA open-gpu-kernel-modules):
      • Falcon IMEM/DMEM ucode loading
      • Fuse version reading (readUcodeFuseVersion())
      • RSA-3K signature patching (patchFwsecSignature())
      • FRTS command buffer patching (patchFrtsCmdBuffer())
      • DMEMMAPPER interface patching (FRTS command 0x15)
      • GSP Falcon boot with timeout monitoring
    • Enhanced Boot Sequence:
      • SEC2 FALCON reset
      • FWSEC-FRTS execution (WPR2 setup)
      • booter_load on SEC2 (HS mode)
      • RISC-V core start (correct 0x118000 base for Ada)
    • Detailed error stages (bootEx())
    • Debug Mode: Continues boot even on FWSEC/booter failures
    • Register Scanning: Auto-detect RISC-V base address
  • ✅ Full RPC Engine (rmAlloc, rmControl)
  • ✅ Interrupt Driven Architecture
    • MSI (Message Signaled Interrupts) support
    • Reactive status queue processing
  • ✅ Memory Management (MMU)
    • Virtual Address Space (VASpace)
    • Page Directory/Table management
  • ✅ Compute Engine
    • GPFIFO Channel creation
    • User Doorbell mapping
    • Command Submission
  • ✅ User-Space Interface
    • IOUserClient for secure firmware upload
    • Zero-copy memory mapping
    • libNVDAAL shared library
    • Detailed error codes from kernel
  • ✅ CLI Tool (nvdaal-cli)
    • boot command for full sequence
    • fwsec command for WPR2 configuration
    • status command for GPU register status
    • load command for legacy loading
  • ✅ Multi-Architecture Build
    • arm64 (Apple Silicon)
    • x86_64 (Intel)
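
The ROM scanning steps in the VBIOS parsing list (0x55AA signatures, PCIR headers, FWSEC image detection by type 0xE0) can be illustrated in Python. This is a minimal sketch that follows the generic PCI expansion ROM layout, not the driver's C++ parser: `scan_rom_images` is a hypothetical name, and NVIDIA-specific header extensions or any data preceding the first image are assumed absent.

```python
import struct

ROM_SIGNATURE = b"\x55\xaa"    # first two bytes of each expansion ROM image
PCIR_SIGNATURE = b"PCIR"       # start of the PCI Data Structure
FWSEC_CODE_TYPE = 0xE0         # code type the feature list associates with FWSEC


def scan_rom_images(rom):
    """Yield (offset, code_type, length_in_bytes) for each chained ROM image."""
    offset = 0
    while offset + 0x1A <= len(rom) and rom[offset:offset + 2] == ROM_SIGNATURE:
        # ROM header +0x18: 16-bit little-endian pointer to the PCIR structure.
        (pcir_ptr,) = struct.unpack_from("<H", rom, offset + 0x18)
        pcir = offset + pcir_ptr
        if pcir + 0x16 > len(rom) or rom[pcir:pcir + 4] != PCIR_SIGNATURE:
            break
        # PCIR +0x10: image length in 512-byte units.
        (units,) = struct.unpack_from("<H", rom, pcir + 0x10)
        code_type = rom[pcir + 0x14]   # PCIR +0x14: code type
        indicator = rom[pcir + 0x15]   # PCIR +0x15: bit 7 set on the last image
        yield offset, code_type, units * 512
        if indicator & 0x80 or units == 0:
            break
        offset += units * 512


def find_fwsec_offsets(rom):
    """Return the offsets of all images whose code type is 0xE0."""
    return [off for off, code_type, _ in scan_rom_images(rom)
            if code_type == FWSEC_CODE_TYPE]
```

With a ROM dump loaded as `bytes`, `find_fwsec_offsets(rom)` returns candidate FWSEC image offsets. The later steps in the list (BIT table, PMU lookup table, Falcon ucode descriptors) are outside what this sketch covers.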

In Development

  • 🚧 WPR2 Configuration (FWSEC-FRTS with proper signature patching)
  • 🚧 Compute Class (ADA_COMPUTE_A) Context
  • 🚧 Semaphore Synchronization

Planned

  • ⌛ tinygrad/PyTorch integration
  • ⌛ CUDA-like compute API

⭐ Pioneer Insights

As of v0.6.1, NVDAAL is one of the first open-source efforts to bring Ada Lovelace compute to macOS. Key architectural decisions:

  • Lock-Free GSP RPC: Uses synchronous memory barriers and stack-allocated buffers to minimize kernel latency during GPU resource management.
  • Hardware-Native GPFIFO: Fully compliant with the 128-bit entry format required by AD10x chips, enabling direct hardware work submission.
  • Dynamic MMU: Implements a real-time Bump Allocator for GPU Virtual Address Space, ensuring memory isolation and proper page alignment for Tensor core workloads.
  • Complete Boot Pipeline: Full SEC2 + FWSEC + GSP-RM boot sequence matching NVIDIA's reference implementation, with detailed error stage reporting for debugging.
  • Native VBIOS Parsing: Complete VBIOS ROM parsing including PCIR headers, BIT tables, PMU lookup, and Falcon ucode extraction for real FWSEC-FRTS execution.
  • Non-Contiguous Memory: Handles 63MB GSP-RM firmware without requiring physically contiguous allocation, using per-page Radix3 table entries.
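
The bump (linear) allocation idea mentioned under Dynamic MMU can be sketched as follows. This is a generic Python illustration of the technique, assuming 4 KiB pages and power-of-two alignments; the class name, parameters and address values are hypothetical and do not reflect the kext's actual VASpace code.

```python
class BumpAllocator:
    """Linear allocator over a virtual address range. Individual frees are not supported."""

    def __init__(self, base, size, page_size=4096):
        if page_size <= 0 or page_size & (page_size - 1):
            raise ValueError("page_size must be a power of two")
        self.base = base
        self.limit = base + size      # first address past the managed range
        self.page_size = page_size
        self.cursor = base            # next unallocated address

    def alloc(self, size, alignment=None):
        """Return a start address aligned to `alignment` (default: one page)."""
        align = alignment or self.page_size
        if size <= 0 or align & (align - 1):
            raise ValueError("size must be positive and alignment a power of two")
        start = (self.cursor + align - 1) & ~(align - 1)
        # Round the length up so the next allocation starts on a page boundary.
        length = (size + self.page_size - 1) & ~(self.page_size - 1)
        if start + length > self.limit:
            raise MemoryError("virtual address space exhausted")
        self.cursor = start + length
        return start
```

For example, `BumpAllocator(0x10000000, 0x100000).alloc(100)` returns `0x10000000` and moves the cursor one page forward. The trade-off of this design is that allocation is a single pointer bump, while space is only reclaimed by resetting the whole range.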

📈 Performance Status

| Component | Status | Optimization |
|---|---|---|
| RPC Latency | 🔅 Low | Stack-based buffers |
| Memory Alloc | 🔆 High | Bump Allocator (Linear) |
| Submission | 🔆 High | Direct Doorbell (UserD) |
| Boot Diagnostics | 🔆 High | Error stage codes |

⚙️ Architecture

System Overview

graph TB
    subgraph "User Space"
        CLI[nvdaal-cli]
        PY[Python Scripts]
        ML[tinygrad / PyTorch]
        LIB[libNVDAAL.dylib]
    end

    subgraph "Kernel Space (NVDAAL.kext)"
        UC[NVDAALUserClient]
        DEV[NVDAAL IOService]
        MEM[NVDAALMemory]
        QUEUE[NVDAALQueue]
        GSP[NVDAALGsp]
        DISP[NVDAALDisplay]
    end

    subgraph "Hardware (RTX 4090)"
        RISCV[GSP RISC-V Core]
        SM[128 SMs / 16384 CUDA Cores]
        TENSOR[512 Tensor Cores]
        VRAM[24GB GDDR6X]
    end

    CLI --> LIB
    PY --> LIB
    ML --> LIB
    LIB --> UC
    UC --> DEV
    DEV --> MEM
    DEV --> QUEUE
    DEV --> GSP
    DEV --> DISP
    GSP -->|RPC Protocol| RISCV
    QUEUE -->|Compute Commands| SM
    SM --> TENSOR
    MEM -->|BAR1 Mapping| VRAM

GSP Boot Sequence

sequenceDiagram
    participant User as nvdaal-cli
    participant Lib as libNVDAAL
    participant Drv as NVDAAL.kext
    participant GSP as NVDAALGsp
    participant SEC2 as SEC2 Falcon
    participant HW as GSP RISC-V

    User->>Lib: boot(firmware_dir)
    Note over Lib: Load VBIOS, booter_load, GSP-RM

    Lib->>Drv: loadVbios(AD102.rom)
    Lib->>Drv: loadBooterLoad(booter_load.bin)
    Lib->>Drv: loadFirmware(gsp.bin)

    Drv->>GSP: Initialize GSP
    GSP->>GSP: Parse ELF (63MB, non-contiguous)
    GSP->>GSP: Build Radix3 page tables

    GSP->>HW: Reset GSP Falcon
    GSP->>SEC2: Reset SEC2 Falcon

    alt VBIOS loaded
        GSP->>SEC2: Execute FWSEC-FRTS
        SEC2-->>GSP: WPR2 region configured
    else No VBIOS
        GSP->>GSP: Check WPR2 (EFI may have set it)
    end

    GSP->>GSP: Setup WPR metadata

    alt booter_load available
        GSP->>SEC2: Execute booter_load (HS mode)
        SEC2->>SEC2: Authenticate GSP-RM
        SEC2-->>GSP: Boot handoff ready
    end

    GSP->>HW: Start RISC-V core
    HW-->>GSP: GSP_INIT_DONE event
    GSP->>GSP: Setup RPC queues
    GSP-->>Drv: Ready (or error stage)
    Drv-->>Lib: Success / Error code
    Lib-->>User: Boot complete
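
The control flow of the sequence diagram, including the optional FWSEC-FRTS and booter_load steps and the reporting of a failing stage, can be modelled in a few lines of Python. This is a simplified sketch under stated assumptions: the stage names, `run_boot`, and the callable-per-stage interface are hypothetical and are not the driver's `bootEx()` API. The "check WPR2" branch taken when no VBIOS is loaded is omitted.

```python
from enum import Enum


class BootStage(Enum):
    # Order follows the sequence diagram; the names are illustrative only.
    PARSE_ELF = 1
    BUILD_RADIX3 = 2
    RESET_FALCONS = 3
    FWSEC_FRTS = 4
    WPR_METADATA = 5
    BOOTER_LOAD = 6
    START_RISCV = 7
    WAIT_INIT_DONE = 8
    SETUP_RPC = 9


# Stages that debug mode is allowed to tolerate, per the feature list.
TOLERATED_IN_DEBUG = (BootStage.FWSEC_FRTS, BootStage.BOOTER_LOAD)


def run_boot(steps, have_vbios=True, have_booter=True, debug=False):
    """Run stages in order and return (failed_stage_or_None, executed_stages).

    `steps` maps a BootStage to a zero-argument callable returning True on
    success. Stages without an entry are treated as succeeding.
    """
    executed = []
    for stage in BootStage:
        if stage is BootStage.FWSEC_FRTS and not have_vbios:
            continue   # no VBIOS loaded: FWSEC-FRTS is not executed
        if stage is BootStage.BOOTER_LOAD and not have_booter:
            continue   # booter_load not provided
        executed.append(stage)
        ok = steps.get(stage, lambda: True)()
        if not ok and not (debug and stage in TOLERATED_IN_DEBUG):
            return stage, executed
    return None, executed
```

Returning the failing stage instead of a bare boolean mirrors the "Ready (or error stage)" reply in the diagram, so the caller can tell where the boot stopped.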

Memory Layout

graph LR
    subgraph "BAR0 - MMIO (16MB)"
        PMC[PMC Registers]
        FALCON[GSP Falcon]
        SEC2[SEC2 Falcon]
        RISCV_CTRL[RISC-V Control]
        GSP_QUEUE[GSP Queues]
    end

    subgraph "BAR1 - VRAM (24GB)"
        USER_MEM[User Memory]
        GSP_HEAP[GSP Heap<br/>129MB]
        WPR2[WPR2 Region<br/>Protected by FWSEC]
        FRTS[FRTS Scratch<br/>1MB]
    end

    subgraph "System RAM (DMA)"
        CMD_Q[Command Queue<br/>256KB]
        STAT_Q[Status Queue<br/>256KB]
        FW_BUF[GSP-RM Firmware<br/>~63MB non-contiguous]
        BOOTER[booter_load<br/>~1MB]
        VBIOS[VBIOS/FWSEC<br/>~4MB]
        RADIX3[Radix3 Page Tables]
    end

    PMC -.->|Control| USER_MEM
    SEC2 -.->|Execute| BOOTER
    FALCON -.->|Load| FW_BUF
    GSP_QUEUE -.->|RPC| GSP_HEAP
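
To get a feel for how large the Radix3 page tables are for the ~63MB non-contiguous firmware buffer, the page counts can be worked out with a short calculation. The sketch below assumes 4 KiB pages and one 64-bit physical address per table entry (512 entries per page). Both values and the helper names are assumptions for illustration and should be checked against the driver source.

```python
GSP_PAGE_SIZE = 4096                     # assumption: 4 KiB pages
ENTRIES_PER_PAGE = GSP_PAGE_SIZE // 8    # assumption: 64-bit entries, 512 per page


def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


def radix3_page_counts(firmware_bytes):
    """Return the number of pages needed for the firmware and each table level."""
    fw_pages = ceil_div(firmware_bytes, GSP_PAGE_SIZE)
    level2 = ceil_div(fw_pages, ENTRIES_PER_PAGE)   # entries point at firmware pages
    level1 = ceil_div(level2, ENTRIES_PER_PAGE)     # entries point at level-2 pages
    if level1 > ENTRIES_PER_PAGE:
        raise ValueError("firmware too large for a three-level table")
    return {"firmware": fw_pages, "level2": level2, "level1": level1, "level0": 1}


def fill_level(child_addrs):
    """Group child physical addresses into table pages of ENTRIES_PER_PAGE entries.

    Each entry carries its own address, so the child pages do not need to be
    physically contiguous.
    """
    return [child_addrs[i:i + ENTRIES_PER_PAGE]
            for i in range(0, len(child_addrs), ENTRIES_PER_PAGE)]
```

Under these assumptions a 63 MiB image spans 16,128 pages, which needs 32 level-2 pages, one level-1 page and one root page. The tables therefore add only about 136 KiB on top of the firmware itself.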

Component Interaction

graph TD
    subgraph "NVDAAL.kext Components"
        A[NVDAAL<br/>Main IOService] --> B[NVDAALGsp<br/>GSP Controller]
        A --> C[NVDAALMemory<br/>VRAM Allocator]
        A --> D[NVDAALQueue<br/>Command Queue]
        A --> E[NVDAALDisplay<br/>Fake Display]
        A --> F[NVDAALUserClient<br/>User Interface]

        B --> |"parseElfFirmware()"| B1[ELF Parser]
        B --> |"buildRadix3PageTable()"| B2[Page Tables]
        B --> |"boot()"| B3[Boot Sequence]
        B --> |"sendRpc()"| B4[RPC Protocol]

        C --> |"allocVram()"| C1[Linear Allocator]
        D --> |"push() / kick()"| D1[Ring Buffer]
    end
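
The `push() / kick()` pairing on the command queue suggests a producer that fills ring entries first and publishes them afterwards. The Python model below illustrates that general pattern only. The class, its fields and the `consume()` helper are hypothetical and stand in for hardware behaviour. The real NVDAALQueue is C++ and writes to a GPU doorbell.

```python
class CommandRing:
    """Single-producer ring buffer where entries become visible only after kick()."""

    def __init__(self, capacity):
        if capacity < 2:
            raise ValueError("capacity must be at least 2")
        self.entries = [None] * capacity
        self.put = 0         # next slot the producer will write
        self.published = 0   # value of `put` last made visible by kick()
        self.get = 0         # next slot the consumer will read

    def push(self, entry):
        """Write one entry without notifying the consumer."""
        nxt = (self.put + 1) % len(self.entries)
        if nxt == self.get:
            raise BufferError("ring full")   # one slot stays empty by design
        self.entries[self.put] = entry
        self.put = nxt

    def kick(self):
        """Publish all pushed entries (stands in for a doorbell write)."""
        self.published = self.put
        return self.published

    def consume(self):
        """Drain published entries, modelling the consumer side."""
        out = []
        while self.get != self.published:
            out.append(self.entries[self.get])
            self.get = (self.get + 1) % len(self.entries)
        return out
```

Separating `push()` from `kick()` lets several commands be batched behind a single notification. A ring of capacity N holds at most N - 1 entries, so a full ring can be told apart from an empty one.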

Roadmap

| Phase | Description | Status |
|---|---|---|
| 1. Foundation | PCI detection, BAR mapping, chip ID | ✅ Complete |
| 2. GSP Init | Firmware loading, RPC setup, boot sequence | ✅ Complete |
| 3. User API | libNVDAAL, IOUserClient, CLI tool | ✅ Complete |
| 4. Enhanced Boot | SEC2/FWSEC/WPR2, booter_load, error diagnostics | ✅ Complete |
| 5. Memory | VRAM allocation, DMA buffers, virtual memory | 🚧 In Progress |
| 6. Compute | Queue management, command submission, sync | ⌛ Planned |
| 7. Integration | tinygrad, PyTorch backends | ⌛ Planned |

📂 Project Structure

NVDAAL-Driver/
├── Sources/                  # Kernel extension source
│   ├── NVDAAL.{h,cpp}       # Main IOService driver
│   ├── NVDAALGsp.{h,cpp}    # GSP controller & RPC
│   ├── NVDAALUserClient.{h,cpp}  # User-space interface
│   ├── NVDAALMemory.{h,cpp} # VRAM allocator
│   ├── NVDAALQueue.{h,cpp}  # Command queue
│   ├── NVDAALDisplay.{h,cpp} # Fake display engine
│   └── NVDAALRegs.h         # Register definitions
├── Library/                  # User-space SDK
│   ├── libNVDAAL.{h,cpp}    # C++ API wrapper
│   └── nvdaal_c_api.cpp     # C FFI bindings
├── Tools/
│   ├── nvdaal-cli/          # CLI firmware loader
│   ├── extract_vbios.py     # VBIOS extraction
│   └── test_driver.py       # Python test harness
├── Docs/                     # Technical documentation
│   ├── ARCHITECTURE.md      # Component details
│   ├── GSP_INIT.md          # GSP boot guide
│   └── TODO.md              # Development checklist
├── Firmware/                 # User-provided firmware
├── Info.plist               # Kext configuration
├── Makefile                 # Build system
└── README.md

🤝 Contributing

Contributions are welcome! Please read our Contributing Guide before submitting a PR.

Development Commands

make clean           # Clean build artifacts
make                 # Build kext + tools + library
make rebuild         # Clean + build
make test            # Validate kext structure
make load            # Load kext temporarily
make unload          # Unload kext
make logs            # View driver logs (last 5 min)
make logs-live       # Stream logs in real-time
make status          # Check kext and PCI status
make download-firmware  # Download GSP firmware

⚠️ Disclaimer

This project is for educational and research purposes only. There is no guarantee of functionality. Use of proprietary firmware may violate NVIDIA's license terms. Use at your own risk.

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.


Made with 💜 by Gabriel Maia
