NVIDIA Kernel Detective (NKD)

NVIDIA Kernel Detective captures NVIDIA GPU pushbuffer traffic from instrumented kernel paths and CUDA user space, decodes the method stream, and exposes it through a live browser UI.

For now, it only works on Blackwell GPUs.

Overview

Structure:

  • open-gpu-kernel-modules/: instrumented NVIDIA open kernel modules
  • userspace/: nkd_daemon, nkd_preload.so, and the decoder pipeline
  • frontend/: WebSocket server and web UI
  • tests/cuda/: CUDA workloads and validation entrypoints

NKD requirements:

  • CUDA 13.1
  • NVIDIA open kernel modules 590.44.01 from this repository's open-gpu-kernel-modules submodule

Tested on NVIDIA RTX 5050.
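Because the user-space stack and the kernel modules must agree on the driver version, it can save time to verify the installed version before building. A minimal sketch (the check_version helper is hypothetical, not part of the repository; the nvidia-smi query flags are standard):

```shell
# check_version EXPECTED ACTUAL — succeed only on an exact version match.
# (hypothetical helper, not shipped with NKD)
check_version() {
    if [ "$1" = "$2" ]; then
        echo "ok: $2"
    else
        echo "mismatch: want $1, got $2" >&2
        return 1
    fi
}

# On a live system, compare against the driver version reported by nvidia-smi:
#   check_version 590.44.01 "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)"
```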

Installation

The flow below installs NVIDIA user-space components from the official runfiles, then builds and loads the kernel modules from this checkout.

export CUDA_RUN=cuda_13.1.0_590.44.01_linux.run
export NV_DRV_VERSION=590.44.01

curl -LO "https://developer.download.nvidia.com/compute/cuda/13.1.0/local_installers/${CUDA_RUN}"
curl -LO "https://us.download.nvidia.com/XFree86/Linux-x86_64/${NV_DRV_VERSION}/NVIDIA-Linux-x86_64-${NV_DRV_VERSION}.run"

# Install the CUDA 13.1 toolkit.
sudo sh "./${CUDA_RUN}"

# Install the matching NVIDIA user-space stack without replacing the
# kernel modules that will be built from this repository.
sudo sh "./NVIDIA-Linux-x86_64-${NV_DRV_VERSION}.run" --extract-only
cd "NVIDIA-Linux-x86_64-${NV_DRV_VERSION}"
sudo ./nvidia-installer --no-kernel-modules
cd ..

git clone https://github.com/meowmeowxw/nvidia-kernel-detective.git
cd nvidia-kernel-detective
git submodule update --init --recursive

make -C userspace
make -C tests/helpers clean all
make -C tests/cuda clean all
make -C open-gpu-kernel-modules modules -j"$(nproc)"

# Load the modules built from this checkout.
sudo rmmod nvidia_uvm nvidia_modeset nvidia 2>/dev/null || true
cd open-gpu-kernel-modules/kernel-open
sudo insmod ./nvidia.ko
sudo insmod ./nvidia-modeset.ko
sudo insmod ./nvidia-uvm.ko
cd ../..
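After the insmod steps it is worth confirming that all three modules actually registered. A sketch, with a hypothetical check_modules helper that inspects a space-separated module list (on a real system, feed it the first column of /proc/modules):

```shell
# check_modules "name1 name2 …" — report which of the three NKD kernel
# modules appear in the given space-separated list (hypothetical helper).
check_modules() {
    mods=" $1 "
    for m in nvidia nvidia_modeset nvidia_uvm; do
        case "$mods" in
            *" $m "*) echo "$m: loaded" ;;
            *)        echo "$m: MISSING" ;;
        esac
    done
}

# On a live system:
#   check_modules "$(awk '{print $1}' /proc/modules | tr '\n' ' ')"
```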

First Capture

Install the frontend dependencies and launch the server from the repository root:

cd frontend
npm install
cd ..

sudo node frontend/server.js --port=8080 --daemon="$PWD/userspace/nkd_daemon"

frontend/server.js starts userspace/nkd_daemon --json, resets the relay state, enables capture, and serves the browser UI on port 8080.

Open http://localhost:8080. A healthy startup looks like this:

  • the status dot turns green
  • the top-right button changes to Stop
  • the push counter starts increasing once a workload runs

In a second terminal, run the default explicit-copy CUDA workload:

LD_PRELOAD="$PWD/userspace/nkd_preload.so" ./tests/cuda/vector_add_device

You should see new rows appear in the web view with UVM, RM, and CUDA source badges. Click a push group to inspect decoded operations and the raw pushbuffer hexdump.


Architecture

flowchart LR
    A[CUDA workload<br/>vector_add_device] --> B[nkd_preload.so<br/>CUDA user capture]
    C[Instrumented NVIDIA open kernel modules<br/>UVM + RM relay] --> D[nkd_daemon<br/>merge + decode]
    B --> D
    D --> E[frontend/server.js<br/>WebSocket + session store]
    E --> F[Browser UI<br/>live web view]

NKD combines two capture paths:

  • kernel relay records from the instrumented nkd and nkd_rm debugfs paths
  • CUDA user-channel records emitted by nkd_preload.so via LD_PRELOAD

nkd_daemon merges those streams, attaches semantic decode where possible, and forwards JSON records to the frontend server for live display and retained session history.
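If you save the daemon's output to a file, ordinary text tools are enough for a first look at the merged stream. A sketch that tallies records per capture source, assuming one JSON object per line with a "source" field matching the UI badges — the field name and sample values here are illustrative, not the confirmed record schema:

```shell
# Tally capture sources in a JSON-lines dump. Schema is an assumption:
# one object per line with a "source" field such as "UVM", "RM", or "CUDA".
sample='{"source":"UVM","method":4096}
{"source":"CUDA","method":8192}
{"source":"UVM","method":4100}'

printf '%s\n' "$sample" | grep -o '"source":"[A-Za-z]*"' | sort | uniq -c | sort -rn

# Against a saved dump on a live system:
#   grep -o '"source":"[A-Za-z]*"' capture.jsonl | sort | uniq -c | sort -rn
```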

Tests And Helper Scripts

Run the offline checks first:

python3 tests/offline/run_decoder_fixtures.py
python3 tests/offline/check_frontend_contract.py
node tests/offline/test_web_session_store.js
bash tests/offline/test_remote_target_config.sh
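To run all four checks in one pass and get a summary, a small wrapper works; run_checks below is a hypothetical convenience, not part of the repository — it runs each command string and reports PASS or FAIL:

```shell
# run_checks CMD… — run each command string, print PASS/FAIL per command,
# and return nonzero if any failed (hypothetical wrapper).
run_checks() {
    rc=0
    for c in "$@"; do
        if sh -c "$c" >/dev/null 2>&1; then
            echo "PASS: $c"
        else
            echo "FAIL: $c"
            rc=1
        fi
    done
    return $rc
}

# e.g.
#   run_checks "python3 tests/offline/run_decoder_fixtures.py" \
#              "python3 tests/offline/check_frontend_contract.py" \
#              "node tests/offline/test_web_session_store.js" \
#              "bash tests/offline/test_remote_target_config.sh"
```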

Remote and live helper scripts live under scripts/. Configure scripts/remote.env from scripts/remote.example.env, or export the NKD_REMOTE_* variables directly, before using them.

cp scripts/remote.example.env scripts/remote.env
# edit scripts/remote.env for your target

./scripts/sync_server.sh
./scripts/build_server_userspace.sh
./scripts/build_server_kernel.sh
./scripts/test_server_live.sh
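As a sketch of what that configuration might contain — the variable names below are assumptions for illustration; the authoritative list is in scripts/remote.example.env:

```shell
# scripts/remote.env — illustrative only; the real variable set lives in
# scripts/remote.example.env. Names and values below are assumed, not confirmed.
export NKD_REMOTE_HOST=192.0.2.10      # target machine with the Blackwell GPU
export NKD_REMOTE_USER=nkd             # SSH user on the target
export NKD_REMOTE_DIR=/home/nkd/nvidia-kernel-detective
```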

For local automated live validation, see tests/live/.

Notes

  • The frontend retains the current session under /tmp/nkd-web-session/ by default.
  • tests/cuda/vector_add_device is the default explicit-copy CUDA workload used by the live flow and tests.
  • tests/cuda/vector_add_managed is also available for managed-memory coverage.
  • tests/cuda/Makefile currently builds the bundled workloads for sm_120 by default, matching the project's Blackwell target.
  • The server needs elevated privileges because it manages /sys/kernel/debug/nkd* capture controls and launches nkd_daemon.
  • Decoding of CUDA commands is still a work in progress.