NVIDIA Kernel Detective captures NVIDIA GPU pushbuffer traffic from instrumented kernel paths and CUDA user space, decodes the method stream, and exposes it through a live browser UI.
For now, it only works on Blackwell GPUs.
Structure:
- `open-gpu-kernel-modules/`: the instrumented NVIDIA open kernel modules
- `userspace/`: `nkd_daemon`, `nkd_preload.so`, and the decoder pipeline
- `frontend/`: the WebSocket server and web UI
- `tests/cuda/`: CUDA workloads and validation entrypoints
NKD requirements:
- CUDA 13.1
- NVIDIA open kernel modules 590.44.01 from this repository's `open-gpu-kernel-modules` submodule
Tested on NVIDIA RTX 5050.
The flow below installs NVIDIA user-space components from the official runfiles, then builds and loads the kernel modules from this checkout.
```sh
export CUDA_RUN=cuda_13.1.0_590.44.01_linux.run
export NV_DRV_VERSION=590.44.01
curl -LO "https://developer.download.nvidia.com/compute/cuda/13.1.0/local_installers/${CUDA_RUN}"
curl -LO "https://us.download.nvidia.com/XFree86/Linux-x86_64/${NV_DRV_VERSION}/NVIDIA-Linux-x86_64-${NV_DRV_VERSION}.run"
# Install the CUDA 13.1 toolkit.
sudo sh "./${CUDA_RUN}"
# Install the matching NVIDIA user-space stack without replacing the
# kernel modules that will be built from this repository.
sudo sh "./NVIDIA-Linux-x86_64-${NV_DRV_VERSION}.run" --extract-only
cd "NVIDIA-Linux-x86_64-${NV_DRV_VERSION}"
sudo ./nvidia-installer --no-kernel-modules
cd ..
```
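The runfile installs the toolkit under `/usr/local/cuda-13.1` by default; put `nvcc` on `PATH` before the builds below (a standard CUDA setup step, not something this repository does for you):

```sh
# Default runfile prefix; adjust if you installed elsewhere.
export PATH=/usr/local/cuda-13.1/bin:$PATH
nvcc --version   # expect: release 13.1
```

With the toolchain in place, build everything from this checkout: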
```sh
git clone https://github.com/meowmeowxw/nvidia-kernel-detective.git
cd nvidia-kernel-detective
git submodule update --init --recursive
make -C userspace
make -C tests/helpers clean all
make -C tests/cuda clean all
make -C open-gpu-kernel-modules modules -j"$(nproc)"
# Load the modules built from this checkout.
sudo rmmod nvidia_uvm nvidia_modeset nvidia 2>/dev/null || true
cd open-gpu-kernel-modules/kernel-open
sudo insmod ./nvidia.ko
sudo insmod ./nvidia-modeset.ko
sudo insmod ./nvidia-uvm.ko
cd ../..
```
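Before starting the frontend, confirm the instrumented modules actually loaded and that their debugfs entries are present (`nkd*` paths per the notes at the end):

```sh
# All three modules should be listed.
lsmod | grep -E '^nvidia'
# The driver version should report 590.44.01.
nvidia-smi --query-gpu=driver_version --format=csv,noheader
# The instrumented build exposes its capture controls under debugfs.
sudo ls /sys/kernel/debug | grep nkd
```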
Install the frontend dependency and launch the server from the repository root:

```sh
cd frontend
npm install
cd ..
sudo node frontend/server.js --port=8080 --daemon="$PWD/userspace/nkd_daemon"
```

`frontend/server.js` starts `userspace/nkd_daemon --json`, resets the relay state, enables capture, and serves the browser UI on port 8080.
Open http://localhost:8080. A healthy startup looks like this:
- the status dot turns green
- the top-right button changes to `Stop`
- the push counter starts increasing once a workload runs
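If the dot stays red, check that the server is actually listening before digging further (plain `curl` against the port chosen above):

```sh
# Expect "200" once frontend/server.js is serving the UI.
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8080
```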
In a second terminal, run the default explicit-copy CUDA workload:
```sh
LD_PRELOAD="$PWD/userspace/nkd_preload.so" ./tests/cuda/vector_add_device
```

You should see new rows appear in the web view with UVM, RM, and CUDA source badges. Click a push group to inspect decoded operations and the raw pushbuffer hexdump.
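The managed-memory workload mentioned in the notes below runs the same way, and is the one to use for managed-memory coverage:

```sh
# Managed-memory counterpart of vector_add_device (see notes below).
LD_PRELOAD="$PWD/userspace/nkd_preload.so" ./tests/cuda/vector_add_managed
```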
Example data flow:

```mermaid
flowchart LR
A[CUDA workload<br/>vector_add_device] --> B[nkd_preload.so<br/>CUDA user capture]
C[Instrumented NVIDIA open kernel modules<br/>UVM + RM relay] --> D[nkd_daemon<br/>merge + decode]
B --> D
D --> E[frontend/server.js<br/>WebSocket + session store]
E --> F[Browser UI<br/>live web view]
```
NKD combines two capture paths:
- kernel relay records from the instrumented `nkd` and `nkd_rm` debugfs paths
- CUDA user-channel records emitted by `nkd_preload.so` via `LD_PRELOAD`
`nkd_daemon` merges those streams, attaches semantic decode where possible, and forwards JSON records to the frontend server for live display and retained session history.
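To watch the merged stream without the frontend, the daemon can be run by hand; this assumes `--json` (the flag `frontend/server.js` passes, per above) makes it print records to stdout, which you should verify on your build:

```sh
# Assumption: with --json, nkd_daemon emits its merged records on stdout.
sudo ./userspace/nkd_daemon --json
```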
Run the offline checks first:
```sh
python3 tests/offline/run_decoder_fixtures.py
python3 tests/offline/check_frontend_contract.py
node tests/offline/test_web_session_store.js
bash tests/offline/test_remote_target_config.sh
```
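If you want a single pass/fail gate, a trivial wrapper over the same four commands works (nothing here is project-specific):

```sh
#!/usr/bin/env bash
# Run all offline checks in order; exit non-zero at the first failure.
set -euo pipefail
python3 tests/offline/run_decoder_fixtures.py
python3 tests/offline/check_frontend_contract.py
node tests/offline/test_web_session_store.js
bash tests/offline/test_remote_target_config.sh
```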
Remote and live helper scripts live under `scripts/`. Configure `scripts/remote.env` from `scripts/remote.example.env`, or export the `NKD_REMOTE_*` variables directly, before using them.

```sh
cp scripts/remote.example.env scripts/remote.env
# edit scripts/remote.env for your target
./scripts/sync_server.sh
./scripts/build_server_userspace.sh
./scripts/build_server_kernel.sh
./scripts/test_server_live.sh
```

For local automated live validation, see `tests/live/`.
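For orientation, `scripts/remote.env` ends up looking something like the sketch below; the variable names are illustrative guesses (only the `NKD_REMOTE_*` prefix is documented), so copy the real ones from `scripts/remote.example.env`:

```sh
# Hypothetical variable names -- take the authoritative list from
# scripts/remote.example.env.
export NKD_REMOTE_HOST=192.0.2.10   # target machine with the GPU
export NKD_REMOTE_USER=dev          # SSH user on the target
```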
- The frontend retains the current session under `/tmp/nkd-web-session/` by default.
- `tests/cuda/vector_add_device` is the default explicit-copy CUDA workload used by the live flow and tests.
- `tests/cuda/vector_add_managed` is also available for managed-memory coverage.
- `tests/cuda/Makefile` currently builds the bundled workloads for `sm_120` by default, matching the project's Blackwell target.
- The server needs elevated privileges because it manages the `/sys/kernel/debug/nkd*` capture controls and launches `nkd_daemon`.
- CUDA command decoding is still a work in progress.
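Since the retained session lives under `/tmp/nkd-web-session/` (first note above), clearing it between experiments is just a delete; this assumes the server recreates the directory on the next start:

```sh
# Assumption: frontend/server.js recreates this directory on startup.
rm -rf /tmp/nkd-web-session
```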

