NVIDIA Kernel Detective captures NVIDIA GPU pushbuffer traffic from instrumented kernel paths and CUDA user space, decodes the method stream, and exposes it through a live browser UI.
For now, it only works on Blackwell GPUs.
Structure:
- `open-gpu-kernel-modules/`: the instrumented NVIDIA open kernel modules
- `userspace/`: `nkd_daemon`, `nkd_preload.so`, and the decoder pipeline
- `frontend/`: the WebSocket server and web UI
- `tests/cuda/`: CUDA workloads and validation entrypoints
NKD requirements:
- CUDA 13.1
- NVIDIA open kernel modules 590.44.01 from this repository's `open-gpu-kernel-modules` submodule
Tested on NVIDIA RTX 5050.
The flow below installs NVIDIA user-space components from the official runfiles, then builds and loads the kernel modules from this checkout.
```sh
export CUDA_RUN=cuda_13.1.0_590.44.01_linux.run
export NV_DRV_VERSION=590.44.01
curl -LO "https://developer.download.nvidia.com/compute/cuda/13.1.0/local_installers/${CUDA_RUN}"
curl -LO "https://us.download.nvidia.com/XFree86/Linux-x86_64/${NV_DRV_VERSION}/NVIDIA-Linux-x86_64-${NV_DRV_VERSION}.run"
# Install the CUDA 13.1 toolkit.
sudo sh "./${CUDA_RUN}"
# Install the matching NVIDIA user-space stack without replacing the
# kernel modules that will be built from this repository.
sudo sh "./NVIDIA-Linux-x86_64-${NV_DRV_VERSION}.run" --extract-only
cd "NVIDIA-Linux-x86_64-${NV_DRV_VERSION}"
sudo ./nvidia-installer --no-kernel-modules
cd ..
```
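The runfile installs the toolkit under `/usr/local/cuda-13.1` by default; put `nvcc` on `PATH` before the builds below (a standard CUDA setup step, not something this repository does for you):

```sh
# Default runfile prefix; adjust if you installed elsewhere.
export PATH=/usr/local/cuda-13.1/bin:$PATH
nvcc --version   # expect: release 13.1
```

With the toolchain in place, build everything from this checkout: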
```sh
git clone https://github.com/meowmeowxw/nvidia-kernel-detective.git
cd nvidia-kernel-detective
git submodule update --init --recursive
make -C userspace
make -C tests/helpers clean all
make -C tests/cuda clean all
make -C open-gpu-kernel-modules modules -j"$(nproc)"
# Load the modules built from this checkout.
sudo rmmod nvidia_uvm nvidia_modeset nvidia 2>/dev/null || true
cd open-gpu-kernel-modules/kernel-open
sudo insmod ./nvidia.ko
sudo insmod ./nvidia-modeset.ko
sudo insmod ./nvidia-uvm.ko
cd ../..
```
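Before starting the frontend, confirm the instrumented modules actually loaded and that their debugfs entries are present (`nkd*` paths per the notes at the end):

```sh
# All three modules should be listed.
lsmod | grep -E '^nvidia'
# The driver version should report 590.44.01.
nvidia-smi --query-gpu=driver_version --format=csv,noheader
# The instrumented build exposes its capture controls under debugfs.
sudo ls /sys/kernel/debug | grep nkd
```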
Install the frontend dependency and launch the server from the repository root:

```sh
cd frontend
npm install
cd ..
sudo node frontend/server.js --port=8080 --daemon="$PWD/userspace/nkd_daemon"
```

`frontend/server.js` starts `userspace/nkd_daemon --json`, resets the relay state, enables capture, and serves the browser UI on port 8080.
Open http://localhost:8080. A healthy startup looks like this:
- the status dot turns green
- the top-right button changes to `Stop`
- the push counter starts increasing once a workload runs
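If the dot stays red, check that the server is actually listening before digging further (plain `curl` against the port chosen above):

```sh
# Expect "200" once frontend/server.js is serving the UI.
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8080
```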
In a second terminal, run the default explicit-copy CUDA workload:
```sh
LD_PRELOAD="$PWD/userspace/nkd_preload.so" ./tests/cuda/vector_add_device
```

You should see new rows appear in the web view with UVM, RM, and CUDA source badges. Click a push group to inspect decoded operations and the raw pushbuffer hexdump.
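The managed-memory workload mentioned in the notes below runs the same way, and is the one to use for managed-memory coverage:

```sh
# Managed-memory counterpart of vector_add_device (see notes below).
LD_PRELOAD="$PWD/userspace/nkd_preload.so" ./tests/cuda/vector_add_managed
```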
Example data flow:

```mermaid
flowchart LR
A[CUDA workload<br/>vector_add_device] --> B[nkd_preload.so<br/>CUDA user capture]
C[Instrumented NVIDIA open kernel modules<br/>UVM + RM relay] --> D[nkd_daemon<br/>merge + decode]
B --> D
D --> E[frontend/server.js<br/>WebSocket + session store]
E --> F[Browser UI<br/>live web view]
```
NKD combines two capture paths:
- kernel relay records from the instrumented `nkd` and `nkd_rm` debugfs paths
- CUDA user-channel records emitted by `nkd_preload.so` via `LD_PRELOAD`
`nkd_daemon` merges those streams, attaches semantic decode where possible, and forwards JSON records to the frontend server for live display and retained session history.
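To watch the merged stream without the frontend, the daemon can be run by hand; this assumes `--json` (the flag `frontend/server.js` passes, per above) makes it print records to stdout, which you should verify on your build:

```sh
# Assumption: with --json, nkd_daemon emits its merged records on stdout.
sudo ./userspace/nkd_daemon --json
```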
Run the offline checks first:
```sh
python3 tests/offline/run_decoder_fixtures.py
python3 tests/offline/check_frontend_contract.py
node tests/offline/test_web_session_store.js
bash tests/offline/test_remote_target_config.sh
```
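If you want a single pass/fail gate, a trivial wrapper over the same four commands works (nothing here is project-specific):

```sh
#!/usr/bin/env bash
# Run all offline checks in order; exit non-zero at the first failure.
set -euo pipefail
python3 tests/offline/run_decoder_fixtures.py
python3 tests/offline/check_frontend_contract.py
node tests/offline/test_web_session_store.js
bash tests/offline/test_remote_target_config.sh
```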
Remote and live helper scripts live under `scripts/`. Configure `scripts/remote.env` from `scripts/remote.example.env`, or export the `NKD_REMOTE_*` variables directly, before using them.

```sh
cp scripts/remote.example.env scripts/remote.env
# edit scripts/remote.env for your target
./scripts/sync_server.sh
./scripts/build_server_userspace.sh
./scripts/build_server_kernel.sh
./scripts/test_server_live.sh
```

For local automated live validation, see `tests/live/`.
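For orientation, `scripts/remote.env` ends up looking something like the sketch below; the variable names are illustrative guesses (only the `NKD_REMOTE_*` prefix is documented), so copy the real ones from `scripts/remote.example.env`:

```sh
# Hypothetical variable names -- take the authoritative list from
# scripts/remote.example.env.
export NKD_REMOTE_HOST=192.0.2.10   # target machine with the GPU
export NKD_REMOTE_USER=dev          # SSH user on the target
```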
- The frontend retains the current session under `/tmp/nkd-web-session/` by default.
- `tests/cuda/vector_add_device` is the default explicit-copy CUDA workload used by the live flow and tests.
- `tests/cuda/vector_add_managed` is also available for managed-memory coverage.
- `tests/cuda/Makefile` currently builds the bundled workloads for `sm_120` by default, matching the project's Blackwell target.
- The server needs elevated privileges because it manages the `/sys/kernel/debug/nkd*` capture controls and launches `nkd_daemon`.
- CUDA command decoding is still a work in progress.
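Since the retained session lives under `/tmp/nkd-web-session/` (first note above), clearing it between experiments is just a delete; this assumes the server recreates the directory on the next start:

```sh
# Assumption: frontend/server.js recreates this directory on startup.
rm -rf /tmp/nkd-web-session
```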

