NVSentinel Demos

Interactive demonstrations of NVSentinel's core capabilities.

Demo Videos

End-to-End Fault Detection & Remediation: the full pipeline of health monitoring, fault detection, quarantine, drain, and breakfix.

Custom Health Monitors: building your own GPU health monitor using the gRPC interface.

Custom Drain Plugins: Slinky integration for coordinated drain of HPC workloads.

Extensible Remediation: bringing your own breakfix system or remediation operator.

Health Events Analyzer: identifying and removing bad GPU nodes from the cluster.

Interactive Demos

Run these locally on your laptop — no GPU hardware needed.

Local Fault Injection Demo

What it shows: GPU failure detection and automated node quarantine

Requirements: Docker, kubectl, kind, and Helm (no GPU hardware needed)

Time: 5-10 minutes

Best for: Understanding how NVSentinel detects hardware failures and automatically protects your cluster by cordoning faulty nodes.

Local Slinky Drain Demo

What it shows: Custom drain extensibility using the Slinky Drainer plugin with scheduler integration

Requirements: Docker, kubectl, kind, Helm, ko, and Go 1.25+ (no GPU hardware needed)

Time: 5-10 minutes

Best for: Understanding how NVSentinel's node-drainer can delegate pod eviction to external controllers for custom drain workflows coordinated with HPC schedulers.

Local Custom Remediation Demo

What it shows: Custom remediation action extensibility with a real memory-pressure health monitor and a third-party remediation controller

Requirements: Docker, kubectl, kind, Helm, ko, and Go 1.25+ (no GPU hardware needed)

Time: 5-10 minutes

Best for: Understanding how to extend NVSentinel beyond GPU faults to handle any hardware or system fault — custom health monitors, custom remediation actions, and third-party controllers.
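Because the three demos share most of their tooling, a quick preflight check can catch a missing tool before a setup run fails partway through. The Python sketch below is a hypothetical helper, not part of NVSentinel. It assumes each requirement is installed under its usual executable name (`docker`, `kubectl`, `kind`, `helm`, `ko`, `go`), that the demo folder names (`local-fault-injection-demo`, `local-slinky-drain-demo`, `local-custom-remediation-demo`) identify the demos, and that `go version` prints a string like `go version go1.25.1 linux/amd64`. The demos' own scripts may check for different things.

```python
import re
import shutil
import subprocess

# Hypothetical mapping: tool requirements per demo, as listed in this README.
REQUIREMENTS = {
    "local-fault-injection-demo": ["docker", "kubectl", "kind", "helm"],
    "local-slinky-drain-demo": ["docker", "kubectl", "kind", "helm", "ko", "go"],
    "local-custom-remediation-demo": ["docker", "kubectl", "kind", "helm", "ko", "go"],
}
MIN_GO = (1, 25)  # "go 1.25+" from the requirements above


def missing_tools(tools, which=shutil.which):
    """Return the tools whose executables cannot be found on PATH."""
    return [t for t in tools if which(t) is None]


def parse_go_version(output):
    """Extract (major, minor) from `go version` output, or None if absent."""
    match = re.search(r"\bgo(\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def go_is_new_enough(output, minimum=MIN_GO):
    """True if the reported Go version is at least `minimum` (tuple compare)."""
    version = parse_go_version(output)
    return version is not None and version >= minimum


def check_demo(demo, which=shutil.which, run=subprocess.run):
    """List human-readable problems for one demo; an empty list means ready."""
    tools = REQUIREMENTS[demo]
    problems = [f"missing: {t}" for t in missing_tools(tools, which)]
    # Only query the Go toolchain if this demo needs it and it is installed.
    if "go" in tools and which("go") is not None:
        out = run(["go", "version"], capture_output=True, text=True).stdout
        if not go_is_new_enough(out):
            problems.append(f"go 1.25+ required, found: {out.strip() or 'unknown'}")
    return problems


if __name__ == "__main__":
    for name in REQUIREMENTS:
        issues = check_demo(name)
        print(f"{name}: {'ok' if not issues else '; '.join(issues)}")
```

Passing `which` and `run` as parameters keeps the checks testable without touching the real PATH or spawning processes.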

Coming Soon

Pod rescheduling and restart from checkpoints

Questions? See the main README or open an issue.
