Pull requests: NVIDIA/nvidia-resiliency-ext
Add cycle tracking and REST API for failure attribution
#217 opened Nov 4, 2025 by hexinw-nvidia (Draft)
[FR attribution] FR logic update to remove any use of PG description but window-based ordering
#216 opened Nov 4, 2025 by sbak5
[Attribution] Initial MCP integration to make NVRx attr module as a tool of nvidia-resiliency agent
#215 opened Oct 31, 2025 by sbak5
feat: add non-retryable exception pattern matching
#212 opened Oct 28, 2025 by hexinw-nvidia
[FR_attribution] Fix issues with stale entries
#210 opened Oct 27, 2025 by sbak5 (label: ci-approved, Approved to run CI)
feat: Add NUMA binding support for optimized memory affinity
#209 opened Oct 24, 2025 by hexinw-nvidia (label: ci-approved, Approved to run CI)
Added GPU memory logger.
#206 opened Oct 21, 2025 by hexinw-nvidia (label: ci-approved, Approved to run CI)
Add example for multimodal models
#131 opened Jul 25, 2025 by Ava-A4098 (label: ci-approved, Approved to run CI)