Overview
I’ve been working on an experimental project called HPX-Vision, which explores observability and debugging capabilities for the HPX runtime, particularly for distributed and asynchronous task graphs.
Repository: https://github.com/arpittkhandelwal/Hpx-Vision
Dashboard Preview
Current Problems
While working with HPX, I encountered several challenges:
- Limited visibility into task execution flow and dependencies
- Difficulty debugging distributed deadlocks or stalls
- Lack of insight into scheduler behavior and thread utilization
- Difficulty understanding runtime performance bottlenecks at scale
- Existing tools are often not HPX-aware or require manual instrumentation
Key Ideas Explored
- Wait-For Graph (WFG): Tracks inter-task dependencies and detects deadlocks using Tarjan’s SCC algorithm.
- Sampling-based Metrics Collection: Lightweight probabilistic tracking to reduce runtime overhead.
- Task Metadata & Heartbeats: Tracks activity of HPX threads and enables cleanup of stale state.
- Fault-tolerant model: Lease-based consistency (edges expire, inactive tasks pruned).
- Data export layer: JSON export for external visualization/debugging tools.
- External dashboard (React + D3): Visualizes task graphs and runtime behavior.
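To make the Wait-For Graph idea concrete, here is a minimal, standalone sketch of deadlock detection with Tarjan's SCC algorithm. It is not the code from the repository and does not use any HPX API: `TaskId`, `WaitForGraph`, `add_wait`, and `find_deadlocks` are hypothetical names chosen for illustration, and the sketch assumes a single-threaded snapshot of the graph (no locking, no lease expiry, no sampling).

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical task identifier; the real ID type in HPX-Vision may differ.
using TaskId = std::uint64_t;

class WaitForGraph {
public:
    // Record that `waiter` is blocked until `target` makes progress.
    void add_wait(TaskId waiter, TaskId target) {
        edges_[waiter].push_back(target);
        edges_[target];  // ensure the target exists as a node
    }

    // Return each group of tasks that wait on one another in a cycle.
    // A strongly connected component counts as a deadlock candidate if it
    // has more than one task, or a single task that waits on itself.
    std::vector<std::vector<TaskId>> find_deadlocks() {
        state_.clear();
        stack_.clear();
        cycles_.clear();
        next_index_ = 0;
        for (auto const& entry : edges_) {
            if (state_.find(entry.first) == state_.end()) visit(entry.first);
        }
        return cycles_;
    }

private:
    struct NodeState {
        int index = 0;         // discovery order
        int lowlink = 0;       // smallest index reachable via the DFS stack
        bool on_stack = false;
    };

    // Recursive Tarjan step; very deep graphs would need an iterative form.
    void visit(TaskId v) {
        state_[v] = NodeState{next_index_, next_index_, true};
        ++next_index_;
        stack_.push_back(v);
        bool self_loop = false;

        for (TaskId w : edges_.at(v)) {
            if (w == v) self_loop = true;
            auto it = state_.find(w);
            if (it == state_.end()) {
                visit(w);
                state_[v].lowlink = std::min(state_[v].lowlink, state_[w].lowlink);
            } else if (it->second.on_stack) {
                state_[v].lowlink = std::min(state_[v].lowlink, it->second.index);
            }
        }

        // Only the root of a component pops it off the stack.
        if (state_[v].lowlink != state_[v].index) return;

        std::vector<TaskId> component;
        TaskId w;
        do {
            w = stack_.back();
            stack_.pop_back();
            state_[w].on_stack = false;
            component.push_back(w);
        } while (w != v);

        if (component.size() > 1 || self_loop) {
            std::sort(component.begin(), component.end());
            cycles_.push_back(std::move(component));
        }
    }

    std::unordered_map<TaskId, std::vector<TaskId>> edges_;
    std::unordered_map<TaskId, NodeState> state_;
    std::vector<TaskId> stack_;
    std::vector<std::vector<TaskId>> cycles_;
    int next_index_ = 0;
};
```

For example, after `add_wait(1, 2)`, `add_wait(2, 3)`, `add_wait(3, 1)`, and `add_wait(4, 1)`, `find_deadlocks()` reports the single cycle containing tasks 1, 2, and 3. Task 4 is blocked on the cycle but is not part of it, so it is not reported. In a distributed setting, a reported cycle is only a candidate deadlock until stale edges have been ruled out, which is presumably where the lease-based expiry comes in.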
Motivation
This project explores whether richer observability can help:
- Understand task dependencies
- Detect deadlocks
- Analyze runtime behavior in distributed HPX applications
Clarification
This is not a proposal to merge the full system into HPX.
The intention is to get feedback on whether:
- A minimal observability/performance hook layer in HPX would be useful
- Parts of this approach could be adapted in a lightweight form
Questions
- Would a minimal observability/performance layer in HPX be valuable?
- Are there existing efforts in this direction?
- Would it make sense to extract smaller contributions (e.g., task timing, basic metrics)?
- Any feedback on overhead, complexity, or scope?
Closing
Feedback on whether this direction is useful or aligns with HPX goals would be appreciated.