Proposal / Feedback Request: HPX-Vision (Distributed Observability Prototype for HPX) #7225

@arpittkhandelwal

Overview

I’ve been working on an experimental project called HPX-Vision, which explores observability and debugging capabilities for the HPX runtime, particularly for distributed and asynchronous task graphs.

Repository: https://github.com/arpittkhandelwal/Hpx-Vision


Dashboard Preview

[Image: HPX-Vision Dashboard]


Current Problems

While working with HPX, I encountered several challenges:

  • Limited visibility into task execution flow and dependencies
  • Difficulty debugging distributed deadlocks and stalls
  • Little insight into scheduler behavior and thread utilization
  • Difficulty identifying runtime performance bottlenecks at scale
  • Existing tools that are often not HPX-aware or that require manual instrumentation

Key Ideas Explored

  • Wait-For Graph (WFG)
    Tracks inter-task dependencies and detects deadlocks as cycles, using Tarjan’s strongly connected components (SCC) algorithm.

  • Sampling-based Metrics Collection
    Lightweight probabilistic tracking: only a sampled subset of events is recorded, which reduces runtime overhead.

  • Task Metadata & Heartbeats
    Tracks activity of HPX threads and enables cleanup of stale state.

  • Fault-tolerant model
    Lease-based consistency (edges expire and inactive tasks are pruned).

  • Data export layer
    JSON export for external visualization/debugging tools.

  • External dashboard (React + D3)
    Visualizes task graphs and runtime behavior.
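
To make the Wait-For Graph idea concrete, here is a minimal standalone sketch of cycle detection with Tarjan's SCC algorithm. It is not the HPX-Vision implementation and uses no HPX APIs: `TaskId`, `WaitForGraph`, `add_wait`, and `find_deadlocks` are hypothetical names chosen for illustration, and the sketch assumes a single-threaded snapshot of the graph.

```cpp
#include <algorithm>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical task identifier; the real project may use a different type.
using TaskId = std::size_t;

// Wait-for graph: an edge a -> b means "task a is waiting on task b".
class WaitForGraph {
public:
    void add_wait(TaskId waiter, TaskId awaited) {
        adj_[waiter].push_back(awaited);
        adj_[awaited];  // ensure the awaited task also exists as a node
    }

    // Returns each strongly connected component that contains a cycle,
    // i.e. each group of tasks that (transitively) wait on one another.
    std::vector<std::vector<TaskId>> find_deadlocks() {
        state_.clear();
        stack_.clear();
        result_.clear();
        counter_ = 0;
        for (auto const& entry : adj_) {
            if (state_.find(entry.first) == state_.end()) visit(entry.first);
        }
        return result_;
    }

private:
    struct NodeState {
        std::size_t index;
        std::size_t lowlink;
        bool on_stack;
    };

    // Recursive Tarjan visit; fine for a sketch, but deep graphs would
    // need an iterative version to avoid stack exhaustion.
    void visit(TaskId v) {
        state_[v] = NodeState{counter_, counter_, true};
        ++counter_;
        stack_.push_back(v);
        bool self_loop = false;

        for (TaskId w : adj_.at(v)) {
            if (w == v) self_loop = true;
            auto it = state_.find(w);
            if (it == state_.end()) {
                visit(w);  // tree edge: recurse, then propagate lowlink
                state_[v].lowlink =
                    std::min(state_[v].lowlink, state_[w].lowlink);
            } else if (it->second.on_stack) {
                // back edge into the current component
                state_[v].lowlink =
                    std::min(state_[v].lowlink, it->second.index);
            }
        }

        if (state_[v].lowlink == state_[v].index) {
            // v is the root of a component: pop it off the stack
            std::vector<TaskId> scc;
            TaskId w;
            do {
                w = stack_.back();
                stack_.pop_back();
                state_[w].on_stack = false;
                scc.push_back(w);
            } while (w != v);
            // A single node is only a deadlock if it waits on itself.
            if (scc.size() > 1 || self_loop) result_.push_back(std::move(scc));
        }
    }

    std::unordered_map<TaskId, std::vector<TaskId>> adj_;
    std::unordered_map<TaskId, NodeState> state_;
    std::vector<TaskId> stack_;
    std::vector<std::vector<TaskId>> result_;
    std::size_t counter_ = 0;
};
```

With edges 1 -> 2, 2 -> 3, 3 -> 1, and 4 -> 1, `find_deadlocks()` reports the single component {1, 2, 3}; task 4 is blocked by the cycle but is not part of it. In a lease-based model, expired edges would be removed before running this pass.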


Motivation

This project explores whether richer observability can help:

  • Understand task dependencies
  • Detect deadlocks
  • Analyze runtime behavior in distributed HPX applications

Clarification

This is not a proposal to merge the full system into HPX.

The intention is to get feedback on whether:

  • A minimal observability/performance hook layer in HPX would be useful
  • Parts of this approach could be adapted in a lightweight form

Questions

  • Would a minimal observability/performance layer in HPX be valuable?
  • Are there existing efforts in this direction?
  • Would it make sense to extract smaller contributions (e.g., task timing, basic metrics)?
  • Any feedback on overhead, complexity, or scope?
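
As a rough illustration of what a smaller "task timing" contribution could look like, the sketch below times only a random fraction of tasks, in the spirit of the sampling-based collection described above. It is plain standard C++ with no HPX integration; `SampledTimer`, `TimingStats`, and the `sample_rate` parameter are hypothetical names, and the class is not thread-safe as written.

```cpp
#include <chrono>
#include <cstdint>
#include <random>
#include <utility>

// Aggregated timing over the sampled tasks (hypothetical; not an HPX type).
struct TimingStats {
    std::uint64_t sampled = 0;          // number of tasks that were timed
    std::chrono::nanoseconds total{0};  // summed duration of timed tasks
};

class SampledTimer {
public:
    // sample_rate must be in [0, 1]: the probability that a task is timed.
    explicit SampledTimer(double sample_rate, std::uint64_t seed = 42)
      : rng_(seed), coin_(sample_rate) {}

    // Runs the task; with probability sample_rate also measures it.
    template <typename F>
    void run(F&& task) {
        ++seen_;
        if (!coin_(rng_)) {
            std::forward<F>(task)();  // unsampled path: no clock reads
            return;
        }
        auto const start = std::chrono::steady_clock::now();
        std::forward<F>(task)();
        auto const elapsed = std::chrono::steady_clock::now() - start;
        stats_.total +=
            std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed);
        ++stats_.sampled;
    }

    std::uint64_t seen() const { return seen_; }
    TimingStats const& stats() const { return stats_; }

private:
    std::mt19937_64 rng_;
    std::bernoulli_distribution coin_;
    std::uint64_t seen_ = 0;
    TimingStats stats_;
};
```

The unsampled path costs one random draw and a branch, so the overhead question becomes mostly a matter of choosing `sample_rate`. A real integration would likely need per-thread state instead of a shared generator.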

Closing

Feedback on whether this direction is useful or aligns with HPX goals would be appreciated.
