[Dashboard][Core] Option to monitor drive that is shared by all jobs #58918

@Daraan

Description

This week many of our jobs failed. The reason: the shared drive that all jobs were using was full. We had to restore some jobs, which caused an extreme increase in the disk space needed per job.

This is not visible in the dashboard or in ray status, because both only report local disk space, not the space on the shared mounted drive.
It would be great if we could pass an additional path, for example ray start --(additional)-storage-path ..., that is then included in the dashboard and in ray status.
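
Until something like this exists, the check can be approximated outside of Ray. The following is only a minimal sketch using Python's standard library; the names StorageReport, check_storage and is_nearly_full, and the 90% threshold, are hypothetical and not part of Ray. On a shared network mount with per-user limits, the reported numbers may describe the whole filesystem rather than an individual quota.

```python
import shutil
from dataclasses import dataclass


@dataclass
class StorageReport:
    """Disk usage snapshot for one path (hypothetical helper, not a Ray API)."""
    path: str
    total_bytes: int
    used_bytes: int
    free_bytes: int

    @property
    def used_fraction(self) -> float:
        # Guard against filesystems that report a total size of zero.
        return self.used_bytes / self.total_bytes if self.total_bytes else 0.0


def check_storage(path: str) -> StorageReport:
    """Query the filesystem that contains `path`, e.g. a shared mount."""
    usage = shutil.disk_usage(path)
    return StorageReport(path, usage.total, usage.used, usage.free)


def is_nearly_full(report: StorageReport, threshold: float = 0.9) -> bool:
    """Return True once the used fraction reaches the (assumed) threshold."""
    return report.used_fraction >= threshold


if __name__ == "__main__":
    # "." stands in for the shared drive's mount point.
    report = check_storage(".")
    print(f"{report.path}: {report.used_fraction:.1%} used, "
          f"{report.free_bytes / 1e9:.1f} GB free")
    if is_nearly_full(report):
        print("warning: storage is nearly full")
```

Run periodically on one node (for example from a cron job), this would at least give an early warning before jobs start failing.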

As a further edge case, the shared disk is shared by multiple factions with user limits; maybe those limits can be included in the query as well.

Use case

Monitor and prevent out of disk space on a shared drive.

Metadata

Assignees

No one assigned

    Labels

    community-backlog
    core: Issues that should be addressed in Ray Core
    dashboard: Issues specific to the Ray Dashboard
    enhancement: Request for new feature and/or capability
    observability: Issues related to the Ray Dashboard, Logging, Metrics, Tracing, and/or Profiling
    triage: Needs triage (eg: priority, bug/not-bug, and owning component)
    usability
