Open
Labels
community-backlog, core, dashboard, enhancement, observability, triage, usability
Description
This week many of our jobs failed because the shared drive that all jobs were using was full. We had to restore some jobs, which sharply increased the disk space needed per job.
This is not visible in the dashboard or in ray status, because both only report local disk space, not the shared mounted drive.
It would be great if we could specify an additional path, e.g. ray start --(additional)-storage-path ..., that is then included in the dashboard and in ray status.
As a further edge case, the shared disk is used by multiple groups with user limits; maybe those limits can be included in the query as well.
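To illustrate the kind of information I have in mind, here is a minimal sketch using only the Python standard library. It is not how Ray collects disk metrics; the function names, the report layout, and the 90% threshold are all made up for illustration, and the mount path in the usage note is a placeholder.

```python
import shutil
from typing import Dict, Iterable, List


def disk_usage_report(paths: Iterable[str]) -> Dict[str, Dict[str, float]]:
    """Collect filesystem usage for each path (hypothetical helper, not a Ray API)."""
    report = {}
    for path in paths:
        # shutil.disk_usage returns a named tuple (total, used, free) in bytes.
        usage = shutil.disk_usage(path)
        used_percent = 100.0 * usage.used / usage.total if usage.total else 0.0
        report[path] = {
            "total_bytes": usage.total,
            "used_bytes": usage.used,
            "free_bytes": usage.free,
            "used_percent": used_percent,
        }
    return report


def paths_over_threshold(
    report: Dict[str, Dict[str, float]], threshold_percent: float = 90.0
) -> List[str]:
    """Return the paths whose usage is at or above the (assumed) threshold."""
    return [
        path
        for path, stats in report.items()
        if stats["used_percent"] >= threshold_percent
    ]
```

Calling disk_usage_report(["/mnt/shared"]) (placeholder path) on each node would give the numbers I would like to see next to the local disk. Note that this reports filesystem-level usage and may not reflect the per-user limits mentioned above; those would need a separate query.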
Use case
Monitor and prevent out of disk space on a shared drive.