Request: Automatic Worker Switching with Timeout Parameter for Data Loading #119

@Heccc257

Type: Feature Request / Question

Description:
During training, data-loading workers may occasionally hang (e.g., because a data file is temporarily unavailable). A hung worker blocks the training pipeline until it recovers. To improve fault tolerance, I propose adding a timeout parameter so that, if a worker fails to load its data within the specified time, the task is automatically switched to another worker.

Example Scenario:

Worker 1 starts loading data but hangs due to I/O issues.
After timeout=30s, the system should terminate Worker 1 and assign the task to Worker 2.

Suggested Implementation:

Add a timeout parameter to the DataLoader configuration.
Use a watchdog mechanism to monitor worker activity and reassign tasks on timeout.
