
vine: reaccounting disk allocation of tasks in workers #4063

@tphung3

Description


By default, a worker reports its total disk usage to the manager as the disk usage of its cache plus the disk allocations of its tasks. However, if a task's inputs are already in the cache, their size is counted twice: once in the vine cache and once in the task's disk allocation.

For example, suppose a worker W with 30 GB of disk is assigned a task T1 that has a 20 GB disk allocation, of which 19 GB are cacheable input files. To run T1, W fetches those 19 GB of input files and stores them in its cache. W then reports its total disk usage to the manager as its vine cache plus its task disk allocation, 19 GB + 20 GB = 39 GB, whereas the true disk usage is 19 GB (the cache) plus whatever uncached files are in T1's sandbox. Because 39 GB exceeds W's 30 GB of disk, the manager stops sending tasks to W even though W has room for them.
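The arithmetic in this example can be sketched in Python as follows. The function names are hypothetical and are not part of TaskVine's API. The 1 GB of uncached sandbox files is an assumption (T1 filling the rest of its allocation), used only as an upper bound for illustration.

```python
# Hypothetical sketch of the disk accounting described above.
# Names are illustrative, not TaskVine's actual API. Sizes are in GB.

def reported_disk_usage(cache_gb: float, task_allocations_gb: list[float]) -> float:
    """Current behavior: cache usage plus the full disk allocation of every task."""
    return cache_gb + sum(task_allocations_gb)


def true_disk_usage(cache_gb: float, uncached_sandbox_gb: list[float]) -> float:
    """Actual usage: cache plus only the sandbox files that are not cached."""
    return cache_gb + sum(uncached_sandbox_gb)


worker_disk_gb = 30
cache_gb = 19          # T1's cacheable inputs, now stored in W's cache
t1_allocation_gb = 20  # T1's declared disk allocation

reported = reported_disk_usage(cache_gb, [t1_allocation_gb])
print(reported)                   # 39
print(reported > worker_disk_gb)  # True: W looks over-committed

# Assumption: T1's uncached sandbox files use at most the remaining 1 GB.
print(true_disk_usage(cache_gb, [t1_allocation_gb - cache_gb]))  # 20
```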

To fix this problem, when the manager matches a task to a worker, it should reduce the task's disk allocation by the size of any of the task's input files that are already in that worker's cache. In the example above, the manager would adjust T1's disk allocation from 20 GB to 20 - 19 = 1 GB.
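A minimal sketch of the proposed adjustment, assuming the manager knows which of a task's input files are cacheable and which are already present in a worker's cache. `Task`, `InputFile`, and `adjusted_disk_allocation` are hypothetical names used for illustration, not TaskVine's actual data structures.

```python
from dataclasses import dataclass, field


@dataclass
class InputFile:
    # Hypothetical stand-in for a task input file.
    name: str
    size_gb: float
    cacheable: bool = True


@dataclass
class Task:
    # Hypothetical stand-in for a task with a declared disk allocation.
    disk_gb: float
    inputs: list[InputFile] = field(default_factory=list)


def adjusted_disk_allocation(task: Task, worker_cache: set[str]) -> float:
    """Subtract the size of cacheable inputs already in the worker's cache
    from the task's disk allocation, never going below zero."""
    already_cached = sum(
        f.size_gb for f in task.inputs
        if f.cacheable and f.name in worker_cache
    )
    return max(task.disk_gb - already_cached, 0)


t1 = Task(disk_gb=20, inputs=[InputFile("inputs.tar", 19)])
print(adjusted_disk_allocation(t1, {"inputs.tar"}))  # 1
print(adjusted_disk_allocation(t1, set()))           # 20: nothing cached yet
```

Clamping at zero is an assumption of this sketch: it guards against a task whose declared allocation is smaller than its cached inputs, which would otherwise yield a negative allocation.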

Points of contact: @tphung3 @colinthomas-z80
