Refactor: Improve block allocation and expert string parsing
This commit refactors the DisTorch safetensor loading and allocation logic for improved performance and correctness.
The main changes to the block assignment are:
- The primary compute device is now included in the pool of "donor" devices, so memory quotas are calculated across all available GPUs, including the one that performs the computation.
- Unassigned "orphan" blocks are now allocated to the compute device instead of the CPU. This keeps more of the model in VRAM, reducing potential bottlenecks.
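The two assignment changes above can be illustrated with a minimal greedy sketch. It is not the DisTorch implementation: the function name `assign_blocks`, the `(name, size_bytes)` block tuples, and the per-device byte quotas are hypothetical. It shows the compute device sitting in the donor pool alongside the other GPUs, and an orphan block falling back to the compute device instead of the CPU.

```python
from typing import Dict, List, Tuple


def assign_blocks(
    blocks: List[Tuple[str, int]],
    donor_quotas: Dict[str, int],
    compute_device: str,
) -> Dict[str, str]:
    """Hypothetical sketch: greedily place (name, size_bytes) blocks on donors.

    `donor_quotas` maps device -> byte quota and is expected to include the
    compute device itself, so it competes in the same donor pool as the other
    GPUs. Donors are tried in dict insertion order.
    """
    remaining = dict(donor_quotas)  # copy so the caller's quotas are untouched
    assignments: Dict[str, str] = {}
    for name, size in blocks:
        assigned_to_donor = False
        for device, free in remaining.items():
            if size <= free:
                assignments[name] = device
                remaining[device] = free - size  # updating a value, not resizing the dict
                assigned_to_donor = True
                break
        if not assigned_to_donor:
            # Orphan: typically a rounding leftover, so keep it in VRAM on the
            # compute device rather than sending it to the CPU.
            assignments[name] = compute_device
    return assignments
```

For example, with quotas `{"cuda:1": 5, "cuda:0": 3}` and blocks of sizes 4, 3, and 5, the first block lands on `cuda:1`, the second on `cuda:0`, and the third fits nowhere and becomes an orphan on the compute device `cuda:0`.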
Additionally, this commit:
- Fixes a bug in the byte expert string parser where the wildcard `*` was incorrectly checked in the device name instead of the value.
- Standardizes variable names like `allocations_string` for better code clarity and consistency.
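The wildcard fix can be pictured with a small parser sketch. The `device,value;device,value` syntax, the size suffixes, and the `None` return for `*` are assumptions made for illustration and may differ from the real DisTorch expert string format. The one point taken from the commit is that `*` is tested against the value, not the device name.

```python
from typing import Dict, Optional

# Hypothetical unit table (binary multiples); the real format may differ.
_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3}


def parse_byte_string(allocations_string: str) -> Dict[str, Optional[int]]:
    """Hypothetical sketch: parse 'device,size;device,size' into {device: bytes}.

    A value of '*' is treated as a wildcard and mapped to None.
    """
    result: Dict[str, Optional[int]] = {}
    for entry in allocations_string.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        device, value = (part.strip() for part in entry.split(",", 1))
        if value == "*":  # wildcard is checked in the value, not in the device name
            result[device] = None
            continue
        lowered = value.lower()
        # Check multi-letter suffixes before the bare "b" suffix.
        for suffix in ("kb", "mb", "gb", "b"):
            if lowered.endswith(suffix):
                result[device] = int(float(lowered[: -len(suffix)]) * _UNITS[suffix])
                break
        else:
            result[device] = int(float(lowered))  # bare number: bytes
    return result
```

Under the buggy check, a device name such as `cuda:0` would never contain `*`, so a wildcard value would fall through to numeric parsing. Checking the value routes it to the wildcard branch.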
if not assigned_to_donor:  # Note - small rounding errors and tensor fitting on devices occasionally leave a block as an orphan. We treat orphans the same as tiny_block_list, since they are generally small rounding errors