Open
Description
Current Implementation
The evaluator currently converts models from TorchTitan to HuggingFace format by:
- Loading the full model into memory
- Converting the loaded model to HF format
- Saving the HF model to disk
This process is memory-intensive and slower than necessary, because the full model is instantiated only to have its weights re-saved in another format.
Proposed Improvement
We can optimize the conversion process by creating SafeTensors directly from the state dict without instantiating the actual model:
- Extract the state dict from the TorchTitan checkpoint
- Transform tensor names and shapes to match HF format
- Write SafeTensors file directly from the transformed state dict
- Ensure all HF config files (config.json, tokenizer configs) are correctly generated
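A minimal sketch of the renaming and writing steps, assuming a Llama-style model and a checkpoint already loaded into a flat dict of tensors. The key patterns and the helper names (`to_hf_key`, `convert_state_dict`, `write_safetensors`) are illustrative assumptions, not the project's actual code. Any shape changes (for example, re-permuting `wq`/`wk` for a different rotary embedding layout) are not shown and would need to be added if the two formats differ there.

```python
import re

# Assumed TorchTitan -> HF key patterns for a Llama-style model (hypothetical;
# check them against the real checkpoint before relying on them).
_KEY_PATTERNS = [
    (r"^tok_embeddings\.weight$", "model.embed_tokens.weight"),
    (r"^layers\.(\d+)\.attention\.wq\.weight$", r"model.layers.\1.self_attn.q_proj.weight"),
    (r"^layers\.(\d+)\.attention\.wk\.weight$", r"model.layers.\1.self_attn.k_proj.weight"),
    (r"^layers\.(\d+)\.attention\.wv\.weight$", r"model.layers.\1.self_attn.v_proj.weight"),
    (r"^layers\.(\d+)\.attention\.wo\.weight$", r"model.layers.\1.self_attn.o_proj.weight"),
    (r"^layers\.(\d+)\.feed_forward\.w1\.weight$", r"model.layers.\1.mlp.gate_proj.weight"),
    (r"^layers\.(\d+)\.feed_forward\.w2\.weight$", r"model.layers.\1.mlp.down_proj.weight"),
    (r"^layers\.(\d+)\.feed_forward\.w3\.weight$", r"model.layers.\1.mlp.up_proj.weight"),
    (r"^layers\.(\d+)\.attention_norm\.weight$", r"model.layers.\1.input_layernorm.weight"),
    (r"^layers\.(\d+)\.ffn_norm\.weight$", r"model.layers.\1.post_attention_layernorm.weight"),
    (r"^norm\.weight$", "model.norm.weight"),
    (r"^output\.weight$", "lm_head.weight"),
]


def to_hf_key(name):
    """Map one TorchTitan tensor name to its assumed HF name."""
    for pattern, replacement in _KEY_PATTERNS:
        new_name, count = re.subn(pattern, replacement, name)
        if count:
            return new_name
    # Fail loudly so an unmapped tensor is never silently dropped.
    raise KeyError(f"no HF mapping for tensor {name!r}")


def convert_state_dict(state_dict):
    """Rename every key; tensor values are passed through untouched."""
    return {to_hf_key(key): value for key, value in state_dict.items()}


def write_safetensors(state_dict, path):
    """Write the renamed tensors to a single SafeTensors file."""
    # Imported lazily so the renaming helpers work without safetensors installed.
    from safetensors.torch import save_file

    hf_state = {k: v.contiguous() for k, v in convert_state_dict(state_dict).items()}
    # HF transformers expects the "format" metadata entry when loading.
    save_file(hf_state, path, metadata={"format": "pt"})
```

Raising on unknown keys is a deliberate choice here: a missing mapping then shows up at conversion time instead of as a silent accuracy change during evaluation.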
Benefits
- Significantly reduced memory usage (no need to load full model)
- Faster conversion process
- More efficient disk I/O
Validation Requirements
- Verify that HellaSwag evaluation scores remain identical with the new conversion method
- Ensure all other evaluation metrics (ARC, MMLU, etc.) produce the same results
- Confirm the generated SafeTensors files load correctly with the HF transformers library
- Test with different model sizes to ensure the approach scales
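The score comparison could be automated along these lines. This is a sketch that assumes each evaluation run can be reduced to a flat dict of task name to score; the function name `find_metric_mismatches` and the task keys in the usage example are hypothetical.

```python
def find_metric_mismatches(baseline, candidate, tol=0.0):
    """Return {task: (baseline_score, candidate_score)} for every task that differs.

    A task that is missing from either run counts as a mismatch.
    """
    mismatches = {}
    for task in sorted(set(baseline) | set(candidate)):
        old = baseline.get(task)
        new = candidate.get(task)
        if old is None or new is None or abs(old - new) > tol:
            mismatches[task] = (old, new)
    return mismatches
```

With the default `tol=0.0` the check demands identical scores, matching the requirement above. For example, `find_metric_mismatches({"hellaswag": 0.5}, {"hellaswag": 0.5})` returns an empty dict, while any differing or missing task is reported with both values.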
Implementation Notes
The key is to ensure all metadata and configurations are preserved correctly during the direct state dict transformation, including:
- Proper tensor naming conventions
- Correct dtype preservation
- Model architecture configuration
- Tokenizer settings
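As an illustration of the architecture configuration item, the following sketch derives a `config.json` from the training-side model arguments. It assumes a Llama-style model whose arguments are available as a dict with the keys `dim`, `n_layers`, `n_heads`, `n_kv_heads`, `vocab_size`, and `norm_eps`; both that input layout and the helper names are assumptions. `intermediate_size` is passed in explicitly (it can be read from the first dimension of a feed-forward weight) instead of being recomputed.

```python
import json


def build_hf_config(titan_args, intermediate_size, dtype="bfloat16"):
    """Build an HF-style Llama config dict from assumed TorchTitan model args."""
    n_heads = titan_args["n_heads"]
    return {
        "architectures": ["LlamaForCausalLM"],
        "model_type": "llama",
        "hidden_size": titan_args["dim"],
        "intermediate_size": intermediate_size,
        "num_hidden_layers": titan_args["n_layers"],
        "num_attention_heads": n_heads,
        # Fall back to full multi-head attention when n_kv_heads is unset.
        "num_key_value_heads": titan_args.get("n_kv_heads") or n_heads,
        "vocab_size": titan_args["vocab_size"],
        "rms_norm_eps": titan_args["norm_eps"],
        # Should match the dtype of the tensors actually written to disk.
        "torch_dtype": dtype,
    }


def write_hf_config(config, path):
    """Write the config dict as pretty-printed JSON."""
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
```

The `torch_dtype` field is worth checking against the saved tensors, since a mismatch there is one way dtype preservation can silently break.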