Make conversion to hf model more efficient #566

@joellidin

Description

Current Implementation

The evaluator currently converts models from TorchTitan to HuggingFace format by:

  1. Loading the full model into memory
  2. Converting the loaded model to HF format
  3. Saving the HF model to disk

This process is memory-intensive and slower than necessary.
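For illustration, the current path looks roughly like this (a hedged sketch, not the evaluator's actual code; the tiny `LlamaConfig` and the output directory name are made up so the example runs standalone). Note that the weights exist in memory twice: once in the loaded state dict and once inside the instantiated module.

```python
import torch
from transformers import LlamaConfig, LlamaForCausalLM

# Tiny illustrative config so the sketch actually runs; a real run uses
# the architecture derived from the TorchTitan job config.
config = LlamaConfig(hidden_size=8, intermediate_size=16,
                     num_hidden_layers=1, num_attention_heads=2,
                     num_key_value_heads=2, vocab_size=32)

model = LlamaForCausalLM(config)      # full nn.Module allocated in RAM
state_dict = model.state_dict()       # stand-in for the converted TorchTitan dict
model.load_state_dict(state_dict)     # second full copy of the weights
model.save_pretrained("converted_hf") # writes safetensors + config.json
```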

Proposed Improvement

We can optimize the conversion process by creating SafeTensors directly from the state dict without instantiating the actual model:

  • Extract the state dict from the TorchTitan checkpoint
  • Transform tensor names and shapes to match HF format
  • Write SafeTensors file directly from the transformed state dict
  • Ensure all HF config files (config.json, tokenizer configs) are correctly generated

Benefits

  • Significantly reduced memory usage (no need to load the full model)
  • Faster conversion process
  • More efficient disk I/O

Validation Requirements

  • Verify that HellaSwag evaluation scores remain identical with the new conversion method
  • Ensure all other evaluation metrics (ARC, MMLU, etc.) produce the same results
  • Confirm the generated SafeTensors files are fully compatible with HF transformers library
  • Test with different model sizes to ensure the approach scales

Implementation Notes

The key is to ensure all metadata and configurations are preserved correctly during the direct state dict transformation, including:

  • Proper tensor naming conventions
  • Correct dtype preservation
  • Model architecture configuration
  • Tokenizer settings
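One concrete pitfall worth guarding against in the dtype point above: any transformation that round-trips tensor values through Python floats silently upcasts bf16 weights to fp32, doubling the file size. A small sketch (the config fields are illustrative placeholders, not the real model's values):

```python
import json
import torch

# A bf16 tensor as it would come out of a TorchTitan state dict.
w = torch.randn(4, 4, dtype=torch.bfloat16)

# Pitfall: rebuilding the tensor from Python floats upcasts to fp32.
upcast = torch.tensor(w.tolist())
assert upcast.dtype == torch.float32

# Safe: keep (or clone) the tensor object so the dtype survives.
preserved = w.clone()
assert preserved.dtype == torch.bfloat16

# The emitted config.json must agree with the stored dtype.
config = {
    "architectures": ["LlamaForCausalLM"],  # illustrative
    "torch_dtype": "bfloat16",
}
with open("config.json", "w") as f:
    json.dump(config, f, indent=2)
```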
