Make conversion to hf model more efficient #566

@joellidin

Description

Current Implementation

The evaluator currently converts models from TorchTitan to HuggingFace format by:

  1. Loading the full model into memory
  2. Converting the loaded model to HF format
  3. Saving the HF model to disk

This process is memory-intensive and slower than necessary.
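For illustration, the current path looks roughly like this (a hedged sketch, not the evaluator's actual code; the tiny `LlamaConfig` and the output directory name are made up so the example runs standalone). Note that the weights exist in memory twice: once in the loaded state dict and once inside the instantiated module.

```python
import torch
from transformers import LlamaConfig, LlamaForCausalLM

# Tiny illustrative config so the sketch actually runs; a real run uses
# the architecture derived from the TorchTitan job config.
config = LlamaConfig(hidden_size=8, intermediate_size=16,
                     num_hidden_layers=1, num_attention_heads=2,
                     num_key_value_heads=2, vocab_size=32)

model = LlamaForCausalLM(config)      # full nn.Module allocated in RAM
state_dict = model.state_dict()       # stand-in for the converted TorchTitan dict
model.load_state_dict(state_dict)     # second full copy of the weights
model.save_pretrained("converted_hf") # writes safetensors + config.json
```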

Proposed Improvement

We can optimize the conversion process by creating SafeTensors directly from the state dict without instantiating the actual model:

  • Extract the state dict from the TorchTitan checkpoint
  • Transform tensor names and shapes to match HF format
  • Write SafeTensors file directly from the transformed state dict
  • Ensure all HF config files (config.json, tokenizer configs) are correctly generated

Benefits

  • Significantly reduced memory usage (no need to load the full model)
  • Faster conversion process
  • More efficient disk I/O

Validation Requirements

  • Verify that HellaSwag evaluation scores remain identical with the new conversion method
  • Ensure all other evaluation metrics (ARC, MMLU, etc.) produce the same results
  • Confirm the generated SafeTensors files are fully compatible with HF transformers library
  • Test with different model sizes to ensure the approach scales

Implementation Notes

The key is to ensure all metadata and configurations are preserved correctly during the direct state dict transformation, including:

  • Proper tensor naming conventions
  • Correct dtype preservation
  • Model architecture configuration
  • Tokenizer settings
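One concrete pitfall worth guarding against in the dtype point above: any transformation that round-trips tensor values through Python floats silently upcasts bf16 weights to fp32, doubling the file size. A small sketch (the config fields are illustrative placeholders, not the real model's values):

```python
import json
import torch

# A bf16 tensor as it would come out of a TorchTitan state dict.
w = torch.randn(4, 4, dtype=torch.bfloat16)

# Pitfall: rebuilding the tensor from Python floats upcasts to fp32.
upcast = torch.tensor(w.tolist())
assert upcast.dtype == torch.float32

# Safe: keep (or clone) the tensor object so the dtype survives.
preserved = w.clone()
assert preserved.dtype == torch.bfloat16

# The emitted config.json must agree with the stored dtype.
config = {
    "architectures": ["LlamaForCausalLM"],  # illustrative
    "torch_dtype": "bfloat16",
}
with open("config.json", "w") as f:
    json.dump(config, f, indent=2)
```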
