Describe the Issue
This issue addresses two critical defects in the model initialization logic within example_trainer/model.py:
- Meta Tensor Traversal Bug: The get_parent_and_name helper was incorrectly using getattr for numeric parts of a tensor name (e.g., layers.0.weight). Since ModuleLists do not support string-named attributes for indices, this caused model initialization to crash for every transformer architecture.
- RoPE Theta Desync: The framework hardcoded the RoPE base frequency (theta) to 10,000. Modern models like Llama 3 (500k) and Qwen (1M+) are trained with significantly higher base frequencies. Running them with the wrong theta rotates positions at frequencies the model never saw during training, which breaks positional understanding, most visibly at long contexts.
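To make the theta mismatch concrete, the sketch below computes the standard RoPE inverse frequencies, theta ** (-2i / head_dim), for two base values. The function names and the head dimension of 128 are illustrative assumptions, not code taken from example_trainer/model.py.

```python
import math


def rope_inv_freq(head_dim: int, theta: float) -> list[float]:
    """Standard RoPE inverse frequencies: theta ** (-2i / head_dim) for each channel pair."""
    return [theta ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


def longest_wavelength(head_dim: int, theta: float) -> float:
    """Wavelength, in token positions, of the slowest-rotating channel pair."""
    return 2 * math.pi / rope_inv_freq(head_dim, theta)[-1]


# head_dim=128 is an assumed, illustrative value.
hardcoded = longest_wavelength(128, 10_000.0)   # the hardcoded base
llama3 = longest_wavelength(128, 500_000.0)     # the base quoted above for Llama 3
print(f"theta=10k  -> longest wavelength ~{hardcoded:,.0f} positions")
print(f"theta=500k -> longest wavelength ~{llama3:,.0f} positions")
```

A larger theta stretches the slow channels over many more positions, so a model trained with one base and run with another sees rotation angles that do not match its training.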
Environment/API Details
- Environment Class/Name: example_trainer/model.py
- Environment Configuration: Any configuration using Llama 3, Qwen, or other high-context models.
- API Endpoint/Method Involved: _initialize_meta_tensors and get_parent_and_name
Steps to Reproduce
- Initialize a model with a ModuleList structure (e.g., Llama).
- Observe AttributeError in get_parent_and_name when it tries to access numeric indices.
- If the error is manually bypassed, observe "stuttering" or gibberish output on contexts longer than 2048 tokens due to the theta mismatch.
Interaction Details (if applicable)
- Input Item to collect_trajectory:
# Example of a broken path that fails:
full_name = "model.layers.0.self_attn.q_proj.weight"
# getattr(model.layers, "0") fails
- Expected Behavior: get_parent_and_name should detect numeric indices and use __getitem__.
- RoPE theta should be dynamically detected from the model's config.json.
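A minimal sketch of the expected traversal follows. The signature and return value of get_parent_and_name are assumptions, since the real helper in example_trainer/model.py is not shown here. The stand-in object tree uses plain Python containers instead of torch modules so that the snippet runs without any dependencies.

```python
from types import SimpleNamespace
from typing import Any, Tuple


def get_parent_and_name(root: Any, full_name: str) -> Tuple[Any, str]:
    """Walk a dotted tensor name and return (parent object, leaf attribute name).

    Assumed signature, for illustration only.
    """
    *path, leaf = full_name.split(".")
    parent = root
    for part in path:
        if part.isdigit():
            # Numeric part: index into the container via __getitem__.
            parent = parent[int(part)]
        else:
            # Named part: ordinary attribute lookup.
            parent = getattr(parent, part)
    return parent, leaf


# Hypothetical stand-in for a model tree; a plain list plays the role of the layer container.
q_proj = SimpleNamespace(weight="placeholder tensor")
layer = SimpleNamespace(self_attn=SimpleNamespace(q_proj=q_proj))
root = SimpleNamespace(model=SimpleNamespace(layers=[layer]))

parent, name = get_parent_and_name(root, "model.layers.0.self_attn.q_proj.weight")
```

Here parent is the q_proj object and name is "weight", so a caller can replace the tensor with setattr(parent, name, new_tensor).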
Setup Details
- OS: Linux
- Python Version: 3.10+
- Atropos Version: commit c20c852
- Relevant Libraries/Versions: torch, transformers
Additional Context & Logs
This fix implements multi-source RoPE theta detection that checks the rope_theta and rope_base fields and the rope_scaling dictionary in the model configuration.
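The detection order could look roughly like the sketch below. It is an illustration rather than the code from the fix: the function name detect_rope_theta, the lookup order, and the assumption that a base value may appear under the same keys inside rope_scaling are all hypothetical. The config is treated as a plain dict, as it would be after loading config.json.

```python
from typing import Any, Mapping, Optional

DEFAULT_ROPE_THETA = 10_000.0              # the previously hardcoded value
THETA_KEYS = ("rope_theta", "rope_base")   # top-level keys named in this issue


def _as_positive_float(value: Any) -> Optional[float]:
    """Return value as a float if it is a positive number, else None."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return None
    return float(value) if value > 0 else None


def detect_rope_theta(config: Mapping[str, Any],
                      default: float = DEFAULT_ROPE_THETA) -> float:
    """Pick the first usable theta from several config locations (hypothetical order)."""
    # 1. Top-level keys.
    for key in THETA_KEYS:
        theta = _as_positive_float(config.get(key))
        if theta is not None:
            return theta
    # 2. Nested rope_scaling dict. Assumption: it may carry a base under the same keys.
    scaling = config.get("rope_scaling")
    if isinstance(scaling, Mapping):
        for key in THETA_KEYS:
            theta = _as_positive_float(scaling.get(key))
            if theta is not None:
                return theta
    # 3. Nothing found: fall back to the default.
    return default
```

Falling back to 10,000 keeps older configs that omit the field working, while configs that declare a higher base are picked up automatically.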