Hardening: Fix Model Initialization Crashes and RoPE Theta Desync #455

@RUFFY-369

Describe the Issue

This issue addresses two critical defects in the model initialization logic within example_trainer/model.py:

  1. Meta Tensor Traversal Bug: The get_parent_and_name helper resolves every component of a dotted tensor name with getattr, including numeric components (e.g., the 0 in layers.0.weight). Numeric components are ModuleList indices, not named attributes, so model initialization crashes for any architecture that keeps its layers in a ModuleList, which covers the common transformer implementations.
  2. RoPE Theta Desync: The framework hardcodes the RoPE base frequency (theta) to 10,000. Models such as Llama 3 (500,000) and Qwen (1,000,000 or more) are trained with much larger bases. With the wrong theta, queries and keys are rotated at frequencies the model never saw during training, so positional information degrades and output breaks down at long contexts.
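
To illustrate why the base matters, here is a minimal sketch of the standard RoPE inverse-frequency formula, theta ** (-2i / d), evaluated for the three bases mentioned above. The head dimension of 128 and the helper names are assumptions chosen for illustration; they are not taken from example_trainer/model.py.

```python
import math


def rope_inv_freq(head_dim: int, theta: float) -> list[float]:
    """Standard RoPE inverse frequencies: theta ** (-2i / head_dim), i in [0, head_dim / 2)."""
    return [theta ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


def longest_wavelength(head_dim: int, theta: float) -> float:
    """Wavelength, in token positions, of the slowest-rotating dimension pair."""
    return 2 * math.pi / rope_inv_freq(head_dim, theta)[-1]


HEAD_DIM = 128  # assumed head dimension, for illustration only

for theta in (10_000.0, 500_000.0, 1_000_000.0):
    wavelength = longest_wavelength(HEAD_DIM, theta)
    print(f"theta={theta:>11,.0f}  longest wavelength ~ {wavelength:,.0f} positions")
```

A larger theta stretches the slowest rotations over many more positions, so a model trained with theta = 500,000 but run with 10,000 receives rotation angles that do not match what it learned.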

Environment/API Details

  • Environment Class/Name: example_trainer/model.py
  • Environment Configuration: Any configuration using Llama 3, Qwen, or other high-context models.
  • API Endpoint/Method Involved: _initialize_meta_tensors and get_parent_and_name

Steps to Reproduce

  1. Initialize a model with a ModuleList structure (e.g., Llama).
  2. Observe AttributeError in get_parent_and_name when it tries to access numeric indices.
  3. If the crash is bypassed manually, observe "stuttering" or gibberish output on contexts longer than 2048 tokens, caused by the theta mismatch.

Interaction Details (if applicable)

  • Input Item to collect_trajectory:
    # Example of a broken path that fails:
    full_name = "model.layers.0.self_attn.q_proj.weight"
    # getattr(model.layers, "0") fails
  • Expected Behavior:
    1. get_parent_and_name should detect numeric indices and use __getitem__.
    2. RoPE theta should be dynamically detected from the model's config.json.
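
The first expected behavior could look roughly like the sketch below. It is not the actual helper from example_trainer/model.py: the signature and return shape are assumptions, and the demo uses plain stand-in objects (SimpleNamespace and a list) instead of real torch modules so it runs without torch. The same indexing branch applies to nn.ModuleList, which supports integer indexing.

```python
from types import SimpleNamespace


def get_parent_and_name(root, full_name: str):
    """Walk a dotted path and return (parent_object, final_attribute_name).

    Numeric components are treated as container indices (obj[int(part)]);
    all other components are resolved with getattr.
    """
    *parents, leaf = full_name.split(".")
    obj = root
    for part in parents:
        obj = obj[int(part)] if part.isdigit() else getattr(obj, part)
    return obj, leaf


# Stand-in objects for illustration only (hypothetical, not real torch modules).
q_proj = SimpleNamespace(weight="meta placeholder")
layer0 = SimpleNamespace(self_attn=SimpleNamespace(q_proj=q_proj))
root = SimpleNamespace(model=SimpleNamespace(layers=[layer0]))

parent, name = get_parent_and_name(root, "model.layers.0.self_attn.q_proj.weight")
setattr(parent, name, "materialized")  # replace the placeholder on its parent
```

Returning the parent together with the leaf name lets the caller swap the tensor in place with setattr, which is the usual reason such a helper exists.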

Setup Details

  • OS: Linux
  • Python Version: 3.10+
  • Atropos Version: commit c20c852
  • Relevant Libraries/Versions: torch, transformers

Additional Context & Logs

The proposed fix implements multi-source RoPE theta detection: it checks the rope_theta and rope_base fields and the rope_scaling dictionary in the model configuration.
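
A minimal sketch of what such detection might look like follows. The function names, the lookup order, and the keys probed inside rope_scaling ("rope_theta" and "base") are assumptions made for illustration; the actual fix may check different keys or apply a different priority.

```python
import json
from collections.abc import Mapping
from pathlib import Path
from typing import Any

DEFAULT_ROPE_THETA = 10_000.0  # the previously hardcoded value, kept as a fallback


def detect_rope_theta(config: Mapping[str, Any], default: float = DEFAULT_ROPE_THETA) -> float:
    """Return the first usable RoPE base found in the config, else the default."""
    candidates = [config.get("rope_theta"), config.get("rope_base")]
    scaling = config.get("rope_scaling")
    if isinstance(scaling, Mapping):
        # Assumption: the base may be nested here; these key names are guesses.
        candidates += [scaling.get("rope_theta"), scaling.get("base")]
    for value in candidates:
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if is_number and value > 0:
            return float(value)
    return default


def load_rope_theta(model_dir: str) -> float:
    """Read config.json from a local model directory and detect theta."""
    config = json.loads((Path(model_dir) / "config.json").read_text())
    return detect_rope_theta(config)


print(detect_rope_theta({"rope_theta": 500000.0}))
print(detect_rope_theta({"hidden_size": 4096}))  # no RoPE keys, falls back to default
```

Falling back to 10,000 keeps older configs that omit the field working, while any config that declares a base overrides it.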
