Describe the Issue
The environment scripts, specifically environments/gsm8k_server_teacher_distill.py, hardcode the model and tokenizer strings (pointing to NousResearch/DeepHermes-3-Llama-3-3B-Preview).
This hardcoding prevents the framework from being easily used with other model architectures (e.g., Llama 3.1, Qwen, or custom checkpoints) without manual source code modifications. If a user attempts to use a different model while the tokenizer remains hardcoded to DeepHermes, it can result in TokenId out of range errors, incorrect prompt formatting, or silent performance degradation due to vocabulary mismatch.
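The vocabulary-mismatch failure can be illustrated with a toy check. This is only a sketch: the vocabulary size and token IDs are made-up numbers rather than those of any real model, and `find_out_of_range_ids` is a hypothetical helper that is not part of Atropos.

```python
def find_out_of_range_ids(token_ids, vocab_size):
    """Return the token IDs that a tokenizer with `vocab_size` entries cannot decode."""
    return [t for t in token_ids if t < 0 or t >= vocab_size]


# Toy numbers (assumptions for illustration, not real model sizes).
served_model_ids = [12, 4087, 150_001]   # IDs emitted by the model actually being served
hardcoded_tokenizer_vocab = 128_000      # vocabulary size of the hardcoded tokenizer

bad = find_out_of_range_ids(served_model_ids, hardcoded_tokenizer_vocab)
if bad:
    print(f"IDs outside the hardcoded tokenizer's vocabulary: {bad}")
```

IDs that happen to fall inside the range do not raise anything; they decode to the wrong text, which is where the silent degradation comes from.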
Environment/API Details
- Environment Class/Name:
environments/gsm8k_server_teacher_distill.py
- Environment Configuration:
GSM8kTeacherDistillEnv
- API Endpoint/Method Involved:
config_init and teacher_config_init
Steps to Reproduce
- Attempt to run a training task with a non-DeepHermes model (e.g., Llama-3-8B).
- Observe that the environment still tries to load the DeepHermes tokenizer.
- Observe potential crashes during decoding or evaluation if the vocabularies differ.
Interaction Details (if applicable)
- Expected Behavior:
- The environment should allow overriding the student/teacher models and tokenizers via environment variables (e.g., STUDENT_MODEL, TEACHER_MODEL).
- The configuration should support dynamic resolution to ensure the tokenizer always matches the model being served.
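One possible shape for the expected behavior is sketched below. It is an assumption about how the override could work, not the actual Atropos implementation: `resolve_model_and_tokenizer` is a hypothetical helper, and the `STUDENT_TOKENIZER` / `TEACHER_TOKENIZER` variables are hypothetical additions alongside the `STUDENT_MODEL` / `TEACHER_MODEL` variables named above.

```python
import os

# Current hardcoded value, kept as the fallback so existing setups keep working.
DEFAULT_MODEL = "NousResearch/DeepHermes-3-Llama-3-3B-Preview"


def resolve_model_and_tokenizer(prefix, env=None, default_model=DEFAULT_MODEL):
    """Resolve (model, tokenizer) names for a role such as "STUDENT" or "TEACHER".

    Hypothetical helper: reads <PREFIX>_MODEL and, optionally, <PREFIX>_TOKENIZER.
    When no tokenizer override is set, the tokenizer follows the model name so
    the two cannot drift apart.
    """
    env = os.environ if env is None else env
    model = env.get(f"{prefix}_MODEL") or default_model
    tokenizer = env.get(f"{prefix}_TOKENIZER") or model
    return model, tokenizer


student_model, student_tokenizer = resolve_model_and_tokenizer("STUDENT")
teacher_model, teacher_tokenizer = resolve_model_and_tokenizer("TEACHER")
```

Under this sketch, config_init and teacher_config_init would pass the resolved names into their configs instead of string literals, and running with `STUDENT_MODEL=meta-llama/Llama-3.1-8B-Instruct` set would switch both the student model and its tokenizer.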
Setup Details
- OS: Linux
- Python Version: 3.10+
- Atropos Version: commit c20c852
Additional Context & Logs
Removing the hardcoded model and tokenizer strings makes the Atropos environment suite model-agnostic, allowing rapid experimentation across model families without brittle source-code patches.