Open
Description
🐛 Describe the bug
When running the following command in a Docker environment, we encounter an OmegaConf interpolation error because the USER environment variable is not set:
Command
uv run python -m apps.grpo.main --config apps/grpo/qwen3_1_7b.yaml
Error
...
  File "/workspace/torchforge/.venv/lib/python3.12/site-packages/omegaconf/omegaconf.py", line 458, in resolver_wrapper
    ret = resolver(*args, **kwargs)
  File "/workspace/torchforge/.venv/lib/python3.12/site-packages/omegaconf/resolvers/oc/__init__.py", line 38, in env
    raise KeyError(f"Environment variable '{key}' not found")
omegaconf.errors.InterpolationResolutionError: KeyError raised while resolving interpolation: "Environment variable 'USER' not found"
full_key: metric_logging.wandb.group
object_type: dict
Proposed fix
Using a default value for USER prevents failures in environments where it’s unset (e.g., many containers). This change worked for me, and I can open a PR to apply it to other config files if desired.
diff --git a/apps/grpo/qwen3_1_7b.yaml b/apps/grpo/qwen3_1_7b.yaml
index f7b06eb8..10e66148 100644
--- a/apps/grpo/qwen3_1_7b.yaml
+++ b/apps/grpo/qwen3_1_7b.yaml
@@ -18,7 +18,7 @@ rollout_threads: 1 # Recommended to set equal to generator.num_replicas
 metric_logging:
   wandb:
     project: grpo-training
-    group: grpo_exp_${oc.env:USER}
+    group: grpo_exp_${oc.env:USER,default_user}
     logging_mode: global_reduce # global_reduce, per_rank_reduce, per_rank_no_reduce
   console:
     logging_mode: global_reduce
Environment
git log -1
commit cd9e295c49b2a1a6e07eea2d77fa295613729638 (HEAD -> main, origin/main, origin/HEAD)
Author: Jiyue Wang <[email protected]>
Date: Wed Jan 28 16:40:10 2026 -0500
[vllm] Upgrade vllm version to v0.13.0 (#737)
# Check core components
python -c "import torch, forge, monarch, vllm; print('All imports successful')"
# Check specific versions
python -c "
import torch
import forge
import vllm
print(f'PyTorch: {torch.__version__}')
print(f'TorchForge: {forge.__version__}')
print(f'vLLM: {vllm.__version__}')
print(f'CUDA: {torch.version.cuda}')
"
All imports successful
PyTorch: 2.9.0+cu128
TorchForge:
vLLM: 0.13.0
CUDA: 12.8
Versions
No response