
Environment variable setup issue ("Environment variable 'USER' not found") in Docker #742

@insop

Description


🐛 Describe the bug

When running the following command in a Docker environment, we encounter an OmegaConf interpolation error because the USER environment variable is not set:

Command

uv run python -m apps.grpo.main --config apps/grpo/qwen3_1_7b.yaml

Error

  ...
  File "/workspace/torchforge/.venv/lib/python3.12/site-packages/omegaconf/omegaconf.py", line 458, in resolver_wrapper
    ret = resolver(*args, **kwargs)
  File "/workspace/torchforge/.venv/lib/python3.12/site-packages/omegaconf/resolvers/oc/__init__.py", line 38, in env
    raise KeyError(f"Environment variable '{key}' not found")
  omegaconf.errors.InterpolationResolutionError: KeyError raised while resolving interpolation: "Environment variable 'USER' not found"
  full_key: metric_logging.wandb.group
  object_type: dict

Proposed fix
Supplying a default for USER in the oc.env interpolation (the resolver's optional second argument) prevents this failure in environments where USER is unset (e.g., many containers). This change worked for me, and I can open a PR to apply it to the other config files if desired.

diff --git a/apps/grpo/qwen3_1_7b.yaml b/apps/grpo/qwen3_1_7b.yaml
index f7b06eb8..10e66148 100644
--- a/apps/grpo/qwen3_1_7b.yaml
+++ b/apps/grpo/qwen3_1_7b.yaml
@@ -18,7 +18,7 @@ rollout_threads: 1   # Recommended to set equal to generator.num_replicas
 metric_logging:
   wandb:
     project: grpo-training
-    group: grpo_exp_${oc.env:USER}
+    group: grpo_exp_${oc.env:USER,default_user}
     logging_mode: global_reduce # global_reduce, per_rank_reduce, per_rank_no_reduce
   console:
     logging_mode: global_reduce
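If the same change is applied to the other config files, a small standard-library script could rewrite every bare ${oc.env:USER} interpolation. This is a hypothetical helper, not part of torchforge; it assumes the configs live under apps/ as *.yaml files and that default_user is an acceptable fallback name.

```python
import re
from pathlib import Path


def add_env_default(text: str, var: str = "USER", default: str = "default_user") -> str:
    """Rewrite bare ${oc.env:VAR} interpolations to ${oc.env:VAR,default}.

    Interpolations that already carry a default are left untouched, because
    the pattern requires the closing brace immediately after the name.
    """
    pattern = re.compile(r"\$\{oc\.env:" + re.escape(var) + r"\}")
    replacement = "${oc.env:" + var + "," + default + "}"
    # A function replacement avoids re.sub interpreting backslashes in it.
    return pattern.sub(lambda _match: replacement, text)


def patch_configs(root: Path, glob: str = "**/*.yaml") -> list[Path]:
    """Apply add_env_default to every matching file; return the changed paths."""
    changed = []
    for path in sorted(root.glob(glob)):
        original = path.read_text()
        updated = add_env_default(original)
        if updated != original:
            path.write_text(updated)
            changed.append(path)
    return changed


if __name__ == "__main__":
    # Hypothetical usage: run from the repository root.
    for patched in patch_configs(Path("apps")):
        print(f"patched {patched}")
```

Reviewing the resulting git diff before committing is still worthwhile, since the script rewrites files in place.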

Environment

git log -1
commit cd9e295c49b2a1a6e07eea2d77fa295613729638 (HEAD -> main, origin/main, origin/HEAD)
Author: Jiyue Wang <[email protected]>
Date:   Wed Jan 28 16:40:10 2026 -0500

    [vllm] Upgrade vllm version to v0.13.0 (#737)

# Check core components
python -c "import torch, forge, monarch, vllm; print('All imports successful')"

# Check specific versions
python -c "
import torch
import forge
import vllm

print(f'PyTorch: {torch.__version__}')
print(f'TorchForge: {forge.__version__}')
print(f'vLLM: {vllm.__version__}')
print(f'CUDA: {torch.version.cuda}')
"
All imports successful
PyTorch: 2.9.0+cu128
TorchForge: 
vLLM: 0.13.0
CUDA: 12.8

Versions

No response

Metadata

Assignees: No one assigned
Labels: No labels
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests