Networking error in Docker due to host IP detection (workaround: set VLLM_HOST_IP) #743

### 🐛 Describe the bug

**Description**

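In a Docker environment, running the command below fails with networking errors: the head node address is detected as a link-local IPv6 address (`fe80::222:48ff:fe49:ba90`), but the c10d client socket tries to resolve it as IPv4 and cannot connect.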

**Command**

```bash
uv run python -m apps.grpo.main --config apps/grpo/qwen3_1_7b.yaml
```

**Workaround**

Based on the logic in `monarch_executor.py`, the host IP can be overridden via an environment variable:
- https://github.com/meta-pytorch/torchforge/blob/cd9e295c49b2a1a6e07eea2d77fa295613729638/src/forge/actors/vllm/v1/monarch_executor.py#L25

```python
if host_ip := os.environ.get("VLLM_HOST_IP"):
    return host_ip
```

Setting the following **resolves** the issue in my environment:

```bash
export VLLM_HOST_IP=127.0.0.1
```

A more robust `_get_host_ip()` (e.g., preferring IPv4 or avoiding link-local IPv6 addresses in containers) could help. I'm happy to open a PR if that would be useful.
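
To make the suggestion concrete, here is a minimal sketch of what an IPv4-preferring lookup could look like. It is not the current torchforge implementation: `is_routable_ipv4`, `pick_host_ip`, and `get_host_ip_sketch` are hypothetical names, and falling back to `127.0.0.1` is an assumption that only makes sense for single-node runs like the one in this report.

```python
import ipaddress
import os
import socket


def is_routable_ipv4(addr: str) -> bool:
    """Return True for IPv4 addresses that are not link-local, loopback, or unspecified."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        # Not a parseable IP literal (e.g. a hostname or malformed string).
        return False
    return ip.version == 4 and not (
        ip.is_link_local or ip.is_loopback or ip.is_unspecified
    )


def pick_host_ip(candidates, fallback="127.0.0.1"):
    """Pick the first routable IPv4 candidate; otherwise use the fallback (assumption)."""
    for addr in candidates:
        if is_routable_ipv4(addr):
            return addr
    return fallback


def get_host_ip_sketch() -> str:
    """Hypothetical replacement for _get_host_ip(); keeps the existing env override."""
    if host_ip := os.environ.get("VLLM_HOST_IP"):
        return host_ip
    try:
        # Restrict resolution to IPv4 so link-local IPv6 results are never returned.
        infos = socket.getaddrinfo(socket.gethostname(), None, family=socket.AF_INET)
    except socket.gaierror:
        infos = []
    return pick_host_ip(info[4][0] for info in infos)
```

With this filter, a candidate list containing the address from the log below plus a private IPv4 address would resolve to the IPv4 one, and a list with only link-local addresses would fall back to loopback. Whether loopback is an acceptable default for multi-node setups is a design question for the maintainers.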

**Error message**

```
(EngineCore_DP0 pid=8447) [2026-01-29 01:26:35] INFO monarch_executor.py:386: [actor=<root>] [MonarchExecutor] Head node: fe80::222:48ff:fe49:ba90:51391
(EngineCore_DP0 pid=8447) [2026-01-29 01:26:35] INFO monarch_executor.py:393: [actor=<root>] [MonarchExecutor] Using allocated GPUs: ['1']
WARNING 01-29 01:26:43 [worker_base.py:301] [actor=<root>.<forge.actors.vllm.v1.forge_executor.ForgeWorkerWrapper vllm_workers{'procs': 0/1}>] Missing `shared_worker_lock` argument from executor. This argument is needed for mm_processor_cache_type='shm'.
INFO 01-29 01:26:47 [parallel_state.py:1203] [actor=<root>.<forge.actors.vllm.v1.forge_executor.ForgeWorkerWrapper vllm_workers{'procs': 0/1}>] world_size=1 rank=0 local_rank=0 distributed_init_method=env:// backend=nccl
[W129 01:26:47.186735869 socket.cpp:767] [c10d] The client socket has failed to connect to [train16node-master]:51391 (errno: 22 - Invalid argument).
[W129 01:26:47.186767930 socket.cpp:767] [c10d] The IPv4 network addresses of (fe80::222:48ff:fe49:ba90, 51391) cannot be retrieved (gai error: -9 - Address family for hostname not supported).
[W129 01:26:47.876868742 socket.cpp:767] [c10d] The IPv4 network addresses of (fe80::222:48ff:fe49:ba90, 51391) cannot be retrieved (gai error: -9 - Address family for hostname not supported).
[W129 01:26:48.613986049 socket.cpp:767] [c10d] The IPv4 network addresses of (fe80::222:48ff:fe49:ba90, 51391) cannot be retrieved (gai error: -9 - Address family for hostname not supported).
[W129 01:26:49.280112312 socket.cpp:767] [c10d] The IPv4 network addresses of (fe80::222:48ff:fe49:ba90, 51391) cannot be retrieved (gai error: -9 - Address family for hostname not supported).
[W129 01:26:51.628231771 socket.cpp:767] [c10d] The IPv4 network addresses of (fe80::222:48ff:fe49:ba90, 51391) cannot be retrieved (gai error: -9 - Address family for hostname not supported).
[W129 01:26:54.085405747 socket.cpp:767] [c10d] The IPv4 network addresses of (fe80::222:48ff:fe49:ba90, 51391) cannot be retrieved (gai error: -9 - Address family for hostname not supported).
[W129 01:26:59.063519507 socket.cpp:767] [c10d] The IPv4 network addresses of (fe80::222:48ff:fe49:ba90, 51391) cannot be retrieved (gai error: -9 - Address family for hostname not supported).
[W129 01:27:07.065667397 socket.cpp:767] [c10d] The IPv4 network addresses of (fe80::222:48ff:fe49:ba90, 51391) cannot be retrieved (gai error: -9 - Address family for hostname not supported).
```

**Environment**

```bash
git log -1
commit cd9e295c49b2a1a6e07eea2d77fa295613729638 (HEAD -> main, origin/main, origin/HEAD)
Author: Jiyue Wang <JenniferWang@users.noreply.github.com>
Date:   Wed Jan 28 16:40:10 2026 -0500

    [vllm] Upgrade vllm version to v0.13.0 (#737)

# Check core components
python -c "import torch, forge, monarch, vllm; print('All imports successful')"

# Check specific versions
python -c "
import torch
import forge
import vllm

print(f'PyTorch: {torch.__version__}')
print(f'TorchForge: {forge.__version__}')
print(f'vLLM: {vllm.__version__}')
print(f'CUDA: {torch.version.cuda}')
"
All imports successful
PyTorch: 2.9.0+cu128
TorchForge: 
vLLM: 0.13.0
CUDA: 12.8
```

### Versions

_No response_
