Checklist
- 1. I have searched related issues but cannot get the expected help.
- 2. The bug has not been fixed in the latest version.
- 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
- 4. If the issue you raised is not a bug but a question, please open a discussion at https://github.com/sgl-project/SpecForge/discussions/new/choose. Otherwise, it will be closed.
- 5. Please use English, otherwise it will be closed.
Describe the bug
Hi team! I have been trying to train a speculative decoding model for Qwen3-Coder-480B-A35B-Instruct-FP8, and I am hitting the following error:
[rank0]: File "/SpecForge/scripts/train_eagle3_sgl_online.py", line 775, in <module>
[rank0]: main()
[rank0]: File "/SpecForge/scripts/train_eagle3_sgl_online.py", line 771, in main
[rank0]: trainer.train()
[rank0]: File "/SpecForge/scripts/train_eagle3_sgl_online.py", line 699, in train
[rank0]: data_for_draft = self.target_model.forward(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/.sglang/lib/python3.12/site-packages/specforge/modeling/target/sgl_model_wrapper.py", line 253, in forward
[rank0]: hidden_states_list, aux_hidden_states_list = self.extend(reqs)
[rank0]: ^^^^^^^^^^^^^^^^^
[rank0]: File "/.sglang/lib/python3.12/site-packages/specforge/modeling/target/sgl_model_wrapper.py", line 200, in extend
[rank0]: return _extend(
[rank0]: ^^^^^^^^
[rank0]: File "/.sglang/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/.sglang/lib/python3.12/site-packages/specforge/modeling/target/sgl_model_wrapper.py", line 86, in _extend
[rank0]: logits_output, _ = model_runner.forward(forward_batch)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/.sglang/lib/python3.12/site-packages/sglang/srt/model_executor/model_runner.py", line 1752, in forward
[rank0]: output = self._forward_raw(
[rank0]: ^^^^^^^^^^^^^^^^^^
[rank0]: File "/.sglang/lib/python3.12/site-packages/sglang/srt/model_executor/model_runner.py", line 1797, in _forward_raw
[rank0]: ret = self.forward_extend(
[rank0]: ^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/.sglang/lib/python3.12/site-packages/sglang/srt/model_executor/model_runner.py", line 1697, in forward_extend
[rank0]: return self.model.forward(
[rank0]: ^^^^^^^^^^^^^^^^^^^
[rank0]: File "/.sglang/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/.sglang/lib/python3.12/site-packages/sglang/srt/models/qwen3_moe.py", line 654, in forward
[rank0]: hidden_states = self.model(
[rank0]: ^^^^^^^^^^^
[rank0]: File "/.sglang/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/.sglang/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/.sglang/lib/python3.12/site-packages/sglang/srt/models/qwen2_moe.py", line 492, in forward
[rank0]: hidden_states, residual = layer(
[rank0]: ^^^^^^
[rank0]: File "/.sglang/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/.sglang/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/.sglang/lib/python3.12/site-packages/sglang/srt/models/qwen3_moe.py", line 533, in forward
[rank0]: hidden_states = self.mlp(hidden_states, forward_batch, use_reduce_scatter)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/.sglang/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/.sglang/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/.sglang/lib/python3.12/site-packages/sglang/srt/models/qwen3_moe.py", line 126, in forward
[rank0]: return self.forward_normal(hidden_states, use_reduce_scatter)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/.sglang/lib/python3.12/site-packages/sglang/srt/models/qwen3_moe.py", line 148, in forward_normal
[rank0]: final_hidden_states = self.experts(hidden_states, topk_output)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/.sglang/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/.sglang/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/.sglang/lib/python3.12/site-packages/sglang/srt/layers/moe/ep_moe/layer.py", line 140, in forward
[rank0]: return self.forward_deepgemm(hidden_states, topk_output)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/.sglang/lib/python3.12/site-packages/sglang/srt/layers/moe/ep_moe/layer.py", line 301, in forward_deepgemm
[rank0]: deep_gemm_wrapper.grouped_gemm_nt_f8f8bf16_masked(
[rank0]: File "/.sglang/lib/python3.12/site-packages/sglang/srt/layers/quantization/deep_gemm_wrapper/entrypoint.py", line 51, in grouped_gemm_nt_f8f8bf16_masked
[rank0]: with compile_utils.deep_gemm_execution_hook(
[rank0]: File "/usr/lib/python3.12/contextlib.py", line 137, in __enter__
[rank0]: return next(self.gen)
[rank0]: ^^^^^^^^^^^^^^
[rank0]: File "/.sglang/lib/python3.12/site-packages/sglang/srt/layers/quantization/deep_gemm_wrapper/compile_utils.py", line 333, in deep_gemm_execution_hook
[rank0]: _maybe_compile_deep_gemm_one_type_all(kernel_type, n, k, num_groups)
[rank0]: File "/.sglang/lib/python3.12/site-packages/sglang/srt/layers/quantization/deep_gemm_wrapper/compile_utils.py", line 298, in _maybe_compile_deep_gemm_one_type_all
[rank0]: thread_map(compile_func, collected_configs, max_workers=_COMPILE_WORKERS)
[rank0]: File "/.sglang/lib/python3.12/site-packages/tqdm/contrib/concurrent.py", line 69, in thread_map
[rank0]: return _executor_map(ThreadPoolExecutor, fn, *iterables, **tqdm_kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/.sglang/lib/python3.12/site-packages/tqdm/contrib/concurrent.py", line 51, in _executor_map
[rank0]: return list(tqdm_class(ex.map(fn, *iterables, chunksize=chunksize), **kwargs))
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/lib/python3.12/concurrent/futures/_base.py", line 608, in map
[rank0]: fs = [self.submit(fn, *args) for args in zip(*iterables)]
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/lib/python3.12/concurrent/futures/thread.py", line 179, in submit
[rank0]: self._adjust_thread_count()
[rank0]: File "/usr/lib/python3.12/concurrent/futures/thread.py", line 202, in _adjust_thread_count
[rank0]: t.start()
[rank0]: File "/usr/lib/python3.12/threading.py", line 992, in start
[rank0]: _start_new_thread(self._bootstrap, ())
[rank0]: RuntimeError: can't start new thread
Also, training starts and then fails every time at around 6%:
Training: 6%|████████████▎ | 63/1000 [02:15<33:31, 2.15s/it]
[rank0]: Traceback (most recent call last):
My system ulimits are:
real-time non-blocking time (microseconds, -R) unlimited
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 7300382
max locked memory (kbytes, -l) 8192
max memory size (kbytes, -m) unlimited
open files (-n) 1048576
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 1048576
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
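From what I understand, RuntimeError: can't start new thread is raised when the OS refuses to create another thread, so the effective cap may come from a kernel or cgroup limit rather than the shell ulimits above (this is my assumption). Some checks I can run on the node (the cgroup path assumes cgroup v2 inside a container and may differ on other setups):
# Threads currently in use across all processes
ps -eLf | wc -l
# Kernel-wide thread limit
cat /proc/sys/kernel/threads-max
# cgroup v2 PID/thread cap, if the job runs inside a container (path is an assumption)
cat /sys/fs/cgroup/pids.max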
Please help me out with this issue. Thanks in advance :)
Reproduction
torchrun --standalone --nproc_per_node 8 \
/SpecForge/scripts/train_eagle3_sgl_online.py \
--target-model-path Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 \
--model-path Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 \
--draft-model-config /SpecForge/configs/qwen3-coder-480B-A35B-instruct-eagle3.json \
--train-data-path /SpecForge/data/apps_train.jsonl \
--eval-data-path /SpecForge/data/apps_eval.jsonl \
--tp-size 8 \
--ep-size 8 \
--output-dir /SpecForge/outputs/qwen3-coder-480B-A35B-eagle3 \
--num-epochs 1 \
--batch-size 1 \
--learning-rate 5e-5 \
--draft-attention-backend flex_attention \
--max-length 2048 \
--chat-template qwen \
--cache-dir /SpecForge/cache \
--mem-frac=0.7 \
--dist-timeout 3600 \
--watchdog-timeout 1800 \
--disable-cuda-graph
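If the per-user process/thread cap turns out to be the problem, this is the launch-time mitigation I am considering (the assumption that this is the limit being hit is mine, not a verified fix):
# Raise the per-user process/thread cap in the launching shell before torchrun
ulimit -u unlimited
# Then re-run the torchrun command above unchanged.
# If the job runs inside Docker, the container's pids cgroup may be the real cap;
# starting the container with --pids-limit=-1 would lift it (assuming that applies here).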
Environment
I am using the source installation:
git clone https://github.com/sgl-project/SpecForge.git
cd SpecForge
pip install -v .
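I can share more environment details if useful, e.g. (assuming the sglang check_env helper is available in this install and that the SpecForge package is installed under the name specforge):
python3 -m sglang.check_env
pip show sglang specforge
nvidia-smi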