[Bug] Distributed training with --use_ddp=true times out with error "1/4 clients joined"

### Describe the bug

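Fine-tuning glow_tts on 4 GPUs with `--use_ddp=true` hangs during process-group initialization and then fails after 601 seconds with `DistStoreError: ... 1/4 clients joined.` The command, full output, and config follow.

python -m TTS.bin.train_tts --config_path finetune_config.json --restore_path /home/user/.local/share/tts/tts_models--fa--custom--glow-tts/model_file.pth --use_ddp=true --gpus="0,1,2,3"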
Found 24005 files in /home/user/workspace/dataset/train-tts3/dataset
Using model: glow_tts
Setting up Audio Processor...
 | sample_rate: 22050
 | resample: False
 | num_mels: 80
 | log_func: np.log10
 | min_level_db: -100
 | frame_shift_ms: None
 | frame_length_ms: None
 | ref_level_db: 20
 | fft_size: 1024
 | power: 1.5
 | preemphasis: 0.0
 | griffin_lim_iters: 60
 | signal_norm: True
 | symmetric_norm: True
 | mel_fmin: 0
 | mel_fmax: None
 | pitch_fmin: 1.0
 | pitch_fmax: 640.0
 | spec_gain: 20.0
 | stft_pad_mode: reflect
 | max_norm: 4.0
 | clip_norm: True
 | do_trim_silence: True
 | trim_db: 45
 | do_sound_norm: False
 | do_amp_to_db_linear: True
 | do_amp_to_db_mel: True
 | do_rms_norm: False
 | db_level: None
 | stats_path: None
 | base: 10
 | hop_length: 256
 | win_length: 1024
fatal: not a git repository (or any parent up to mount point /)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
fatal: not a git repository (or any parent up to mount point /)
 > Training Environment:
 | > Backend: Torch
 | > Mixed precision: True
 | > Precision: fp16
 | > Current device: 0
 | > Num. of GPUs: 4
 | > Num. of CPUs: 48
 | > Num. of Torch Threads: 24
 | > Torch seed: 54321
 | > Torch CUDNN: True
 | > Torch CUDNN deterministic: False
 | > Torch CUDNN benchmark: False
 | > Torch TF32 MatMul: False
 > Start Tensorboard: tensorboard --logdir=glowtts_persian_finetune-March-07-2025_01+37AM-0000000
 > Using PyTorch DDP
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/user/workspace/dataset/coqui-ai-TTS/TTS/bin/train_tts.py", line 77, in <module>
    main()
  File "/home/user/workspace/dataset/coqui-ai-TTS/TTS/bin/train_tts.py", line 63, in main
    trainer = Trainer(
              ^^^^^^^^
  File "/home/user/workspace/dataset/TTS/venv/lib/python3.11/site-packages/trainer/trainer.py", line 310, in __init__
    init_distributed(
  File "/home/user/workspace/dataset/TTS/venv/lib/python3.11/site-packages/trainer/utils/distributed.py", line 65, in init_distributed
    dist.init_process_group(
  File "/home/user/workspace/dataset/TTS/venv/lib/python3.11/site-packages/torch/distributed/c10d_logger.py", line 81, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/workspace/dataset/TTS/venv/lib/python3.11/site-packages/torch/distributed/c10d_logger.py", line 95, in wrapper
    func_return = func(*args, **kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/workspace/dataset/TTS/venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py", line 1714, in init_process_group
    store, rank, world_size = next(rendezvous_iterator)
                              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/workspace/dataset/TTS/venv/lib/python3.11/site-packages/torch/distributed/rendezvous.py", line 226, in _tcp_rendezvous_handler
    store = _create_c10d_store(
            ^^^^^^^^^^^^^^^^^^^
  File "/home/user/workspace/dataset/TTS/venv/lib/python3.11/site-packages/torch/distributed/rendezvous.py", line 194, in _create_c10d_store
    return TCPStore(
           ^^^^^^^^^
torch.distributed.DistStoreError: Timed out after 601 seconds waiting for clients. 1/4 clients joined.
cat finetune_config.json
{
    "run_name": "glowtts_persian_finetune",
    "model": "glow_tts",
    "batch_size": 8,
    "eval_batch_size": 4,
    "num_loader_workers": 4,
    "num_eval_loader_workers": 4,
    "run_eval": true,
    "test_delay_epochs": 5,
    "epochs": 1000,
    "text_cleaner": "phoneme_cleaners",
    "use_phonemes": true,
    "phoneme_language": "fa",
    "phoneme_cache_path": "ph_cache",
    "enable_eos_bos_chars": false,
    "precompute_num_workers": 4,
    "print_step": 10,
    "print_eval": true,
    "mixed_precision": true,
    "output_path": "./",
    "lr": 0.0001,
    "characters": {
        "characters_class": "TTS.tts.utils.text.characters.IPAPhonemes",
        "vocabdict": null,
        "pad": "<PAD>",
        "eos": "<EOS>",
        "bos": "<BOS>",
        "blank": "<BLNK>",
        "characters": "\u02c8\u02cc\u02d0\u02d1pbtd\u0288\u0256c\u025fk\u0261q\u0262\u0294\u0274\u014b\u0272\u0273n\u0271m\u0299r\u0280\u2c71\u027e\u027d\u0278\u03b2fv\u03b8\u00f0sz\u0283\u0292\u0282\u0290\u00e7\u029dx\u0263\u03c7\u0281\u0127\u0295h\u0266\u026c\u026e\u028b\u0279\u027bj\u0270l\u026d\u028e\u029faegiouwy\u026a\u028a\u0329\u00e6\u0251\u0254\u0259\u025a\u025b\u025d\u0268\u0303\u0289\u028c\u028d0123456789\"#$%*+/=ABCDEFGHIJKLMNOPRSTUVWXYZ[]^{}",
        "punctuations": "!(),-.:;? \u0320\u060c\u061b\u061f\u200c<>",
        "phonemes": "\u02c8\u02cc\u02d0\u02d1pbtd\u0288\u0256c\u025fk\u0261q\u0262\u0294\u0274\u014b\u0272\u0273n\u0271m\u0299r\u0280\u2c71\u027e\u027d\u0278\u03b2fv\u03b8\u00f0sz\u0283\u0292\u0282\u0290\u00e7\u029dx\u0263\u03c7\u0281\u0127\u0295h\u0266\u026c\u026e\u028b\u0279\u027bj\u0270l\u026d\u028e\u029faegiouwy\u026a\u028a\u0329\u00e6\u0251\u0254\u0259\u025a\u025b\u025d\u0268\u0303\u0289\u028c\u028d0123456789\"#$%*+/=ABCDEFGHIJKLMNOPRSTUVWXYZ[]^_{}",
        "is_unique": true,
        "is_sorted": true
    },
    "datasets": [
        {
            "formatter": "ljspeech",
            "path": "./dataset/",
            "meta_file_train": "tts_dataset.csv",
            "ignored_speakers": []
        }
    ],
    "test_sentences": [
        "\u0633\u0644\u0637\u0627\u0646 \u0645\u062d\u0645\u0648\u062f \u062f\u0631 \u0632\u0645\u0633\u062a\u0627\u0646\u06cc \u0633\u062e\u062a \u0628\u0647 \u0637\u0644\u062e\u06a9 \u06af\u0641\u062a \u06a9\u0647: \u0628\u0627 \u0627\u06cc\u0646 \u062c\u0627\u0645\u0647 \u06cc \u06cc\u06a9 \u0644\u0627 \u062f\u0631 \u0627\u06cc\u0646 \u0633\u0631\u0645\u0627 \u0686\u0647 \u0645\u06cc \u06a9\u0646\u06cc ",
        "\u0645\u0631\u062f\u06cc \u0646\u0632\u062f \u0628\u0642\u0627\u0644\u06cc \u0622\u0645\u062f \u0648 \u06af\u0641\u062a \u067e\u06cc\u0627\u0632 \u0647\u0645 \u062f\u0647 \u062a\u0627 \u062f\u0647\u0627\u0646 \u0628\u062f\u0627\u0646 \u062e\u0648 \u0634\u0628\u0648\u06cc \u0633\u0627\u0632\u0645.",
        "\u0627\u0632 \u0645\u0627\u0644 \u062e\u0648\u062f \u067e\u0627\u0631\u0647 \u0627\u06cc \u06af\u0648\u0634\u062a \u0628\u0633\u062a\u0627\u0646 \u0648 \u0632\u06cc\u0631\u0647 \u0628\u0627\u06cc\u06cc \u0645\u0639\u0637\u0651\u0631 \u0628\u0633\u0627\u0632",
        "\u06cc\u06a9 \u0628\u0627\u0631 \u0647\u0645 \u0627\u0632 \u062c\u0647\u0646\u0645 \u0628\u06af\u0648\u06cc\u06cc\u062f.",
        "\u06cc\u06a9\u06cc \u0627\u0633\u0628\u06cc \u0628\u0647 \u0639\u0627\u0631\u06cc\u062a \u062e\u0648\u0627\u0633\u062a"
    ]
}

### To Reproduce



python -m TTS.bin.train_tts --config_path finetune_config.json --restore_path /home/user/.local/share/tts/tts_models--fa--custom--glow-tts/model_file.pth --use_ddp=true --gpus="0,1,2,3"
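
For triage, the final error message reads as if the store hosted by rank 0 expected 4 processes and only 1 (rank 0 itself) ever connected. The sketch below is a stdlib-only toy model of that counting logic, not PyTorch's `TCPStore`; `ToyStore` and `rendezvous` are hypothetical names introduced only for illustration, and the short timeout stands in for the 601 seconds seen in the log.

```python
import threading


class ToyStore:
    """Toy stand-in for a rendezvous store; NOT PyTorch's TCPStore."""

    def __init__(self, world_size: int) -> None:
        self.world_size = world_size
        self.joined = 0
        self._cond = threading.Condition()

    def join(self) -> None:
        # Each launched rank registers itself exactly once.
        with self._cond:
            self.joined += 1
            self._cond.notify_all()

    def wait_for_all(self, timeout_s: float) -> None:
        # Block until every expected rank has joined, or give up after timeout_s.
        with self._cond:
            done = self._cond.wait_for(
                lambda: self.joined >= self.world_size, timeout=timeout_s
            )
            if not done:
                raise TimeoutError(
                    f"Timed out after {timeout_s} seconds waiting for clients. "
                    f"{self.joined}/{self.world_size} clients joined."
                )


def rendezvous(world_size: int, ranks_started: int, timeout_s: float = 0.2) -> str:
    """Start `ranks_started` ranks (threads here) and wait for `world_size` of them."""
    store = ToyStore(world_size)
    workers = [threading.Thread(target=store.join) for _ in range(ranks_started)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    try:
        store.wait_for_all(timeout_s)
        return "ok"
    except TimeoutError as err:
        return str(err)


if __name__ == "__main__":
    print(rendezvous(world_size=4, ranks_started=4))  # all ranks present
    print(rendezvous(world_size=4, ranks_started=1))  # only one rank present
```

If this model matches what happens in the real run, the question is why ranks 1 to 3 never start or never reach the rendezvous when the script is invoked as above.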

### Expected behavior

_No response_

### Logs

```shell

```

### Environment

```shell
pip freeze | grep TTS
-e git+https://github.com/idiap/coqui-ai-TTS.git@4c593c620854d9cd2e177382abf48082f7c9f2ae#egg=coqui_tts
pip freeze | grep torch
torch==2.6.0
torchaudio==2.6.0
```

### Additional context

_No response_
