
[Bug] ValueError: not allowed to raise maximum limit (rlimit) #110

@iamkhalidbashir

Description


Describe the bug

Error while training:

  • I tried with sudo; same error.
  • I am using the Docker image nvidia/cuda:11.7.0-base-ubuntu22.04.
  • Inside the container, resource.getrlimit(resource.RLIMIT_NOFILE) returns (1048576, 1048576) by default.
| > stats_path:None
2023-06-14T07:29:43.025431079Z  | > base:10
2023-06-14T07:29:43.025437149Z  | > hop_length:256
2023-06-14T07:29:43.025444429Z  | > win_length:1024
2023-06-14T07:29:43.025450699Z  > initialization of speaker-embedding layers.
2023-06-14T07:29:43.025462919Z Traceback (most recent call last):
2023-06-14T07:29:43.025469199Z   File "/workspace/coqui-tts/train.py", line 320, in <module>
2023-06-14T07:29:43.025476859Z     trainer = Trainer(
2023-06-14T07:29:43.025484659Z   File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 405, in __init__
2023-06-14T07:29:43.025494939Z     self.use_cuda, self.num_gpus = self.setup_training_environment(args=args, config=config, gpu=gpu)
2023-06-14T07:29:43.025500099Z   File "/usr/local/lib/python3.10/dist-packages/trainer/trainer.py", line 632, in setup_training_environment
2023-06-14T07:29:43.025543959Z     resource.setrlimit(resource.RLIMIT_NOFILE, (4096, rlimit[1]))
2023-06-14T07:29:43.025560229Z ValueError: not allowed to raise maximum limit
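To confirm which limits the training process itself sees (rather than a separate shell), a small diagnostic like the one below could be run in the same container. It is a sketch: describe_nofile_limits and read_nr_open are hypothetical helper names. The fs.nr_open check covers one Linux behaviour worth ruling out. The kernel rejects an RLIMIT_NOFILE hard limit above fs.nr_open with EPERM even when the hard limit is passed back unchanged, and CPython reports EPERM from setrlimit as "ValueError: not allowed to raise maximum limit". That check also applies to privileged processes, which would be consistent with sudo not helping. Whether it is the actual cause here is not confirmed by this report.

```python
import resource
from pathlib import Path
from typing import Dict, Optional

# Linux-only sysctl: the largest RLIMIT_NOFILE hard limit the kernel accepts.
NR_OPEN_PATH = Path("/proc/sys/fs/nr_open")


def read_nr_open() -> Optional[int]:
    """Return fs.nr_open, or None if it cannot be read (e.g. not Linux)."""
    try:
        return int(NR_OPEN_PATH.read_text().strip())
    except (OSError, ValueError):
        return None


def describe_nofile_limits() -> Dict[str, object]:
    """Collect the RLIMIT_NOFILE values the current process actually sees."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    nr_open = read_nr_open()
    # RLIM_INFINITY is checked explicitly rather than compared numerically.
    hard_above_nr_open = nr_open is not None and (
        hard == resource.RLIM_INFINITY or hard > nr_open
    )
    return {
        "soft": soft,
        "hard": hard,
        "RLIM_INFINITY": resource.RLIM_INFINITY,
        "nr_open": nr_open,
        # If True, setrlimit(RLIMIT_NOFILE, (x, hard)) is expected to hit EPERM on Linux.
        "hard_above_nr_open": hard_above_nr_open,
    }


if __name__ == "__main__":
    print(describe_nofile_limits())
```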

The error is caused by this block in Trainer/trainer/trainer.py (lines 653 to 660 at commit 9879d3d):

if platform.system() != "Windows":
    # https://github.com/pytorch/pytorch/issues/973
    import resource  # pylint: disable=import-outside-toplevel
    rlimit = resource.getrlimit(resource.RLIMIT_NOFILE)
    resource.setrlimit(resource.RLIMIT_NOFILE, (4096, rlimit[1]))
# set and initialize Pytorch runtime
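A more defensive variant of this block might raise the soft limit only when it is below 4096, and warn instead of aborting training when setrlimit is refused. The sketch below assumes that guaranteeing at least 4096 open files is the only purpose of the original line; ensure_min_open_files is a hypothetical name, and this is not the Trainer project's actual fix.

```python
import platform
import warnings
from typing import Optional, Tuple

if platform.system() != "Windows":
    import resource  # POSIX-only module, guarded as in trainer.py


def ensure_min_open_files(minimum: int = 4096) -> Optional[Tuple[int, int]]:
    """Raise the soft RLIMIT_NOFILE to at least `minimum` when possible.

    Hypothetical helper (not part of Trainer): it never lowers the soft
    limit, passes the hard limit back unchanged, and warns instead of
    raising if the kernel refuses. Returns the (soft, hard) limits in
    effect afterwards, or None on Windows.
    """
    if platform.system() == "Windows":
        return None

    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Handle RLIM_INFINITY explicitly instead of comparing it numerically.
    if soft == resource.RLIM_INFINITY:
        return soft, hard
    target = minimum if hard == resource.RLIM_INFINITY else min(minimum, hard)
    if soft >= target:
        return soft, hard  # already high enough: leave the limits alone

    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    except (ValueError, OSError) as exc:
        # EPERM from setrlimit surfaces as ValueError in CPython.
        warnings.warn(f"could not raise RLIMIT_NOFILE to {target}: {exc}")
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

With the (1048576, 1048576) limits reported above, this helper would make no setrlimit call at all, whereas the original line always sets the soft limit to 4096 and therefore lowers it in this container.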

To Reproduce

  1. Install coqui-tts in an nvidia/cuda:11.7.0-base-ubuntu22.04 Docker container.
  2. Try to train a VITS model.
  3. The error above is thrown (even with sudo).
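The failing call can also be isolated from training. The sketch below replays the same setrlimit arguments that trainer.py uses (soft limit 4096, current hard limit) and returns the error message, if any; replay_trainer_setrlimit is a hypothetical name. If the call succeeds, the previous limits are restored, so running it leaves the process as it found it.

```python
import resource
from typing import Optional


def replay_trainer_setrlimit(soft_target: int = 4096) -> Optional[str]:
    """Repeat Trainer's setrlimit call and report the error, if any."""
    original = resource.getrlimit(resource.RLIMIT_NOFILE)
    try:
        # Same arguments as trainer.py: new soft limit, current hard limit.
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft_target, original[1]))
    except (ValueError, OSError) as exc:
        return f"{type(exc).__name__}: {exc}"
    # The call succeeded: put the original (soft, hard) pair back.
    resource.setrlimit(resource.RLIMIT_NOFILE, original)
    return None


if __name__ == "__main__":
    print(replay_trainer_setrlimit() or "setrlimit succeeded")
```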

Expected behavior

Training starts without errors.

Environment

{
    "CUDA": {
        "GPU": [
            "Tesla V100-FHHL-16GB"
        ],
        "available": true,
        "version": "11.7"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "2.0.1+cu117",
        "Trainer": "v0.0.20",
        "numpy": "1.22.4"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            "ELF"
        ],
        "processor": "x86_64",
        "python": "3.10.6",
        "version": "#46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020"
    }
}

Additional context

No response

Labels: bug (Something isn't working)
