
Model and data downloading #550




Hi, I am trying this guide.
I have set up the environment, but I couldn't find in the documentation which models and datasets I need to download or where to store them. My connection is slow and storage is limited, so I want to prepare everything in advance. How can I make the script load models from a local path?

I have prepared opt-1.3b, opt-350m, and the Dahoas dataset, and they are all under the project directory.

DeepSpeedExamples/applications/DeepSpeed-Chat# ls Dahoas facebook

opt-1.3b  opt-350m
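A quick sanity check on the prepared folders might look like the sketch below. It assumes that a usable local model folder contains at least a `config.json` file, which is what the traceback further down is trying to fetch; the helper name `looks_like_model_dir` is hypothetical and not part of DeepSpeed-Chat or transformers.

```python
import os


def looks_like_model_dir(path):
    """Return True if `path` is a directory that contains a config.json file."""
    return os.path.isdir(path) and os.path.isfile(os.path.join(path, "config.json"))


if __name__ == "__main__":
    # Paths are relative to the current working directory, as in the listing above.
    for name in ("facebook/opt-1.3b", "facebook/opt-350m"):
        print(name, "->", looks_like_model_dir(name))
```

If this prints `False` for a folder, the directory is either missing from the current working directory or does not contain the configuration file, so the name would be treated as a Hub repository id instead.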

But when I ran python --actor-model facebook/opt-1.3b --reward-model facebook/opt-350m --deployment-type single_gpu --step 2 3, the script still tried to download the models and returned a connection error:

[2023-05-26 10:47:55,167] [WARNING] [] Unable to find hostfile, will proceed with training with local resources only.
[2023-05-26 10:47:55,197] [INFO] [] cmd = /opt/conda/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr= --master_port=29500 --enable_each_rank_log=None --model_name_or_path facebook/opt-350m --num_padding_at_beginning 1 --weight_decay 0.1 --disable_dropout --gradient_accumulation_steps 4 --zero_stage 0 --deepspeed --output_dir /workspace/DeepSpeedExamples/applications/DeepSpeed-Chat/output/reward-models/350m
[2023-05-26 10:47:56,698] [INFO] [] 0 NV_LIBNCCL_DEV_PACKAGE=libnccl-dev=2.9.9-1+cuda11.3
[2023-05-26 10:47:56,698] [INFO] [] 0 NCCL_VERSION=2.9.9-1
[2023-05-26 10:47:56,698] [INFO] [] 0 NV_LIBNCCL_PACKAGE_VERSION=2.9.9-1
[2023-05-26 10:47:56,698] [INFO] [] 0 NV_LIBNCCL_PACKAGE=libnccl2=2.9.9-1+cuda11.3
[2023-05-26 10:47:56,698] [INFO] [] 0 NV_LIBNCCL_DEV_PACKAGE_NAME=libnccl-dev
[2023-05-26 10:47:56,698] [INFO] [] 0 NV_LIBNCCL_PACKAGE_NAME=libnccl2
[2023-05-26 10:47:56,698] [INFO] [] 0 NV_LIBNCCL_DEV_PACKAGE_VERSION=2.9.9-1
[2023-05-26 10:47:56,698] [INFO] [] WORLD INFO DICT: {'localhost': [0]}
[2023-05-26 10:47:56,698] [INFO] [] nnodes=1, num_local_procs=1, node_rank=0
[2023-05-26 10:47:56,698] [INFO] [] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0]})
[2023-05-26 10:47:56,698] [INFO] [] dist_world_size=1
[2023-05-26 10:47:56,698] [INFO] [] Setting CUDA_VISIBLE_DEVICES=0
[2023-05-26 10:47:58,770] [INFO] [] Initializing TorchBackend in DeepSpeed with backend nccl
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/huggingface_hub/utils/", line 259, in hf_raise_for_status
  File "/opt/conda/lib/python3.7/site-packages/requests/", line 960, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url:

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/transformers/utils/", line 429, in cached_file
  File "/opt/conda/lib/python3.7/site-packages/huggingface_hub/utils/", line 120, in _inner_fn
    return fn(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/huggingface_hub/", line 1199, in hf_hub_download
  File "/opt/conda/lib/python3.7/site-packages/huggingface_hub/utils/", line 120, in _inner_fn
    return fn(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/huggingface_hub/", line 1541, in get_hf_file_metadata
  File "/opt/conda/lib/python3.7/site-packages/huggingface_hub/utils/", line 291, in hf_raise_for_status
    raise RepositoryNotFoundError(message, response) from e
huggingface_hub.utils._errors.RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-64708e60-3fe7220d286716304dafaa08)

Repository Not Found for url:
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated.
Invalid username or password.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "", line 352, in <module>
  File "", line 204, in main
    tokenizer = load_hf_tokenizer(args.model_name_or_path, fast_tokenizer=True)
  File "/workspace/DeepSpeedExamples/applications/DeepSpeed-Chat/training/utils/", line 53, in load_hf_tokenizer
  File "/opt/conda/lib/python3.7/site-packages/transformers/models/auto/", line 659, in from_pretrained
    pretrained_model_name_or_path, trust_remote_code=trust_remote_code, **kwargs
  File "/opt/conda/lib/python3.7/site-packages/transformers/models/auto/", line 928, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/transformers/", line 574, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/transformers/", line 641, in _get_config_dict
  File "/opt/conda/lib/python3.7/site-packages/transformers/utils/", line 434, in cached_file
    f"{path_or_repo_id} is not a local folder and is not a valid model identifier "
OSError: opt-350m is not a local folder and is not a valid model identifier listed on ''
If this is a private repository, make sure to pass a token having permission to this repo with `use_auth_token` or log in with `huggingface-cli login` and pass `use_auth_token=True`.
[2023-05-26 10:48:01,710] [INFO] [] Killing subprocess 334
[2023-05-26 10:48:01,711] [ERROR] [] ['/opt/conda/bin/python', '-u', '', '--local_rank=0', '--model_name_or_path', 'facebook/opt-350m', '--num_padding_at_beginning', '1', '--weight_decay', '0.1', '--disable_dropout', '--gradient_accumulation_steps', '4', '--zero_stage', '0', '--deepspeed', '--output_dir', '/workspace/DeepSpeedExamples/applications/DeepSpeed-Chat/output/reward-models/350m'] exits with return code = 1

The same thing happened when I ran step 1 earlier, but later it somehow worked and proceeded successfully. Now the same issue has come up again with steps 2 and 3.
How can I work around it?
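One possible cause, stated as an assumption rather than something the log confirms: a relative name such as facebook/opt-350m counts as a local folder only if it exists relative to the working directory of the process that loads it, so it stops resolving if the training script for a step starts from a different directory. The sketch below turns such a name into an absolute path and builds an environment with the Hugging Face offline switches (`HF_HUB_OFFLINE`, `TRANSFORMERS_OFFLINE`, `HF_DATASETS_OFFLINE`) set. The helper names `resolve_model_path` and `offline_env` are hypothetical.

```python
import os


def resolve_model_path(name_or_path, base_dir):
    """Return an absolute path if `name_or_path` exists under `base_dir`.

    If no such directory exists, the input is returned unchanged so it can
    still be interpreted as a Hub repository id.
    """
    candidate = os.path.abspath(os.path.join(base_dir, name_or_path))
    return candidate if os.path.isdir(candidate) else name_or_path


def offline_env(env=None):
    """Return a copy of the environment with Hugging Face offline switches set."""
    env = dict(os.environ if env is None else env)
    env["HF_HUB_OFFLINE"] = "1"        # huggingface_hub: do not contact the Hub
    env["TRANSFORMERS_OFFLINE"] = "1"  # transformers: use local files only
    env["HF_DATASETS_OFFLINE"] = "1"   # datasets: use local/cached data only
    return env


if __name__ == "__main__":
    # Resolve against the directory the command is launched from.
    print(resolve_model_path("facebook/opt-350m", os.getcwd()))
```

The resolved absolute paths could then be passed to --actor-model and --reward-model, and the environment from `offline_env()` handed to the launching process. With the offline switches set, a missing local file should fail immediately instead of waiting on a slow connection.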



