Failure in loading Deepspeed large model example #2569

Open
@sachanub

Description

🐛 Describe the bug

I am trying to perform inference with the OPT-30B model by following this example: https://github.com/pytorch/serve/tree/master/examples/large_models/deepspeed

However, as specified in the model-config.yaml file, a checkpoints.json file is required. It is consumed here: https://github.com/pytorch/serve/blob/master/ts/handler_utils/distributed/deepspeed.py#L40

Since that file is missing, the model fails to load. The error logs are attached below.
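A possible workaround is to generate the checkpoints.json manifest from the downloaded weight shards before starting TorchServe. This is a minimal sketch, not the example's own code: the `model_path` location and the `"ds_model"` checkpoint type are assumptions that should be verified against your snapshot layout and DeepSpeed version.

```python
import glob
import json
import os

def write_checkpoints_json(model_path, out_file="checkpoints.json"):
    """Collect the sharded *.bin weight files under model_path and write
    the checkpoint manifest that deepspeed.init_inference can consume."""
    shards = sorted(
        glob.glob(os.path.join(model_path, "**", "*.bin"), recursive=True)
    )
    manifest = {
        "type": "ds_model",  # assumed checkpoint type; verify for your DeepSpeed version
        "checkpoints": shards,
        "version": 1.0,
    }
    with open(out_file, "w") as f:
        json.dump(manifest, f)
    return manifest

# Hypothetical snapshot directory; adjust to where the OPT-30B weights were downloaded.
# write_checkpoints_json("model/models--facebook--opt-30b/snapshots")
```

The resulting checkpoints.json would then be packaged alongside model-config.yaml so the path referenced in deepspeed.py resolves at load time.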

Error logs

2023-09-05T23:22:14,652 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Failed to load model opt, exception Cannot copy out of meta tensor; no data!
2023-09-05T23:22:14,652 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2023-09-05T23:22:14,653 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG -   File "/opt/conda/lib/python3.10/site-packages/ts/model_service_worker.py", line 131, in load_model
2023-09-05T23:22:14,653 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG -     service = model_loader.load(
2023-09-05T23:22:14,653 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG -   File "/opt/conda/lib/python3.10/site-packages/ts/model_loader.py", line 135, in load
2023-09-05T23:22:14,653 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG -     initialize_fn(service.context)
2023-09-05T23:22:14,653 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG -   File "/home/model-server/tmp/models/c1130e4b01c345b9be913ef8414518cb/custom_handler.py", line 55, in initialize
2023-09-05T23:22:14,653 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG -     ds_engine = get_ds_engine(self.model, ctx)
2023-09-05T23:22:14,653 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG -   File "/opt/conda/lib/python3.10/site-packages/ts/handler_utils/distributed/deepspeed.py", line 35, in get_ds_engine
2023-09-05T23:22:14,653 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG -     ds_engine = deepspeed.init_inference(
2023-09-05T23:22:14,653 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG -   File "/opt/conda/lib/python3.10/site-packages/deepspeed/__init__.py", line 342, in init_inference
2023-09-05T23:22:14,653 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG -     engine = InferenceEngine(model, config=ds_inference_config)
2023-09-05T23:22:14,653 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG -   File "/opt/conda/lib/python3.10/site-packages/deepspeed/inference/engine.py", line 154, in __init__
2023-09-05T23:22:14,653 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG -     self.module.to(device)
2023-09-05T23:22:14,653 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG -   File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2053, in to
2023-09-05T23:22:14,653 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG -     return super().to(*args, **kwargs)
2023-09-05T23:22:14,653 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG -   File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1145, in to
2023-09-05T23:22:14,653 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG -     return self._apply(convert)
2023-09-05T23:22:14,653 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG -   File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
2023-09-05T23:22:14,653 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG -     module._apply(fn)
2023-09-05T23:22:14,653 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG -   File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
2023-09-05T23:22:14,653 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG -     module._apply(fn)
2023-09-05T23:22:14,653 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG -   File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
2023-09-05T23:22:14,653 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG -     module._apply(fn)
2023-09-05T23:22:14,653 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG -   File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 820, in _apply
2023-09-05T23:22:14,653 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG -     param_applied = fn(param)
2023-09-05T23:22:14,654 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG -   File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1143, in convert
2023-09-05T23:22:14,654 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG -     return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
2023-09-05T23:22:14,654 [INFO ] W-29500-opt_1.0-stdout MODEL_LOG - NotImplementedError: Cannot copy out of meta tensor; no data!
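For context, the NotImplementedError at the bottom of the trace is PyTorch's standard behavior for meta tensors: parameters created on the meta device carry only shape and dtype, no storage, so a plain `.to(device)` has nothing to copy. The handler presumably loads the model with meta weights and relies on checkpoints.json to materialize real tensors inside `deepspeed.init_inference`; without it, the engine's `self.module.to(device)` hits this error. A minimal standalone reproduction:

```python
import torch

# Parameters on the "meta" device are shape-only placeholders with no data.
layer = torch.nn.Linear(4, 4, device="meta")

try:
    layer.to("cpu")  # copying out of a meta tensor is not possible
except NotImplementedError as e:
    print(e)  # "Cannot copy out of meta tensor; no data!"
```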

Installation instructions

Docker image URI: 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:2.0.1-gpu-py310-cu118-ubuntu20.04-ec2
EC2 instance: g5dn.24xlarge

Model Packaging

Created the model artifact by following this example:
https://github.com/pytorch/serve/tree/master/examples/large_models/deepspeed

config.properties

No response

Versions

------------------------------------------------------------------------------------------
Environment headers
------------------------------------------------------------------------------------------
Torchserve branch: 

torchserve==0.8.1
torch-model-archiver==0.8.1

Python version: 3.10 (64-bit runtime)
Python executable: /opt/conda/bin/python3

Versions of relevant python libraries:
captum==0.6.0
numpy==1.22.4
nvgpu==0.10.0
psutil==5.9.5
requests==2.31.0
torch==2.0.1+cu118
torch-model-archiver==0.8.1
torchaudio==2.0.2+cu118
torchdata==0.6.1+cu118
torchserve==0.8.1
torchtext==0.15.2+cu118
torchvision==0.15.2+cu118
wheel==0.38.4

Java Version:


OS: N/A
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: N/A
CMake version: version 3.27.2

Is CUDA available: Yes
CUDA runtime version: 11.8.89
GPU models and configuration: 
GPU 0: NVIDIA A10G
GPU 1: NVIDIA A10G
GPU 2: NVIDIA A10G
GPU 3: NVIDIA A10G
Nvidia driver version: 535.54.03
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.8.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.8.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.8.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.8.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.8.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.8.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.8.0

Repro instructions

Please follow the instructions here to reproduce this error: https://github.com/pytorch/serve/tree/master/examples/large_models/deepspeed

Possible Solution

No response

Labels

example · question (Further information is requested) · triaged (Issue has been reviewed and triaged)
