
Commit 5932fb9

add READMEs
1 parent 7306a43 commit 5932fb9

10 files changed: +106 additions, −16 deletions


inference/mii/README.md

Lines changed: 1 addition & 1 deletion
@@ -2,4 +2,4 @@
 
 Install the requirements by running `pip install -r requirements.txt`.
 
-Once [DeepSpeed-MII](https://github.com/microsoft/deepspeed-mii) is installed you have two options for deployment: an interactive non-persistent pipeline or a persistent serving deployment. For details on these files please refer to the [Getting Started guide for MII](https://github.com/microsoft/deepspeed-mii#getting-started-with-mii).
+Once [DeepSpeed-MII](https://github.com/microsoft/deepspeed-mii) is installed you have two options for deployment: an interactive non-persistent pipeline or a persistent serving deployment. See the scripts in [non-persistent](./non-persistent/) and [persistent](./persistent/) for examples. Details on the code implemented in these scripts can be found in our [Getting Started guide for MII](https://github.com/microsoft/deepspeed-mii#getting-started-with-mii).
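
For orientation, a minimal sketch of the two options side by side, using only the calls that appear in the scripts added by this commit (`mii.pipeline`, `mii.serve`, `mii.client`, `client.terminate_server`) and their default model; in practice the persistent steps are split across `serve.py`, `client.py`, and `terminate.py`.

```python
import mii

MODEL = "mistralai/Mistral-7B-v0.1"  # default model used throughout these examples

# Option 1: non-persistent pipeline -- load the model, generate, and exit,
# all within a single process (see non-persistent/).
pipe = mii.pipeline(MODEL)
responses = pipe("DeepSpeed is", max_new_tokens=128, return_full_text=True)
if pipe.is_rank_0:
    print(responses[0])

# Option 2: persistent deployment -- start a server, connect to it with a
# client, and shut it down when finished (see persistent/). These steps are
# normally run as the separate serve.py, client.py, and terminate.py scripts.
mii.serve(MODEL)
client = mii.client(MODEL)
responses = client(["DeepSpeed is"], max_new_tokens=128, return_full_text=True)
client.terminate_server()
```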

inference/mii/non-persistent/README.md

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
# Non-Persistent Pipeline Examples

The `pipeline.py` script can be used to run any of the [supported
models](https://github.com/microsoft/DeepSpeed-mii#supported-models). Provide
the HuggingFace model name, maximum generated tokens, and prompt(s). The
generated responses will be printed in the terminal:

```shell
$ python pipeline.py --model "mistralai/Mistral-7B-v0.1" --max-new-tokens 128 --prompts "DeepSpeed is" "Seattle is"
```

Tensor-parallelism can be controlled using the `deepspeed` launcher and setting
`--num_gpus`:

```shell
$ deepspeed --num_gpus 2 pipeline.py
```

## Model-Specific Examples

For convenience, we also provide a set of scripts to quickly test the MII
Pipeline with some popular text-generation models:

| Model | Launch command |
|-------|----------------|
| [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b) | `$ python llama2.py` |
| [tiiuae/falcon-7b](https://huggingface.co/tiiuae/falcon-7b) | `$ python falcon.py` |
| [mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) | `$ deepspeed --num_gpus 2 mixtral.py` |
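
A note on the multi-GPU launch above: when a script is started with `deepspeed --num_gpus 2`, every rank executes it, which is why the scripts in this directory guard their output with `pipe.is_rank_0`. A minimal sketch of that pattern, using the same calls as the scripts below:

```python
import mii

# Launched as `deepspeed --num_gpus 2 <script>.py`, each GPU rank runs this file.
pipe = mii.pipeline("mistralai/Mixtral-8x7B-v0.1")
responses = pipe("DeepSpeed is", max_new_tokens=128, return_full_text=True)

# Guard the print so the response is emitted once, not once per rank.
if pipe.is_rank_0:
    print(responses[0])
```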

inference/mii/non-persistent/falcon.py

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
-from mii import pipeline
+import mii
 
-pipe = pipeline("tiiuae/falcon-7b")
+pipe = mii.pipeline("tiiuae/falcon-7b")
 responses = pipe("DeepSpeed is", max_new_tokens=128, return_full_text=True)
 if pipe.is_rank_0:
     print(responses[0])

inference/mii/non-persistent/llama2.py

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
-from mii import pipeline
+import mii
 
-pipe = pipeline("meta-llama/Llama-2-7b-hf")
+pipe = mii.pipeline("meta-llama/Llama-2-7b-hf")
 responses = pipe("DeepSpeed is", max_new_tokens=128, return_full_text=True)
 if pipe.is_rank_0:
     print(responses[0])

inference/mii/non-persistent/mixtral.py

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
-from mii import pipeline
+import mii
 
-pipe = pipeline("mistralai/Mixtral-8x7B-v0.1")
+pipe = mii.pipeline("mistralai/Mixtral-8x7B-v0.1")
 responses = pipe("DeepSpeed is", max_new_tokens=128, return_full_text=True)
 if pipe.is_rank_0:
     print(responses[0])

inference/mii/non-persistent/pipeline.py

Lines changed: 9 additions & 4 deletions
@@ -1,13 +1,18 @@
 import argparse
-from mii import pipeline
+import mii
 
 parser = argparse.ArgumentParser()
 parser.add_argument("--model", type=str, default="mistralai/Mistral-7B-v0.1")
-parser.add_argument("--prompts", type=str, nargs="+", default=["DeepSpeed is"])
+parser.add_argument(
+    "--prompts", type=str, nargs="+", default=["DeepSpeed is", "Seattle is"]
+)
+parser.add_argument("--max-new-tokens", type=int, default=128)
 args = parser.parse_args()
 
-pipe = pipeline(parser.model)
-responses = pipe(args.prompts, max_new_tokens=128, return_full_text=True)
+pipe = mii.pipeline(args.model)
+responses = pipe(
+    args.prompts, max_new_tokens=args.max_new_tokens, return_full_text=True
+)
 
 if pipe.is_rank_0:
     for r in responses:

inference/mii/persistent/README.md

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
# Persistent Deployment Examples

The `serve.py` script can be used to create an inference server for any of the
[supported models](https://github.com/microsoft/DeepSpeed-mii#supported-models).
Provide the HuggingFace model name and tensor-parallelism (use the default
values and run `$ python serve.py` for a single-GPU
[mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)
deployment):

```shell
$ python serve.py --model "mistralai/Mistral-7B-v0.1" --tensor-parallel 1
```

Connect to the persistent deployment and generate text with `client.py`. Provide
the HuggingFace model name, maximum generated tokens, and prompt(s) (or, if you
are using the default values, run `$ python client.py`):

```shell
$ python client.py --model "mistralai/Mistral-7B-v0.1" --max-new-tokens 128 --prompts "DeepSpeed is" "Seattle is"
```

Shut down the persistent deployment with `terminate.py`. Provide the HuggingFace
model name (or, if you are using the default values, run `$ python
terminate.py`):

```shell
$ python terminate.py --model "mistralai/Mistral-7B-v0.1"
```
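
Because the deployment stays up until it is explicitly terminated, one server can handle many requests. A minimal sketch of issuing several requests against a running deployment, assuming it was started with `python serve.py` and using only the calls from `client.py` and `terminate.py`:

```python
import mii

# Connect to a deployment previously started with `python serve.py`.
# The model name must match the one the server was started with.
client = mii.client("mistralai/Mistral-7B-v0.1")

# The persistent server keeps running between calls, so the client can be
# invoked repeatedly before the deployment is shut down.
for prompt in ["DeepSpeed is", "Seattle is"]:
    responses = client([prompt], max_new_tokens=128, return_full_text=True)
    for r in responses:
        print(r, "\n", "-" * 80, "\n")

# Shut the deployment down; equivalent to running `python terminate.py`.
client.terminate_server()
```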

inference/mii/persistent/client.py

Lines changed: 15 additions & 3 deletions
@@ -1,6 +1,18 @@
+import argparse
 import mii
 
-client = mii.client("mistralai/Mistral-7B-v0.1")
-output = client.generate("Deepspeed is", max_new_tokens=128)
+parser = argparse.ArgumentParser()
+parser.add_argument("--model", type=str, default="mistralai/Mistral-7B-v0.1")
+parser.add_argument(
+    "--prompts", type=str, nargs="+", default=["DeepSpeed is", "Seattle is"]
+)
+parser.add_argument("--max-new-tokens", type=int, default=128)
+args = parser.parse_args()
 
-print(output)
+client = mii.client(args.model)
+responses = client(
+    args.prompts, max_new_tokens=args.max_new_tokens, return_full_text=True
+)
+
+for r in responses:
+    print(r, "\n", "-" * 80, "\n")

inference/mii/persistent/serve.py

Lines changed: 11 additions & 1 deletion
@@ -1,3 +1,13 @@
+import argparse
 import mii
 
-mii.serve("mistralai/Mistral-7B-v0.1")
+parser = argparse.ArgumentParser()
+parser.add_argument("--model", type=str, default="mistralai/Mistral-7B-v0.1")
+parser.add_argument("--tensor-parallel", type=int, default=1)
+args = parser.parse_args()
+
+mii.serve(args.model, tensor_parallel=args.tensor_parallel)
+
+print(f"Serving model {args.model} on {args.tensor_parallel} GPU(s).")
+print(f"Run `python client.py --model {args.model}` to connect.")
+print(f"Run `python terminate.py --model {args.model}` to terminate.")

inference/mii/persistent/terminate.py

Lines changed: 8 additions & 1 deletion
@@ -1,4 +1,11 @@
+import argparse
 import mii
 
-client = mii.client("mistralai/Mistral-7B-v0.1")
+parser = argparse.ArgumentParser()
+parser.add_argument("--model", type=str, default="mistralai/Mistral-7B-v0.1")
+args = parser.parse_args()
+
+client = mii.client(args.model)
 client.terminate_server()
+
+print(f"Terminated server for model {args.model}.")
