Showing 10 changed files with 106 additions and 16 deletions.
# Non-Persistent Pipeline Examples

The `pipeline.py` script can be used to run any of the [supported models](https://github.com/microsoft/DeepSpeed-mii#supported-models). Provide the HuggingFace model name, maximum generated tokens, and prompt(s). The generated responses will be printed in the terminal:

```shell
$ python pipeline.py --model "mistralai/Mistral-7B-v0.1" --max-new-tokens 128 --prompts "DeepSpeed is" "Seattle is"
```

Tensor-parallelism can be controlled using the `deepspeed` launcher and setting `--num_gpus`:

```shell
$ deepspeed --num_gpus 2 pipeline.py
```

## Model-Specific Examples

For convenience, we also provide a set of scripts to quickly test the MII Pipeline with some popular text-generation models:

| Model | Launch command |
|-------|----------------|
| [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b) | `$ python llama2.py` |
| [tiiuae/falcon-7b](https://huggingface.co/tiiuae/falcon-7b) | `$ python falcon.py` |
| [mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) | `$ deepspeed --num_gpus 2 mixtral.py` |
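The `pipeline.py` script itself is not shown in this diff; the following is a minimal sketch of how it could be implemented on top of `mii.pipeline`, assuming its CLI matches the flags in the examples above (the exact structure is an assumption, and running `main()` requires `deepspeed-mii` and a GPU):

```python
import argparse


def parse_args(argv=None):
    # Flags mirror the README examples: --model, --max-new-tokens, --prompts
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", type=str, default="mistralai/Mistral-7B-v0.1")
    parser.add_argument(
        "--prompts", type=str, nargs="+", default=["DeepSpeed is", "Seattle is"]
    )
    parser.add_argument("--max-new-tokens", type=int, default=128)
    return parser.parse_args(argv)


def main():
    import mii  # deferred import: parsing args does not require deepspeed-mii

    args = parse_args()
    pipe = mii.pipeline(args.model)
    responses = pipe(args.prompts, max_new_tokens=args.max_new_tokens)
    if pipe.is_rank_0:  # only one process prints under tensor parallelism
        for r in responses:
            print(r)
```

When launched with `deepspeed --num_gpus 2 pipeline.py`, each rank runs the same script, and the `pipe.is_rank_0` check keeps the responses from being printed once per GPU.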
### falcon.py

```diff
@@ -1,6 +1,6 @@
-from mii import pipeline
+import mii
 
-pipe = pipeline("tiiuae/falcon-7b")
+pipe = mii.pipeline("tiiuae/falcon-7b")
 responses = pipe("DeepSpeed is", max_new_tokens=128, return_full_text=True)
 if pipe.is_rank_0:
     print(responses[0])
```
### llama2.py

```diff
@@ -1,6 +1,6 @@
-from mii import pipeline
+import mii
 
-pipe = pipeline("meta-llama/Llama-2-7b-hf")
+pipe = mii.pipeline("meta-llama/Llama-2-7b-hf")
 responses = pipe("DeepSpeed is", max_new_tokens=128, return_full_text=True)
 if pipe.is_rank_0:
     print(responses[0])
```
### mixtral.py

```diff
@@ -1,6 +1,6 @@
-from mii import pipeline
+import mii
 
-pipe = pipeline("mistralai/Mixtral-8x7B-v0.1")
+pipe = mii.pipeline("mistralai/Mixtral-8x7B-v0.1")
 responses = pipe("DeepSpeed is", max_new_tokens=128, return_full_text=True)
 if pipe.is_rank_0:
     print(responses[0])
```
# Persistent Deployment Examples

The `serve.py` script can be used to create an inference server for any of the [supported models](https://github.com/microsoft/DeepSpeed-mii#supported-models). Provide the HuggingFace model name and tensor-parallelism (use the default values and run `$ python serve.py` for a single-GPU [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) deployment):

```shell
$ python serve.py --model "mistralai/Mistral-7B-v0.1" --tensor-parallel 1
```

Connect to the persistent deployment and generate text with `client.py`. Provide the HuggingFace model name, maximum generated tokens, and prompt(s) (or, if you are using the default values, run `$ python client.py`):

```shell
$ python client.py --model "mistralai/Mistral-7B-v0.1" --max-new-tokens 128 --prompts "DeepSpeed is" "Seattle is"
```

Shut down the persistent deployment with `terminate.py`. Provide the HuggingFace model name (or, if you are using the default values, run `$ python terminate.py`):

```shell
$ python terminate.py --model "mistralai/Mistral-7B-v0.1"
```
### client.py

```diff
@@ -1,6 +1,18 @@
+import argparse
 import mii
 
-client = mii.client("mistralai/Mistral-7B-v0.1")
-output = client.generate("Deepspeed is", max_new_tokens=128)
+parser = argparse.ArgumentParser()
+parser.add_argument("--model", type=str, default="mistralai/Mistral-7B-v0.1")
+parser.add_argument(
+    "--prompts", type=str, nargs="+", default=["DeepSpeed is", "Seattle is"]
+)
+parser.add_argument("--max-new-tokens", type=int, default=128)
+args = parser.parse_args()
 
-print(output)
+client = mii.client(args.model)
+responses = client(
+    args.prompts, max_new_tokens=args.max_new_tokens, return_full_text=True
+)
+
+for r in responses:
+    print(r, "\n", "-" * 80, "\n")
```
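One detail worth noting in the change above: the flag is defined as `--max-new-tokens` but read back as `args.max_new_tokens`. This works because `argparse` converts dashes in long option names to underscores when building attribute names. A minimal standalone illustration:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--max-new-tokens", type=int, default=128)

# argparse stores "--max-new-tokens" under the attribute "max_new_tokens"
args = parser.parse_args(["--max-new-tokens", "64"])
print(args.max_new_tokens)  # 64
```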
### serve.py

```diff
@@ -1,3 +1,13 @@
+import argparse
 import mii
 
-mii.serve("mistralai/Mistral-7B-v0.1")
+parser = argparse.ArgumentParser()
+parser.add_argument("--model", type=str, default="mistralai/Mistral-7B-v0.1")
+parser.add_argument("--tensor-parallel", type=int, default=1)
+args = parser.parse_args()
+
+mii.serve(args.model, tensor_parallel=args.tensor_parallel)
+
+print(f"Serving model {args.model} on {args.tensor_parallel} GPU(s).")
+print(f"Run `python client.py --model {args.model}` to connect.")
+print(f"Run `python terminate.py --model {args.model}` to terminate.")
```
### terminate.py

```diff
@@ -1,4 +1,11 @@
+import argparse
 import mii
 
-client = mii.client("mistralai/Mistral-7B-v0.1")
+parser = argparse.ArgumentParser()
+parser.add_argument("--model", type=str, default="mistralai/Mistral-7B-v0.1")
+args = parser.parse_args()
+
+client = mii.client(args.model)
 client.terminate_server()
+
+print(f"Terminated server for model {args.model}.")
```