Commit 9fcd803

Author: pierre.delaunay

Tweaks

1 parent 4d3c06a

File tree

3 files changed: +25 −15 lines changed

docs/examples/llm/client.py

Lines changed: 3 additions & 1 deletion

@@ -1,3 +1,5 @@
+import subprocess
+
 import openai
 
 
@@ -34,7 +36,7 @@ def get_job_comment(name="inference_server.sh"):
     # profit
     completion = openai.Completion.create(
         model=server['model'],
-        prompt=args.prompt
+        prompt="What is the square root of 25 ?"
     )
 
     print(completion)
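The server job stores its connection details in the Slurm job comment as `model=$MODEL|host=$HOST|port=$PORT|shared=y` (set with `scontrol` in `inference_server.sh`). A minimal sketch of how the client side can turn that comment into the values it needs; `parse_server_comment` is a hypothetical helper, and the `/v1` base-URL suffix assumes vLLM's OpenAI-compatible routing:

```python
def parse_server_comment(comment: str) -> dict:
    """Split a 'key=value|key=value' job comment into a dict."""
    return dict(item.split("=", 1) for item in comment.split("|"))

# Example comment as written by inference_server.sh via scontrol.
server = parse_server_comment("model=llama|host=cn-a001|port=8000|shared=y")

# Point the OpenAI client at the vLLM server (assumed /v1 prefix).
base_url = f"http://{server['host']}:{server['port']}/v1"
```

In the diff above, `completion = openai.Completion.create(model=server['model'], ...)` consumes exactly this kind of `server` dict.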

docs/examples/llm/inference_server.sh

Lines changed: 13 additions & 9 deletions

@@ -16,21 +16,22 @@
 #SBATCH --ntasks-per-node=1
 #SBATCH --mem=32G
 
-usage() {
-    echo "Usage: $0 [-m] [-p]
+function usage() {
+    echo "Usage: $0 [-m] [-p]"
     echo "  -h      Display this help message."
     echo "  -m MODEL    Specify a file to process."
     echo "  -p PATH     Specify a directory to work in."
+    echo "  -e ENV      Specify the conda environment to use."
     echo "  ARGUMENT    Any additional argument you want to process."
     exit 1
 }
 
 MODEL=""
-PATH=""
+MODEL_PATH=""
 ENV="./env"
 
 
-while getopts ":hf:d:" opt; do
+while getopts ":hm:p:e:" opt; do
   case $opt in
     h)
       usage
@@ -39,7 +40,7 @@ while getopts ":hf:d:" opt; do
       MODEL="$OPTARG"
       ;;
     p)
-      PATH="$OPTARG"
+      MODEL_PATH="$OPTARG"
       ;;
     e)
       ENV="$OPTARG"
@@ -55,22 +56,25 @@ while getopts ":hf:d:" opt; do
   esac
 done
 
+echo "model: $MODEL"
+echo " path: $MODEL_PATH"
+echo "  env: $ENV"
 
 export MILA_WEIGHTS="/network/weights/"
-
 cd $SLURM_TMPDIR
 
 #
 # Fix problem with conda saying it is not "init properly"
 #
 CONDA_EXEC="$(which conda)"
 CONDA_BASE=$(dirname $CONDA_EXEC)
+CONDA_ENVS="$CONDA_BASE/../envs"
 source $CONDA_BASE/../etc/profile.d/conda.sh
 
 #
 # Create a new environment
 #
-if [ ! -d "$ENV" ]; then
+if [ ! -d "$ENV" ] && [ "$ENV" != "base" ] && [ ! -d "$CONDA_ENVS/$ENV" ]; then
     conda create --prefix $ENV python=3.9 -y
 fi
 conda activate $ENV
@@ -85,12 +89,12 @@ NAME="$WEIGHTS/$MODEL"
 #
 scontrol update job $SLURM_JOB_ID comment="model=$MODEL|host=$HOST|port=$PORT|shared=y"
 
-# 
+#
 # Launch Server
 #
 python -m vllm.entrypoints.openai.api_server \
     --host $HOST \
     --port $PORT \
-    --model "$MODEL" \
+    --model "$MODEL_PATH" \
     --tensor-parallel-size $SLURM_NTASKS_PER_NODE \
     --served-model-name "$MODEL"
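The comment written by `scontrol` above is what the client's `get_job_comment` (visible in the client.py hunk header) reads back, which is also why `import subprocess` was added there. A hedged sketch of how such a lookup could be built with standard `squeue` options; `get_job_comment_cmd` is a hypothetical helper, not the repository's actual implementation:

```python
import subprocess

def get_job_comment_cmd(name="inference_server.sh"):
    """Build a squeue command that prints only the comment of the named job."""
    # -h suppresses the header; -O comment prints the comment field only.
    return ["squeue", "-h", f"--name={name}", "-O", "comment"]

# On a cluster, the comment would then be fetched with, e.g.:
# comment = subprocess.check_output(get_job_comment_cmd(), text=True).strip()
```

Keeping the command construction separate from the `subprocess` call makes it easy to test without a running Slurm cluster.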

docs/examples/llm/vllm.rst

Lines changed: 9 additions & 5 deletions

@@ -36,24 +36,28 @@ You can override the defaults by specifying arguments to sbatch.
 Client
 ------
 
-Becasue vLLM replicates OpenAI's API, the client side is quite straight forward.
-Own OpenAI's client can be reused.
+Because vLLM replicates OpenAI's API, the client side is quite straightforward,
+and OpenAI's own client can be reused.
 
 .. warning::
 
    The server takes a while to set up; you might have to wait a few minutes
    before the server is ready for inference.
 
-   You can check the job log of the server.
-   Look for
+   You can check the job log of the server using ``tail -f slurm-<JOB-ID>.out`` to
+   see the log as it is written.
+
+   Look for ``Uvicorn running on http://... (Press CTRL+C to quit)``
+   to know when the server is ready to receive requests.
 
 
 .. note::
 
-   We use squeue to look for the inference server job to configure the
+   We use ``squeue`` to look for the inference server job to configure the
    url endpoint automatically.
 
    Make sure your job name is unique!
 
+
 .. literalinclude:: client.py
    :language: python
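The warning above tells users to wait until the ``Uvicorn running on http://...`` line appears before sending requests. That wait-and-retry pattern can be sketched as a generic polling loop; the helper below is illustrative only (the docs themselves just suggest watching the log), and the caller is assumed to supply a ``probe`` such as a small test request to the server:

```python
import time

def wait_until_ready(probe, retries=30, delay=10):
    """Call probe() every `delay` seconds until it returns True.

    Returns True as soon as the probe succeeds, False if all
    retries are exhausted (i.e. the server never came up).
    """
    for _ in range(retries):
        if probe():
            return True
        time.sleep(delay)
    return False
```

With the defaults above, the loop gives the server up to five minutes to start, matching the "wait a few minutes" guidance in the warning.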
