Commit 419ba8e

[serve] Integrate and Document Bring-Your-Own Gradio Applications (#2… (ray-project#27560)
1 parent 19e5599 commit 419ba8e

File tree: 14 files changed (+416, -368 lines)

.buildkite/pipeline.yml

Lines changed: 30 additions & 2 deletions
```diff
@@ -251,9 +251,26 @@
           > test_shard.txt
       - cat test_shard.txt
       - bazel test --config=ci $(./ci/run/bazel_export_options)
-        --test_tag_filters=-post_wheel_build
+        --test_tag_filters=-post_wheel_build,-py37
         $(cat test_shard.txt)
-
+- label: ":serverless: Serve Tests (Python 3.7)"
+  conditions:
+    [
+        "RAY_CI_SERVE_AFFECTED",
+        "RAY_CI_PYTHON_AFFECTED",
+    ]
+  commands:
+    - cleanup() { if [ "${BUILDKITE_PULL_REQUEST}" = "false" ]; then ./ci/build/upload_build_info.sh; fi }; trap cleanup EXIT
+    - echo "--- Setting up Python 3.7 environment."
+    - PYTHON=3.7 TORCH_VERSION=1.6 ./ci/env/install-dependencies.sh
+    # Specifying PYTHON=3.7 above somehow messes up the Ray install.
+    # Uninstall and re-install Ray so that we can use Ray Client.
+    # (Remove thirdparty_files to sidestep an issue with psutil.)
+    - pip uninstall -y ray && rm -rf /ray/python/ray/thirdparty_files
+    - ./ci/ci.sh build
+    - bazel test --config=ci $(./ci/run/bazel_export_options)
+      --test_tag_filters=team:serve
+      python/ray/serve/test_gradio

 - label: ":python: Minimal install 3.6"
   conditions: ["RAY_CI_PYTHON_AFFECTED"]
@@ -288,6 +305,17 @@
       - bazel test --test_output=streamed --config=ci --test_env=RAY_DEFAULT=1 $(./ci/run/bazel_export_options)
         python/ray/dashboard/test_dashboard

+- label: ":python: Ray Serve default install"
+  conditions: ["RAY_CI_PYTHON_AFFECTED"]
+  commands:
+    - cleanup() { if [ "${BUILDKITE_PULL_REQUEST}" = "false" ]; then ./ci/build/upload_build_info.sh; fi }; trap cleanup EXIT
+    - ./ci/env/install-serve.sh
+    - ./ci/env/env_info.sh
+    - bazel test --test_output=streamed --config=ci --test_env=RAY_DEFAULT=1 $(./ci/run/bazel_export_options)
+      python/ray/serve/test_deployment_graph
+    - bazel test --test_output=streamed --config=ci --test_env=RAY_DEFAULT=1 $(./ci/run/bazel_export_options)
+      python/ray/serve/test_api
+
 - label: ":python: Release test package unit tests"
   conditions: ["ALWAYS"]
   commands:
```

ci/ci.sh

Lines changed: 2 additions & 0 deletions
```diff
@@ -172,6 +172,7 @@ test_python() {
       -python/ray/serve:test_cross_language # Ray java not built on Windows yet.
       -python/ray/serve:test_gcs_failure # Fork not supported in windows
       -python/ray/serve:test_standalone2 # Multinode not supported on Windows
+      -python/ray/serve:test_gradio
       -python/ray/tests:test_actor_advanced # crashes in shutdown
       -python/ray/tests:test_autoscaler # We don't support Autoscaler on Windows
       -python/ray/tests:test_autoscaler_aws
@@ -216,6 +217,7 @@ test_python() {
       --test_env=CI="1" \
       --test_env=RAY_CI_POST_WHEEL_TESTS="1" \
       --test_env=USERPROFILE="${USERPROFILE}" \
+      --test_env=WINDIR \
       --test_output=streamed \
       -- \
       ${test_shard_selection};
```

ci/env/install-serve.sh

Lines changed: 12 additions & 0 deletions
(new file)

```bash
#!/usr/bin/env bash

# Installs serve dependencies ("ray[serve]") on top of minimal install

# Get script's directory: https://stackoverflow.com/a/246128
SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )

# Installs minimal dependencies
"$SCRIPT_DIR"/install-minimal.sh

# Installs serve dependencies
python -m pip install -U "ray[serve]"
```
Lines changed: 128 additions & 0 deletions
(new documentation file)

# Scaling your Gradio app with Ray Serve

In this guide, we will show you how to scale up your [Gradio](https://gradio.app/) application using Ray Serve. There is no need to change the internal architecture of your Gradio app; instead, we will neatly wrap it with Ray Serve and then scale it up to access more resources.

## Dependencies

To follow this tutorial, you will need Ray Serve and Gradio. If you haven't already, install them by running:

```console
$ pip install "ray[serve]"
$ pip install gradio
```

For the purposes of this tutorial, we will be working with Gradio apps that run text summarization and text generation models. **Note that you can substitute any Gradio app of your own for the ones used here!**

We will be using [HuggingFace's Pipelines](https://huggingface.co/docs/transformers/main_classes/pipelines) to access the models. First, let's install the transformers module:

```console
$ pip install transformers
```
## Quickstart: Deploy your Gradio app with Ray Serve

This example shows an easy, straightforward way to deploy your app onto Ray Serve. Start by creating a new Python file named `demo.py`. Import `GradioServer` from Ray Serve for deploying your Gradio app, along with `gradio` and `transformers.pipeline` for loading the text summarization model.

```{literalinclude} ../../../../python/ray/serve/examples/doc/gradio-integration.py
:start-after: __doc_import_begin__
:end-before: __doc_import_end__
```

Then, we construct the example Gradio app `io`:

:::{note}
Remember, you can substitute your own Gradio app here if you want to try scaling it up!
:::

```{literalinclude} ../../../../python/ray/serve/examples/doc/gradio-integration.py
:start-after: __doc_gradio_app_begin__
:end-before: __doc_gradio_app_end__
```
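The `literalinclude` directives above pull the example code from the Ray repository. For orientation, `demo.py` up to this point plausibly looks like the following sketch (the specific summarization model named here, `t5-small`, is our assumption, and the actual example file may differ slightly):

```python
# Sketch of demo.py so far; the exact code lives in gradio-integration.py.
import gradio as gr
from transformers import pipeline

from ray.serve.gradio_integrations import GradioServer

# Load a text summarization model via HuggingFace Pipelines.
# (t5-small is an illustrative choice, not necessarily the one in the docs.)
summarizer = pipeline("summarization", model="t5-small")


def summarize(text: str) -> str:
    # The pipeline returns a list of dicts; extract the summary string.
    return summarizer(text)[0]["summary_text"]


# The Gradio app: one textbox in, one textbox out.
io = gr.Interface(fn=summarize, inputs="textbox", outputs="textbox")
```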

### Understanding `GradioServer`

In order to deploy your Gradio app onto Ray Serve, you need to wrap it in a Serve [deployment](serve-key-concepts-deployment). `GradioServer` acts as that wrapper. It serves your Gradio app remotely on Ray Serve so that it can process and respond to HTTP requests.

:::{note}
`GradioServer` is simply `GradioIngress` wrapped in a Serve deployment.
:::

```{literalinclude} ../../../../python/ray/serve/gradio_integrations.py
:start-after: __doc_gradio_ingress_begin__
:end-before: __doc_gradio_ingress_end__
```
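The `literalinclude` above shows the actual `GradioIngress` source. Conceptually, the wrapper does something like the following simplified sketch (this is not the real implementation; in particular, Serve's ASGI request forwarding is elided):

```python
# Conceptual sketch only -- the real class lives in
# python/ray/serve/gradio_integrations.py and forwards each incoming
# HTTP request through Gradio's ASGI app.
import gradio as gr
from ray import serve


class GradioIngress:
    """Hosts a Gradio app inside a Serve replica."""

    def __init__(self, io: gr.Blocks):
        # Gradio exposes its UI as an ASGI app; Serve routes HTTP
        # traffic to it (the forwarding logic is omitted here).
        self.app = gr.routes.App.create_app(io)


# Per the note above: GradioServer is this ingress in a Serve deployment.
GradioServer = serve.deployment(GradioIngress)
```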

### Deploy your Gradio Server

Replicas in a deployment are copies of your program running on Ray Serve; the more replicas, the more client requests your deployment can serve. You can increase the number of replicas of your application, or increase the number of CPUs and/or GPUs available to each replica.

Then, take either the example we created above or an existing Gradio app of your own (of type `Interface`, `Blocks`, `Parallel`, etc.) and wrap it in your Gradio Server.

```{literalinclude} ../../../../python/ray/serve/examples/doc/gradio-integration.py
:start-after: __doc_app_begin__
:end-before: __doc_app_end__
```
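For reference, this wrapping step typically looks something like the sketch below (the replica count and CPU numbers are illustrative, not prescriptive):

```python
# Two replicas with two CPUs each -- illustrative values; tune them
# to match your workload and cluster resources.
app = GradioServer.options(
    num_replicas=2,
    ray_actor_options={"num_cpus": 2},
).bind(io)
```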

Finally, deploy your Gradio Server! Run the following in your terminal:

```console
$ serve run demo:app
```

Now you can access your Gradio app at `http://localhost:8000`! This is what it should look like:

![Gradio Result](https://raw.githubusercontent.com/ray-project/images/master/docs/serve/gradio_result.png)

See [Putting Ray Serve Deployment Graphs in Production](https://docs.ray.io/en/master/serve/production.html#id1) for more information on how to deploy your app in production.

## Parallelizing models with Ray Serve

You can run multiple models in parallel with Ray Serve by utilizing the [deployment graph](deployment-graph-e2e-tutorial) in Ray Serve.

### Original Approach

Suppose you want to run the following program:

1. Take two text generation models, [`gpt2`](https://huggingface.co/gpt2) and [`EleutherAI/gpt-neo-125M`](https://huggingface.co/EleutherAI/gpt-neo-125M).
2. Run the two models on the same input text, such that the generated text has a minimum length of 20 and a maximum length of 100.
3. Display the outputs of both models using Gradio.

This is how you would do it normally:

```{literalinclude} ../../../../python/ray/serve/examples/doc/gradio-original.py
:start-after: __doc_code_begin__
:end-before: __doc_code_end__
```
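As a point of reference, a plain (non-Serve) version of this program might look like the following sketch, where the two pipelines run sequentially inside a single process (the exact code is in gradio-original.py):

```python
# Both models live in one process and run one after the other, so each
# request pays for two sequential forward passes.
import gradio as gr
from transformers import pipeline

generator1 = pipeline("text-generation", model="gpt2")
generator2 = pipeline("text-generation", model="EleutherAI/gpt-neo-125M")


def generate_text(text: str):
    # Generation bounds from the steps above: 20-100 tokens.
    out1 = generator1(text, min_length=20, max_length=100)[0]["generated_text"]
    out2 = generator2(text, min_length=20, max_length=100)[0]["generated_text"]
    return out1, out2


io = gr.Interface(
    fn=generate_text, inputs="textbox", outputs=["textbox", "textbox"]
)
io.launch()
```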

### Parallelize using Ray Serve

With Ray Serve, we can parallelize the two text generation models by wrapping each one in a separate Ray Serve [deployment](serve-key-concepts-deployment). Deployments are defined by decorating a Python class or function with `@serve.deployment`; they usually wrap the models that you want to deploy on Ray Serve and handle incoming requests.

First, let's import our dependencies. Note that we need to import `GradioIngress` instead of `GradioServer` as before, since we're now building a customized `MyGradioServer` that can run models in parallel.

```{literalinclude} ../../../../python/ray/serve/examples/doc/gradio-integration-parallel.py
:start-after: __doc_import_begin__
:end-before: __doc_import_end__
```

Then, let's wrap our `gpt2` and `EleutherAI/gpt-neo-125M` models in a Serve deployment named `TextGenerationModel`.

```{literalinclude} ../../../../python/ray/serve/examples/doc/gradio-integration-parallel.py
:start-after: __doc_models_begin__
:end-before: __doc_models_end__
```
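A sketch of what such a deployment might look like follows (the exact class is in gradio-integration-parallel.py):

```python
# One deployment class, bound twice later with different model names.
from ray import serve
from transformers import pipeline


@serve.deployment
class TextGenerationModel:
    def __init__(self, model_name: str):
        # Each replica loads its own copy of the model.
        self.generator = pipeline("text-generation", model=model_name)

    def __call__(self, text: str) -> str:
        # Generation bounds from the steps above: 20-100 tokens.
        output = self.generator(text, min_length=20, max_length=100)
        return output[0]["generated_text"]
```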

Next, instead of simply wrapping our Gradio app in a `GradioServer` deployment, we can build our own `MyGradioServer` that reroutes the Gradio app so that it runs the `TextGenerationModel` deployments:

```{literalinclude} ../../../../python/ray/serve/examples/doc/gradio-integration-parallel.py
:start-after: __doc_gradio_server_begin__
:end-before: __doc_gradio_server_end__
```
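In outline, such a custom ingress might look like this sketch; the `fanout`, `_d1`, and `_d2` names follow the note further down, while the handle-calling details are illustrative rather than a faithful copy of the example file:

```python
# Sketch of a custom ingress that fans a request out to two downstream
# model deployments and collects both results for the Gradio UI.
import asyncio

import gradio as gr
import ray
from ray import serve
from ray.serve.gradio_integrations import GradioIngress


@serve.deployment
class MyGradioServer(GradioIngress):
    def __init__(self, downstream_model_1, downstream_model_2):
        self._d1 = downstream_model_1
        self._d2 = downstream_model_2

        # The Gradio app calls fanout(), which queries both deployments.
        io = gr.Interface(
            fn=self.fanout, inputs="textbox", outputs=["textbox", "textbox"]
        )
        super().__init__(io)

    async def fanout(self, text: str):
        # Submit to both model deployments concurrently via their handles.
        refs = await asyncio.gather(
            self._d1.remote(text), self._d2.remote(text)
        )
        result1, result2 = ray.get(refs)
        return result1, result2
```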

Lastly, we link everything together:

```{literalinclude} ../../../../python/ray/serve/examples/doc/gradio-integration-parallel.py
:start-after: __doc_app_begin__
:end-before: __doc_app_end__
```
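The linking step plausibly looks like this (model names taken from the steps above):

```python
# Bind each model into its own deployment, then hand both to the ingress.
app = MyGradioServer.bind(
    TextGenerationModel.bind("gpt2"),
    TextGenerationModel.bind("EleutherAI/gpt-neo-125M"),
)
```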

:::{note}
This will bind your two text generation models (wrapped in Serve deployments) to `MyGradioServer._d1` and `MyGradioServer._d2`, forming a [deployment graph](deployment-graph-e2e-tutorial). Thus, we have built our Gradio Interface `io` such that it calls `MyGradioServer.fanout()`, which simply sends requests to the two text generation models deployed on Ray Serve.
:::

Now you can run your scalable app, and the two text generation models will run in parallel on Ray Serve! Run your Gradio app:

```console
$ serve run demo:app
```

Access your Gradio app at `http://localhost:8000`. This is what it should look like:

![Gradio Result](https://raw.githubusercontent.com/ray-project/images/master/docs/serve/gradio_result_parallel.png)

See [Putting Ray Serve Deployment Graphs in Production](https://docs.ray.io/en/master/serve/production.html#id1) for more information on how to deploy your app in production.
