An end-to-end guide for scaling and serving LLM applications in production.

This repo currently contains one such application: a retrieval-augmented generation (RAG) app for answering questions about supplied information. By default, the app uses the [Ray documentation](https://docs.ray.io/en/master/) as the source of information. The app first [indexes](./app/index.py) the documentation in a vector database and then uses an LLM to generate responses to questions, augmented with relevant information retrieved from the index.

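As a rough illustration of that flow, here is a minimal, self-contained sketch of retrieval-augmented generation. The embedding, index, and prompt helpers below are toy stand-ins for illustration only, not this app's actual API:

```python
# Toy sketch of the RAG flow: index documents, retrieve the most
# relevant ones for a query, then augment the LLM prompt with them.
# All names here are illustrative, not the app's real interfaces.

def embed(text: str) -> list[float]:
    # Toy embedding: a letter-frequency vector. A real app would
    # call an embedding model here instead.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def similarity(a: list[float], b: list[float]) -> float:
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm if norm else 0.0

def build_index(docs: list[str]) -> list[tuple[list[float], str]]:
    # In-memory stand-in for a vector database.
    return [(embed(d), d) for d in docs]

def retrieve(index: list[tuple[list[float], str]], query: str, k: int = 1) -> list[str]:
    # Return the k documents most similar to the query.
    q = embed(query)
    scored = sorted(index, key=lambda entry: similarity(entry[0], q), reverse=True)
    return [doc for _, doc in scored[:k]]

def make_prompt(query: str, context: list[str]) -> str:
    # Augment the user question with the retrieved context.
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}"
```

A production deployment would swap `embed` for a real embedding model and the in-memory list for a vector database, which is what this app does.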
## Setup

### API keys
We'll be using [OpenAI](https://platform.openai.com/docs/models/) to access ChatGPT models like `gpt-3.5-turbo` and `gpt-4`, and [Anyscale Endpoints](https://endpoints.anyscale.com/) to access OSS LLMs like `Llama-2-70b`. Be sure to create accounts for both and have your credentials ready.
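Once you have the keys, a common pattern is to expose them as environment variables. The variable names below are conventional assumptions, so check the app's own configuration for the exact names it reads:

```bash
# Assumed conventional variable names -- verify against the app's config.
export OPENAI_API_BASE="https://api.openai.com/v1"
export OPENAI_API_KEY=""            # paste your OpenAI key here
export ANYSCALE_API_BASE="https://api.endpoints.anyscale.com/v1"
export ANYSCALE_API_KEY=""          # paste your Anyscale Endpoints key here
```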
### Compute
- Start a new [Anyscale workspace on staging](https://console.anyscale-staging.com/o/anyscale-internal/workspaces) using a [`g3.8xlarge`](https://instances.vantage.sh/aws/ec2/g3.8xlarge) head node (you can also add GPU worker nodes to run the workloads faster).
- Use the [`default_cluster_env_2.6.2_py39`](https://docs.anyscale.com/reference/base-images/ray-262/py39#ray-2-6-2-py39) cluster environment.
- Use the `us-east-1` region if you'd like to use the artifacts in our shared storage (source docs, vector DB dumps, etc.).
### Repository
```bash
git clone https://github.com/ray-project/llm-applications.git .  # git checkout -b goku origin/goku
git config --global user.name <GITHUB-USERNAME>
git config --global user.email <EMAIL-ADDRESS>
```

### Data
Our data is already available at `/efs/shared_storage/goku/docs.ray.io/en/master/` (on Staging, `us-east-1`), but if you want to load it yourself, run this bash command (change `/desired/output/directory`, but make sure it's on the shared storage,