Commit 428e239

cleaned up notebook (#28)
1 parent 500a1f6 commit 428e239

127 files changed: +110292 −40214 lines


.pre-commit-config.yaml (+1 −1)

```diff
@@ -17,7 +17,7 @@ repos:
     rev: v1.4.0
     hooks:
       - id: detect-secrets
-        exclude: "notebooks"
+        exclude: "notebooks|experiments"
   - repo: local
     hooks:
       - id: clean
```
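For context on the one-line change above: pre-commit's `exclude` key is a single Python regular expression matched against each staged file's path (with `re.search` semantics), so alternation with `|` is how the commit skips a second directory. A minimal sketch of that matching behavior (the `paths` list is made up for illustration; pre-commit does the real matching):

```python
import re

# pre-commit treats `exclude` as one Python regex; a file is skipped
# when the pattern matches anywhere in its path (re.search semantics).
pattern = re.compile(r"notebooks|experiments")

paths = [
    "notebooks/rag.ipynb",             # skipped: matches "notebooks"
    "experiments/responses/out.json",  # skipped: matches "experiments"
    "app/config.py",                   # still scanned by detect-secrets
]
skipped = [p for p in paths if pattern.search(p)]
print(skipped)
```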

README.md (+37 −110)

````diff
@@ -1,25 +1,27 @@
 # LLM Applications
 
-An end-to-end guide for scaling and serving LLM application in production.
-
-This repo currently contains one such application: a retrieval-augmented generation (RAG)
-app for answering questions about supplied information. By default, the app uses
-the [Ray documentation](https://docs.ray.io/en/master/) as the source of information.
-This app first [indexes](./app/index.py) the documentation in a vector database
-and then uses an LLM to generate responses for questions that got augmented with
-relevant info retrieved from the index.
+An end-to-end guide for scaling and serving LLM applications in production. This repo currently contains one such application: a retrieval-augmented generation (RAG) app for answering questions about supplied information.
 
 ## Setup
 
+### API keys
+We'll be using [OpenAI](https://platform.openai.com/docs/models/) to access ChatGPT models like `gpt-3.5-turbo`, `gpt-4`, etc. and [Anyscale Endpoints](https://endpoints.anyscale.com/) to access OSS LLMs like `Llama-2-70b`. Be sure to create your accounts for both and have your credentials ready.
+
 ### Compute
-- Start a new [Anyscale workspace on staging](https://console.anyscale-staging.com/o/anyscale-internal/workspaces)
-using an [`g3.8xlarge`](https://instances.vantage.sh/aws/ec2/g3.8xlarge) head node on an AWS cloud.
+- Start a new [Anyscale workspace on staging](https://console.anyscale-staging.com/o/anyscale-internal/workspaces) using a [`g3.8xlarge`](https://instances.vantage.sh/aws/ec2/g3.8xlarge) head node (you can also add GPU worker nodes to run the workloads faster).
 - Use the [`default_cluster_env_2.6.2_py39`](https://docs.anyscale.com/reference/base-images/ray-262/py39#ray-2-6-2-py39) cluster environment.
+- Use the `us-east-1` region if you'd like to use the artifacts in our shared storage (source docs, vector DB dumps, etc.).
 
 ### Repository
+```bash
+git clone https://github.com/ray-project/llm-applications.git .  # git checkout -b goku origin/goku
+git config --global user.name <GITHUB-USERNAME>
+git config --global user.email <EMAIL-ADDRESS>
+```
 
-First, clone this repository.
-
+### Data
+Our data is already ready at `/efs/shared_storage/goku/docs.ray.io/en/master/` (on Staging, `us-east-1`) but if you wanted to load it yourself, run this bash command (change `/desired/output/directory`, but make sure it's on the shared storage,
+so that it's accessible to the workers).
 
 ```bash
 git clone https://github.com/ray-project/llm-applications.git .
 ```
@@ -30,116 +32,41 @@ Then set up the environment correctly by specifying the values in your `.env` file
 and installing the dependencies:
 
 ```bash
-cp ./envs/.env_template .envs
-source .envs
 pip install --user -r requirements.txt
+export PYTHONPATH=$PYTHONPATH:$PWD
 pre-commit install
 pre-commit autoupdate
 ```
 
-### Data
-
-Our data is already ready at `/efs/shared_storage/pcmoritz/docs.ray.io/en/master/`
-(on Staging) but if you wanted to load it yourself, run this bash command:
-
+### Variables
 ```bash
-bash scrape-docs.sh
+touch .env
+# Add environment variables to .env
+OPENAI_API_BASE="https://api.openai.com/v1"
+OPENAI_API_KEY=""  # https://platform.openai.com/account/api-keys
+ANYSCALE_API_BASE="https://api.endpoints.anyscale.com/v1"
+ANYSCALE_API_KEY=""  # https://app.endpoints.anyscale.com/credentials
+DB_CONNECTION_STRING="dbname=postgres user=postgres host=localhost password=postgres"
+source .env
 ```
 
-### Vector DB
-
-<details>
-<summary>Local installation with brew on MacOS</summary>
+## Steps
 
+1. Open [rag.ipynb](notebooks/rag.ipynb) to interactively go through all the concepts and run experiments.
+2. Use the best configuration (in `serve.py`) from the notebook experiments to serve the LLM.
 ```bash
-brew install postgresql
-brew install pgvector
-psql -c "CREATE USER postgres WITH SUPERUSER;"
-# pragma: allowlist nextline secret
-psql -c "ALTER USER postgres with password 'postgres';"
-psql -c "CREATE EXTENSION vector;"
-psql -f migrations/initial.sql
-python app/index.py create-index
+python app/main.py
 ```
-</details>
-
-```bash
-bash setup-pgvector.sh
-sudo -u postgres psql -f migrations/initial.sql
-python app/index.py create-index
-```
-
-### Query
-Just a sample and uses the current index that's been created.
+3. Query your service.
 ```python
 import json
-from app.query import QueryAgent
-query = "What is the default batch size for map_batches?"
-system_content = "Your job is to answer a question using the additional context provided."
-agent = QueryAgent(
-    embedding_model="thenlper/gte-base",
-    llm="meta-llama/Llama-2-7b-chat-hf",
-    max_context_length=4096,
-    system_content=system_content,
-)
-result = agent.get_response(query=query)
-print(json.dumps(result, indent=2))
+import requests
+data = {"query": "What is the default batch size for map_batches?"}
+response = requests.post("http://127.0.0.1:8000/query", json=data)
+print(response.text)
 ```
-
-### Experiments
-
-#### Generate responses
-
-```bash
-python app/main.py generate-responses \
-    --system-content "Answer the {query} using the additional {context} provided."
-```
-
-#### Evaluate responses
-
-```bash
-python app/main.py evaluate-responses \
-    --system-content """
-    Your job is to rate the quality of our generated answer {generated_answer}
-    given a query {query} and a reference answer {reference_answer}.
-    Your score has to be between 1 and 5.
-    You must return your response in a line with only the score.
-    Do not return answers in any other format.
-    On a separate line provide your reasoning for the score as well.
-    """
-```
-
-### Dashboard
-```bash
-export APP_PORT=8501
-echo https://$APP_PORT-port-$ANYSCALE_SESSION_DOMAIN
-streamlit run dashboard/Home.py
+4. Shut down the service.
+```python
+from ray import serve
+serve.shutdown()
 ```
-
-### TODO
-- [x] notebook cleanup
-- [x] evaluator (ex. GPT4) response script
-- [x] DB dump & load
-- [ ] experiments (in order, fixing choices along the way)
-    - Evaluator
-        - [ ] GPT-4 best experiment
-        - [ ] Llama-70b consistency with GPT4
-    - [ ] OSS vs. Closed (gpt-3.5 vs. llama)
-    - [ ] w/ and w/out context (value of RAG)
-    - [ ] # of chunks to use in context
-        - Does using more resources help/harm?
-        - 1, 5, 10 (will all fit in the smallest context length of 4K)
-    - [ ] Chunking size/overlap
-        - related to # of chunks + context length, but we'll treat as independent variable
-    - [ ] Embedding (top 3 in leaderboard)
-        - global leaderboard may not be your leaderboard (empirically validate)
-    - Later
-        - [ ] Commercial Assistant evaluation
-        - [ ] Human Assistant evaluation
-        - [ ] Data sources
-    - Much later
-        - [ ] Prompt
-        - [ ] Prompt-tuning on query
-        - [ ] Embedding vs. LLM for retrieval
-        - [ ] Ray Tune to tweak a subset of components
-- [ ] CI/CD workflows
````
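The README's new "Variables" section keeps credentials as flat `KEY="value"` lines that get `source`d in bash. If the same values are needed inside Python without going through the shell, a tiny parser suffices. This is a sketch under the assumption that every value is a double-quoted one-liner without embedded `#`; the `load_env` helper is illustrative, not the repo's code (`python-dotenv`'s `load_dotenv` is the usual off-the-shelf choice):

```python
def load_env(text: str) -> dict:
    """Parse KEY="value" lines (comments after # are ignored)."""
    env = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or "=" not in line:
            continue
        # Split on the FIRST "=" only, so values containing "=" survive
        # (e.g. the postgres connection string below).
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"')
    return env

sample = '''
OPENAI_API_BASE="https://api.openai.com/v1"
DB_CONNECTION_STRING="dbname=postgres user=postgres host=localhost password=postgres"
'''
env = load_env(sample)
print(env["OPENAI_API_BASE"])  # https://api.openai.com/v1
```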

app/config.py (+17 −40)

```diff
@@ -1,44 +1,21 @@
-import os
 from pathlib import Path
 
 # Directories
+EFS_DIR = Path("/efs/shared_storage/goku")
 ROOT_DIR = Path(__file__).parent.parent.absolute()
-
-
-DB_CONNECTION_STRING = os.environ.get("DB_CONNECTION_STRING")
-DOCS_PATH = os.environ.get("DOCS_PATH")
-
-# Credentials
-OPENAI_API_BASE = os.environ.get("OPENAI_API_BASE", "https://api.endpoints.anyscale.com/v1")
-OPENAI_API_KEY = os.environ.get(
-    "OPENAI_API_KEY", ""
-)  # https://app.endpoints.anyscale.com/credentials
-
-# Indexing and model properties
-DEVICE = os.environ.get("DEVICE", "cuda")
-EMBEDDING_BATCH_SIZE = os.environ.get("EMBEDDING_BATCH_SIZE", 100)
-EMBEDDING_ACTORS = os.environ.get("EMBEDDING_ACTORS", 2)
-NUM_GPUS = os.environ.get("NUM_GPUS", 1)
-INDEXING_ACTORS = os.environ.get("INDEXING_ACTORS", 20)
-INDEXING_BATCH_SIZE = os.environ.get("INDEXING_BATCH_SIZE", 128)
-
-# Response generation properties
-EXPERIMENT_NAME = os.environ.get("EXPERIMENT_NAME", "llama-2-7b-gtebase")
-DATA_PATH = os.environ.get("DATA_PATH", "datasets/eval-dataset-v1.jsonl")
-CHUNK_SIZE = os.environ.get("CHUNK_SIZE", 300)
-CHUNK_OVERLAP = os.environ.get("CHUNK_OVERLAP", 50)
-EMBEDDING_MODEL = os.environ.get("EMBEDDING_MODEL", "thenlper/gte-base")
-LLM = os.environ.get("LLM", "meta-llama/Llama-2-7b-chat-hf")
-TEMPERATURE = os.environ.get("TEMPERATURE", 0)
-MAX_CONTEXT_LENGTH = os.environ.get("MAX_CONTEXT_LENGTH", 4096)
-
-# Evaluation properties
-REFERENCE_LOC = os.environ.get("REFERENCE_LOC", "experiments/responses/gpt-4-with-source.json")
-RESPONSE_LOC = os.environ.get("RESPONSE_LOC", "experiments/responses/$EXPERIMENT_NAME.json")
-EVALUATOR = os.environ.get("EVALUATOR", "meta-llama/Llama-2-70b-chat-hf")
-EVALUATOR_TEMPERATURE = os.environ.get("EVALUATOR_TEMPERATURE", 0)
-EVALUATOR_MAX_CONTEXT_LENGTH = os.environ.get("EVALUATOR_MAX_CONTEXT_LENGTH", 4096)
-
-# Slack bot integration
-SLACK_APP_TOKEN = os.environ.get("SLACK_APP_TOKEN", "")
-SLACK_BOT_TOKEN = os.environ.get("SLACK_BOT_TOKEN", "")
+EXPERIMENTS_DIR = Path(ROOT_DIR, "experiments")
+
+# Mappings
+EMBEDDING_DIMENSIONS = {
+    "thenlper/gte-base": 768,
+    "BAAI/bge-large-en": 1024,
+    "text-embedding-ada-002": 1536,
+}
+MAX_CONTEXT_LENGTHS = {
+    "gpt-4": 8192,
+    "gpt-3.5-turbo": 4096,
+    "gpt-3.5-turbo-16k": 16384,
+    "meta-llama/Llama-2-7b-chat-hf": 4096,
+    "meta-llama/Llama-2-13b-chat-hf": 4096,
+    "meta-llama/Llama-2-70b-chat-hf": 4096,
+}
```
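The rewritten `app/config.py` replaces dozens of environment-variable settings with two lookup tables. A sketch of how such mappings are typically consumed when budgeting retrieved chunks against a model's context window; the dict contents are copied (in part) from the diff, while `max_chunks`, `prompt_overhead`, and the token-sized-chunk assumption are illustrative, not the repo's logic:

```python
# Lookup tables from the new app/config.py (subset).
EMBEDDING_DIMENSIONS = {
    "thenlper/gte-base": 768,
    "BAAI/bge-large-en": 1024,
    "text-embedding-ada-002": 1536,
}
MAX_CONTEXT_LENGTHS = {
    "gpt-4": 8192,
    "gpt-3.5-turbo": 4096,
    "meta-llama/Llama-2-70b-chat-hf": 4096,
}

def max_chunks(llm: str, chunk_size: int, prompt_overhead: int = 512) -> int:
    """How many retrieved chunks (treated as chunk_size tokens each)
    fit in the model's window after reserving prompt_overhead tokens."""
    return (MAX_CONTEXT_LENGTHS[llm] - prompt_overhead) // chunk_size

n = max_chunks("meta-llama/Llama-2-70b-chat-hf", chunk_size=300)
print(n)  # (4096 - 512) // 300 = 11
```

`EMBEDDING_DIMENSIONS` plays the analogous role on the indexing side, e.g. sizing the pgvector column to the chosen embedding model.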

0 commit comments