Commit 00dd209

committed
update doc
1 parent 1304934 commit 00dd209

3 files changed

+62
-4
lines changed

assets/chat_archi.png

756 KB

rag_system/README.md

Lines changed: 61 additions & 4 deletions
Original file line number | Diff line number | Diff line change
@@ -1,6 +1,63 @@
1-
# TODO
1+
# Chat Sufficiency
2+
3+
Folder organization:
4+
```
- old: previous work based on Kotaemon
- backend: python backend using FastAPI and implementing RAG
- frontend: typescript frontend using SvelteKit
```
9+
10+
## Quick start (dev setup)
11+
```
# start backend
cd backend
cp .env.example .env
# fill in correct values in .env
uv run uvicorn app.main:app --reload

# start frontend (dev), in a second terminal from this folder
cd frontend
npm install
cp .env.example .env
npm run dev
```
24+
25+
## Pipeline and architecture
26+
Here is the pipeline and main architectural elements:
27+
1. The **SvelteKit frontend** sends a query to the **FastAPI backend** via a POST request.
28+
2. The backend calls a **generative AI API (Scaleway)** to determine whether the query is on-topic. If so, the query is rewritten for retrieval. If not, the backend answers directly.
29+
3. The rewritten query is embedded on the server's CPU with a small model using **sentence-transformers**. Calling an embedding API from a smaller instance would be more cost-efficient, but the embedding model we started with, **Qwen3-embedding-0.6B**, isn't available on any commercial API.
30+
4. The server sends the query to **Qdrant** (the vector database), which returns the top $k_{vector}$ matches (configurable).
31+
5. The server then reranks the results locally using **flashrank**. Again, a more mature version might use a hosted reranker such as Cohere's API.
32+
6. The top $k_{rerank}$ chunks are then used to build the context. If the `FETCH_PUBS` env var is true (the default), we use the chunks' OpenAlex IDs to fetch the corresponding publications from **OpenAlex's API**. We build the context using the title, abstract, and the retrieved chunks.
33+
7. The context is passed along with the original query to the generative API, and the backend streams the response back to the frontend.
34+
8. The backend saves the messages and intermediary results to **Postgres**.
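
The control flow of steps 2 to 7 can be sketched in a few lines of Python. This is only an illustration under assumptions, not the backend's actual code: the `Chunk` class, the `env_flag`, `build_context`, and `answer` helpers, the injected callables (`route`, `search`, `rerank`, `generate`, `fetch_publication`), and the default values of `k_vector` and `k_rerank` are all hypothetical names and numbers. Streaming (step 7) and persistence to Postgres (step 8) are left out.

```python
import os
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    openalex_id: str  # ID of the publication the chunk comes from


def env_flag(name: str, default: bool = True) -> bool:
    """Read a boolean env var; an unset variable falls back to `default`."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes"}


def build_context(chunks, fetch_publication=None) -> str:
    """Group chunks by publication; add title and abstract if a fetcher is given."""
    grouped: dict[str, list[str]] = {}
    for chunk in chunks:
        grouped.setdefault(chunk.openalex_id, []).append(chunk.text)
    parts = []
    for pub_id, texts in grouped.items():
        lines = []
        if fetch_publication is not None:
            # Hypothetical fetcher returning {"title": ..., "abstract": ...}
            pub = fetch_publication(pub_id)
            lines += [f"Title: {pub['title']}", f"Abstract: {pub['abstract']}"]
        lines += [f"Excerpt: {text}" for text in texts]
        parts.append("\n".join(lines))
    return "\n\n".join(parts)


def answer(query, *, route, search, rerank, generate, fetch_publication,
           k_vector=20, k_rerank=5):
    """Run the pipeline with injected callables standing in for each service."""
    on_topic, rewritten, direct_answer = route(query)      # step 2
    if not on_topic:
        return direct_answer
    candidates = search(rewritten, k_vector)                # steps 3-4
    top_chunks = rerank(rewritten, candidates)[:k_rerank]   # step 5
    fetcher = fetch_publication if env_flag("FETCH_PUBS") else None
    context = build_context(top_chunks, fetcher)            # step 6
    return generate(query, context)                         # step 7 (no streaming)
```

Passing the services in as callables keeps the sketch independent of any particular client library, so each stage can be swapped (for example, a hosted reranker instead of a local one) without touching the orchestration.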
35+
36+
SvelteKit is a full-stack framework for Svelte, similar to Next for React or Nuxt for Vue. We could almost have used it as a static site generator, but we use server-side functions to hide the backend's URL from the user.
37+
38+
The diagram below illustrates the aforementioned pipeline. Note that the policy analysis retrieval isn't implemented yet.
39+
40+
![Chat sufficiency architecture schema](../assets/chat_archi.png)
41+
42+
## CleverCloud deployment
43+
Both applications (frontend and backend) are deployed to CleverCloud on the World Sufficiency Lab organization. CC handles continuous deployment on each push to the configured branch.
44+
45+
The frontend requires a larger instance to build than to run, so we configured one. CC also doesn't include a build step by default, so it needs to be configured via env vars.
46+
47+
Because embedding and reranking run locally, the backend currently needs 4 GB of RAM. CleverCloud also doesn't handle uv well, so we need to go through pip. Use:
48+
```
uv pip compile pyproject.toml --output-file requirements.txt
```
51+
to create a `requirements.txt` file from `pyproject.toml`. To avoid installing unnecessary GPU-related libraries, `pyproject.toml` is configured to install torch+cpu. For this to work, don't forget to add this line at the start of `requirements.txt` after generating it:
52+
```
--extra-index-url https://download.pytorch.org/whl/cpu
```
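
Since this line is easy to forget after regenerating the file, the step could be automated. The helper below is a minimal sketch, not part of the repository: the function name `ensure_cpu_index` is hypothetical, and it assumes `requirements.txt` is a UTF-8 text file. It prepends the line only when it is not already present, so it is safe to run repeatedly.

```python
from pathlib import Path

EXTRA_INDEX_LINE = "--extra-index-url https://download.pytorch.org/whl/cpu"


def ensure_cpu_index(path: str = "requirements.txt") -> bool:
    """Prepend the CPU wheel index line if missing; return True if the file changed."""
    req = Path(path)
    content = req.read_text(encoding="utf-8")
    if EXTRA_INDEX_LINE in content.splitlines():
        return False  # already present, nothing to do
    req.write_text(f"{EXTRA_INDEX_LINE}\n{content}", encoding="utf-8")
    return True
```

Calling `ensure_cpu_index()` from the backend folder right after `uv pip compile` would patch the freshly generated file in place.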
55+
56+
57+
## TODO
58+
- suggestions (of questions)
59+
- prompt: answer in same language as query
60+
- prompt: don't forget social floor
2 61
- hybrid search
3-
- logging (to postgres ?)
4-
- error handling
5 62
- policy analysis
6-
- suggestions (of questions)
63+
- optimize cost (use APIs for query embedding and reranking to use a smaller server instance)

rag_system/frontend/.env.example

Lines changed: 1 addition & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1 @@
1+
CHAT_SUFFICIENCY_API_URL=http://localhost:8000
