Commit 95812ae

Update README.md
1 parent 4471612 commit 95812ae

1 file changed

Lines changed: 191 additions & 75 deletions

README.md

@@ -4,118 +4,151 @@
44
[![Python 3.11+](https://img.shields.io/badge/python-3.11%2B-3776AB.svg)](https://www.python.org/)
55
[![License: PolyForm Noncommercial](https://img.shields.io/badge/license-PolyForm%20Noncommercial%201.0.0-5B4B8A.svg)](LICENSE)
66

7-
Pathology RAG Workbench is a local-first retrieval and question-answering workspace for pathology PDFs. It combines PDF ingestion, chunking, pgvector-backed retrieval, a FastAPI service, a browser UI, and optional OpenClaw integration so you can build a citation-backed pathology assistant around your own licensed corpus.
7+
Pathology RAG Workbench is a local-first pathology PDF workspace. It helps you turn your own licensed pathology books or notes into a citation-backed retrieval system with a browser UI, a local API, and a CLI.
88

9-
This repository is public, but it is not open source in the OSI sense. The code is released under the PolyForm Noncommercial 1.0.0 license, so commercial use is not allowed.
9+
This repository is public, but it is not open source in the OSI sense. It is released under the [PolyForm Noncommercial 1.0.0](LICENSE) license, so commercial use is not allowed.
1010

11-
## Important Notices
11+
## Start Here
1212

13-
- Non-commercial use only. Read [LICENSE](LICENSE) before using, modifying, or redistributing the software.
14-
- No corpus is bundled. You must add your own lawfully obtained PDFs.
15-
- Do not commit copyrighted textbooks, extracted chunks, page previews, or PHI to Git.
16-
- This project is a research and workflow tool, not a medical device and not a substitute for expert pathology review.
13+
If you only read one section, read this one.
1714

18-
## What It Does
15+
Clone the repo and run:
1916

20-
- Builds a local manifest and chunk index from pathology PDFs.
21-
- Stores embeddings in PostgreSQL with `pgvector`.
22-
- Exposes `health`, `library`, `search`, `ask`, and browser UI routes through FastAPI.
23-
- Applies evidence guardrails so `/ask` refuses to answer when textbook grounding is missing or weak.
24-
- Supports filtering answers to selected documents.
25-
- Includes OpenClaw-facing docs and a local CLI client.
17+
```bash
18+
git clone https://github.com/hutaobo/pathology-rag-workbench.git
19+
cd pathology-rag-workbench
20+
bash scripts/deploy_portable.sh --allow-no-openai
21+
```
2622

27-
## Stack
23+
Then open:
2824

29-
- Python 3.11+
30-
- FastAPI
31-
- PostgreSQL + `pgvector`
32-
- Docker Compose
33-
- PyMuPDF / pdfplumber / pypdf
34-
- OpenAI API for embeddings and model-backed answers when enabled
25+
```text
26+
http://127.0.0.1:8000/ui
27+
```
28+
29+
What you should have after that:
30+
31+
- a working browser UI
32+
- a local FastAPI server
33+
- PostgreSQL with `pgvector`
34+
- a safe empty-library state if you have not added PDFs yet
35+
- local CLI commands such as `pathology-client health`
36+
37+
This path is designed to work even when:
38+
39+
- you have no pathology PDFs yet
40+
- you do not have an OpenAI API key yet
41+
- you are cloning the repo onto a new machine for the first time
3542

36-
## Quick Start
43+
## Five-Minute Onboarding
3744

38-
The fastest setup path uses the provided Bash scripts. On Windows, use Git Bash or WSL.
45+
### 1. Prerequisites
3946

40-
1. Install prerequisites:
47+
You need:
48+
49+
- Python 3.11+
50+
- Docker
51+
- Docker Compose
52+
53+
Quick check:
4154

4255
```bash
4356
python3 --version
4457
docker --version
4558
docker compose version
4659
```
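The same quick check can be sketched in Python. This is a minimal illustration, not part of the repository: the function name `check_prerequisites` is hypothetical, and it only confirms the interpreter version and that a `docker` executable is on `PATH`. It does not verify that the Docker daemon is running or that the Compose plugin is installed.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 11)):
    """Return a dict mapping each prerequisite name to True or False."""
    return {
        # Compare only the (major, minor) part of the running interpreter.
        "python": sys.version_info[:2] >= min_python,
        # shutil.which returns None when the executable is not on PATH.
        "docker": shutil.which("docker") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```

If either entry reports `missing`, install that prerequisite before running the deploy script.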
4760

48-
2. Copy environment defaults:
61+
On macOS, if Docker is installed but the daemon is not running, the deploy script will try to use Colima when available.
62+
63+
### 2. Run the one-command setup
4964

5065
```bash
51-
cp .env.example .env
66+
bash scripts/deploy_portable.sh --allow-no-openai
5267
```
5368

54-
3. Add your optional OpenAI key to `.env` if you want embeddings and model-backed `/ask`.
69+
What this script does:
5570

56-
4. Put your own PDFs under `pathologybook/` or plan to upload them through the browser UI later.
71+
- creates `.env` from `.env.example` if needed
72+
- bootstraps `.venv`
73+
- starts PostgreSQL and the API container
74+
- builds a local empty or populated library index
75+
- installs the local `pathology-client` command
76+
- installs optional OpenClaw integration
77+
- opens the browser UI unless you pass `--skip-browser`
5778

58-
5. Run the portable bootstrap:
79+
### 3. Verify that it worked
5980

60-
```bash
61-
bash scripts/deploy_portable.sh --allow-no-openai
62-
```
63-
64-
6. Open the browser UI:
81+
Browser:
6582

6683
```text
6784
http://127.0.0.1:8000/ui
6885
```
6986

70-
Without an OpenAI key, the project still supports library management and retrieval previews, but model-backed answer synthesis is skipped.
71-
72-
## Manual Setup
73-
74-
If you prefer to set the project up step by step:
75-
76-
1. Bootstrap the virtualenv:
87+
CLI:
7788

7889
```bash
79-
bash scripts/bootstrap.sh
90+
pathology-client health
8091
```
8192

82-
2. Start PostgreSQL and the API:
93+
Expected result:
8394

84-
```bash
85-
docker compose up -d postgres api
95+
```json
96+
{
97+
"ok": "true",
98+
"status": "live"
99+
}
86100
```
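If you want to check the health payload from a script rather than by eye, a small sketch like the following may help. The helper name `is_healthy` is hypothetical, and the code assumes the payload has the shape shown above. It accepts `ok` as either the string `"true"` or a JSON boolean, since the exact type returned by the service is an assumption here.

```python
import json


def is_healthy(payload):
    """Return True when a health payload reports an OK, live service."""
    ok = payload.get("ok")
    # Accept both a JSON boolean and the string form shown in the README.
    ok_flag = ok is True or (isinstance(ok, str) and ok.lower() == "true")
    return ok_flag and payload.get("status") == "live"


if __name__ == "__main__":
    sample = json.loads('{"ok": "true", "status": "live"}')
    print(is_healthy(sample))
```

To use it against a running deployment, pass in the parsed JSON printed by `pathology-client health`.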
87101

88-
3. Build or refresh the local library index:
102+
## What You Can Do Immediately
103+
104+
Even without an OpenAI key, you can:
105+
106+
- start the full local UI
107+
- upload PDFs
108+
- sync the local library
109+
- inspect library state
110+
- run retrieval previews instead of model-synthesized answers
111+
112+
Useful commands:
89113

90114
```bash
91-
.venv/bin/python -m apps.ingest_worker.cli sync-library
115+
pathology-client health
116+
pathology-client ui
117+
pathology-client library
118+
pathology-client search --query "ductal carcinoma in situ"
119+
pathology-client sync-library
92120
```
93121

94-
On Windows PowerShell, the equivalent is:
122+
## Add Your Own PDFs
95123

96-
```powershell
97-
.venv\Scripts\python.exe -m apps.ingest_worker.cli sync-library
98-
```
124+
No corpus is bundled with this repository.
125+
126+
You must add your own lawfully obtained PDFs by one of these methods:
99127

100-
4. Check health:
128+
### Option A: Copy PDFs into `pathologybook/`
101129

102130
```bash
103-
.venv/bin/python -m apps.pathology_client.cli health
131+
cp /path/to/your/*.pdf pathologybook/
132+
pathology-client sync-library
104133
```
105134

106-
## Adding Your Own PDFs
135+
### Option B: Upload from the browser UI
107136

108-
The repository defaults to a local untracked library config at `config/library.local.yaml`. You can either:
137+
Open:
109138

110-
- Copy PDFs into `pathologybook/` and run `sync-library`.
111-
- Upload PDFs from the browser UI.
112-
- Create a local config file manually from the example:
139+
```text
140+
http://127.0.0.1:8000/ui
141+
```
142+
143+
Then use the library panel to upload PDFs and trigger a sync.
144+
145+
### Option C: Use a local config file
113146

114147
```bash
115148
cp config/library.example.yaml config/library.local.yaml
116149
```
117150

118-
Example local config:
151+
Example:
119152

120153
```yaml
121154
library_name: my-pathology-library
@@ -127,30 +160,112 @@ documents:
127160
path: GU_Review.pdf
128161
```
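Before syncing, it can be useful to confirm that every `path` listed under `documents` points at a real file. The sketch below is an illustration only: `missing_documents` is a hypothetical helper, it expects the config as an already-parsed dictionary (for example from a YAML loader), and it assumes that paths are resolved relative to the library folder, which the README does not state explicitly.

```python
from pathlib import Path


def missing_documents(config, library_root):
    """List configured document paths that do not exist under library_root."""
    root = Path(library_root)
    missing = []
    for doc in config.get("documents", []):
        path = doc.get("path")
        # Entries without a path are skipped rather than reported as missing.
        if path and not (root / path).is_file():
            missing.append(path)
    return missing


if __name__ == "__main__":
    config = {
        "library_name": "my-pathology-library",
        "documents": [{"path": "GU_Review.pdf"}],
    }
    print(missing_documents(config, "pathologybook"))
```

An empty list means every configured file was found under the assumed folder.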
129162
130-
## Common Commands
163+
## Enable Model-Backed Answers
164+
165+
OpenAI-backed embeddings and `/ask` are optional.
166+
167+
If you do nothing, the project still works in retrieval-preview mode.
168+
169+
If you want model-backed answers:
170+
171+
1. Set `OPENAI_API_KEY`
172+
2. Restart the deployment or rerun the sync
173+
174+
Simplest path:
175+
176+
```bash
177+
export OPENAI_API_KEY=your_key_here
178+
bash scripts/deploy_portable.sh
179+
```
180+
181+
If you already deployed without a key:
182+
183+
```bash
184+
export OPENAI_API_KEY=your_key_here
185+
bash scripts/deploy_portable.sh --skip-browser
186+
```
187+
188+
After that you can ask:
131189

132190
```bash
133-
pathology-client health
134-
pathology-client library
135-
pathology-client search --query "ductal carcinoma in situ"
136191
pathology-client ask --question "What are the key features of mucinous carcinoma of the prostate?"
137-
pathology-client sync-library
138192
```
139193

140-
## Docker Services
194+
## Common Commands
141195

142-
The active runtime stack is intentionally small:
196+
```bash
197+
make doctor
198+
make test
199+
make ui
200+
make portable-up
201+
make portable-bundle
202+
```
143203

144-
- `postgres`: PostgreSQL with `pgvector`
145-
- `api`: FastAPI application
146-
- `ingest`: one-shot ingestion container when explicitly invoked
204+
Direct script entrypoints:
147205

148-
Start the default stack:
206+
```bash
207+
bash scripts/doctor.sh
208+
bash scripts/bootstrap.sh
209+
bash scripts/deploy_portable.sh --allow-no-openai
210+
bash scripts/package_portable.sh
211+
```
212+
213+
## Troubleshooting
214+
215+
### Check your environment
216+
217+
```bash
218+
bash scripts/doctor.sh
219+
```
220+
221+
### The browser UI does not open
222+
223+
Open it manually:
224+
225+
```text
226+
http://127.0.0.1:8000/ui
227+
```
228+
229+
### `pathology-client` is not found
230+
231+
Reinstall the local wrappers:
149232

150233
```bash
151-
docker compose up --build postgres api
234+
bash scripts/install_openclaw_integration.sh
152235
```
153236

237+
### No answers are generated
238+
239+
That is expected when:
240+
241+
- your library is empty
242+
- your retrieval hits are too weak
243+
- `OPENAI_API_KEY` is not configured
244+
245+
In that case:
246+
247+
- upload or copy more PDFs
248+
- run `pathology-client sync-library`
249+
- verify `OPENAI_API_KEY` if you want synthesized answers
250+
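The three causes above can be read as a simple checklist. The following sketch shows that reasoning and is not the project's actual guardrail logic: `diagnose_no_answer`, its parameters, and the `min_score` threshold of 0.5 are all hypothetical values chosen for illustration.

```python
def diagnose_no_answer(document_count, best_score, has_api_key, min_score=0.5):
    """Return the likely reasons no answer was generated (illustrative checks)."""
    reasons = []
    if document_count == 0:
        reasons.append("library is empty: add PDFs and run sync-library")
    elif best_score is None or best_score < min_score:
        # min_score is an assumed cutoff, not a value taken from the project.
        reasons.append("retrieval hits are too weak: add more relevant PDFs")
    if not has_api_key:
        reasons.append("OPENAI_API_KEY is not configured")
    return reasons


if __name__ == "__main__":
    for reason in diagnose_no_answer(0, None, False):
        print(reason)
```

An empty result means none of the three listed conditions applies, so the cause lies elsewhere and `bash scripts/doctor.sh` is the next thing to try.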
251+
## What This Repository Contains
252+
253+
- local PDF ingestion and chunking
254+
- `pgvector`-backed retrieval
255+
- a browser UI
256+
- a local CLI client
257+
- upload and sync flows for pathology PDFs
258+
- evidence guardrails for ungrounded medical answers
259+
- optional OpenClaw integration
260+
261+
It does not contain:
262+
263+
- pathology textbooks
264+
- patient data
265+
- PHI
266+
- a clinical-grade decision engine
267+
- commercial-use rights
268+
154269
## Project Layout
155270

156271
```text
@@ -172,13 +287,14 @@ tests/ Unit and API tests
172287
## Safety and Intended Use
173288

174289
- Review all outputs before using them in education, reporting, or research.
175-
- Treat citations as aids, not as a final authority.
176-
- Keep patient data and licensed materials out of version control and public issue trackers.
290+
- Treat citations as aids, not as final authority.
291+
- Do not use this as a substitute for expert pathology review.
292+
- Do not commit copyrighted textbooks, extracted page text, page previews, embeddings, PHI, or secrets.
177293

178294
## Contributing
179295

180-
See [CONTRIBUTING.md](CONTRIBUTING.md) for contribution guidelines and repo hygiene rules.
296+
See [CONTRIBUTING.md](CONTRIBUTING.md).
181297

182298
## Security
183299

184-
See [SECURITY.md](SECURITY.md) for reporting guidance and data-handling expectations.
300+
See [SECURITY.md](SECURITY.md).
