44[![Python 3.11+](https://img.shields.io/badge/python-3.11%2B-3776AB.svg)](https://www.python.org/)
55[![License: PolyForm Noncommercial](https://img.shields.io/badge/license-PolyForm%20Noncommercial%201.0.0-5B4B8A.svg)](LICENSE)
66
7- Pathology RAG Workbench is a local-first retrieval and question-answering workspace for pathology PDFs. It combines PDF ingestion, chunking, pgvector-backed retrieval, a FastAPI service, a browser UI, and optional OpenClaw integration so you can build a citation-backed pathology assistant around your own licensed corpus.
7+ Pathology RAG Workbench is a local-first pathology PDF workspace. It helps you turn your own licensed pathology books or notes into a citation-backed retrieval system with a browser UI, a local API, and a CLI.
88
9- This repository is public, but it is not open source in the OSI sense. The code is released under the PolyForm Noncommercial 1.0.0 license, so commercial use is not allowed.
9+ This repository is public, but it is not open source in the OSI sense. It is released under the [PolyForm Noncommercial 1.0.0](LICENSE) license, so commercial use is not allowed.
1010
11- ## Important Notices
11+ ## Start Here
1212
13- - Non-commercial use only. Read [LICENSE](LICENSE) before using, modifying, or redistributing the software.
14- - No corpus is bundled. You must add your own lawfully obtained PDFs.
15- - Do not commit copyrighted textbooks, extracted chunks, page previews, or PHI to Git.
16- - This project is a research and workflow tool, not a medical device and not a substitute for expert pathology review.
13+ If you only read one section, read this one.
1714
18- ## What It Does
15+ Clone the repo and run:
1916
20- - Builds a local manifest and chunk index from pathology PDFs.
21- - Stores embeddings in PostgreSQL with `pgvector`.
22- - Exposes `health`, `library`, `search`, `ask`, and browser UI routes through FastAPI.
23- - Applies evidence guardrails so `/ask` refuses to answer when textbook grounding is missing or weak.
24- - Supports filtering answers to selected documents.
25- - Includes OpenClaw-facing docs and a local CLI client.
17+ ``` bash
18+ git clone https://github.com/hutaobo/pathology-rag-workbench.git
19+ cd pathology-rag-workbench
20+ bash scripts/deploy_portable.sh --allow-no-openai
21+ ```
2622
27- ## Stack
23+ Then open:
2824
29- - Python 3.11+
30- - FastAPI
31- - PostgreSQL + `pgvector`
32- - Docker Compose
33- - PyMuPDF / pdfplumber / pypdf
34- - OpenAI API for embeddings and model-backed answers when enabled
25+ ``` text
26+ http://127.0.0.1:8000/ui
27+ ```
28+
29+ What you should have after that:
30+
31+ - a working browser UI
32+ - a local FastAPI server
33+ - PostgreSQL with `pgvector`
34+ - a safe empty-library state if you have not added PDFs yet
35+ - local CLI commands such as `pathology-client health`
36+
37+ This path is designed to work even when:
38+
39+ - you have no pathology PDFs yet
40+ - you do not have an OpenAI API key yet
41+ - you are cloning the repo onto a new machine for the first time
3542
36- ## Quick Start
43+ ## Five-Minute Onboarding
3744
38- The fastest setup path uses the provided Bash scripts. On Windows, use Git Bash or WSL.
45+ ### 1. Prerequisites
3946
40- 1. Install prerequisites:
47+ You need:
48+
49+ - Python 3.11+
50+ - Docker
51+ - Docker Compose
52+
53+ Quick check:
4154
4255``` bash
4356python3 --version
4457docker --version
4558docker compose version
4659```
4760
48- 2. Copy environment defaults:
61+ On macOS, if Docker is installed but the daemon is not running, the deploy script will try to use Colima when available.
62+
63+ ### 2. Run the one-command setup
4964
5065``` bash
51- cp .env.example .env
66+ bash scripts/deploy_portable.sh --allow-no-openai
5267```
5368
54- 3. Add your optional OpenAI key to `.env` if you want embeddings and model-backed `/ask`.
69+ What this script does:
5570
56- 4. Put your own PDFs under `pathologybook/` or plan to upload them through the browser UI later.
71+ - creates `.env` from `.env.example` if needed
72+ - bootstraps `.venv`
73+ - starts PostgreSQL and the API container
74+ - builds a local empty or populated library index
75+ - installs the local `pathology-client` command
76+ - installs optional OpenClaw integration
77+ - opens the browser UI unless you pass `--skip-browser`
5778
58- 5. Run the portable bootstrap:
79+ ### 3. Verify that it worked
5980
60- ``` bash
61- bash scripts/deploy_portable.sh --allow-no-openai
62- ```
63-
64- 6. Open the browser UI:
81+ Browser:
6582
6683``` text
6784http://127.0.0.1:8000/ui
6885```
6986
70- Without an OpenAI key, the project still supports library management and retrieval previews, but model-backed answer synthesis is skipped.
71-
72- ## Manual Setup
73-
74- If you prefer to set the project up step by step:
75-
76- 1. Bootstrap the virtualenv:
87+ CLI:
7788
7889``` bash
79- bash scripts/bootstrap.sh
90+ pathology-client health
8091```
8192
82- 2. Start PostgreSQL and the API:
93+ Expected result:
8394
84- ``` bash
85- docker compose up -d postgres api
95+ ``` json
96+ {
97+ "ok": "true",
98+ "status": "live"
99+ }
86100```
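The same check can be scripted. The following is a minimal Python sketch, not part of the repository: it assumes the API serves its health check at `/health` on the same host and port as the UI (the route path is an assumption, as are the names `HEALTH_URL`, `is_healthy`, and `fetch_health`), and it compares the response against the expected result shown above.

```python
import json
from urllib.request import urlopen

# Assumption: the health route lives at /health next to /ui.
HEALTH_URL = "http://127.0.0.1:8000/health"


def is_healthy(payload: dict) -> bool:
    """Return True when the payload matches the expected health result."""
    # Accept both a JSON boolean and the string form for "ok".
    ok_value = str(payload.get("ok")).lower()
    return ok_value == "true" and payload.get("status") == "live"


def fetch_health(url: str = HEALTH_URL, timeout: float = 5.0) -> dict:
    """Fetch the health payload from the local API and decode it as JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the stack running, `is_healthy(fetch_health())` should return `True` if the route assumption holds; otherwise `pathology-client health` remains the documented check.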
87101
88- 3. Build or refresh the local library index:
102+ ## What You Can Do Immediately
103+
104+ Even without an OpenAI key, you can:
105+
106+ - start the full local UI
107+ - upload PDFs
108+ - sync the local library
109+ - inspect library state
110+ - run retrieval previews instead of model-synthesized answers
111+
112+ Useful commands:
89113
90114``` bash
91- .venv/bin/python -m apps.ingest_worker.cli sync-library
115+ pathology-client health
116+ pathology-client ui
117+ pathology-client library
118+ pathology-client search --query "ductal carcinoma in situ"
119+ pathology-client sync-library
92120```
93121
94- On Windows PowerShell, the equivalent is:
122+ ## Add Your Own PDFs
95123
96- ``` powershell
97- .venv\Scripts\python.exe -m apps.ingest_worker.cli sync-library
98- ```
124+ No corpus is bundled with this repository.
125+
126+ You must add your own lawfully obtained PDFs by one of these methods:
99127
100- 4. Check health:
128+ ### Option A: Copy PDFs into `pathologybook/`
101129
102130``` bash
103- .venv/bin/python -m apps.pathology_client.cli health
131+ cp /path/to/your/*.pdf pathologybook/
132+ pathology-client sync-library
104133```
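Before syncing, it can help to confirm which files were actually copied. This is a small Python sketch under the assumption that PDFs sit directly inside `pathologybook/`; the helper name `list_pdfs` is hypothetical and not part of the project's CLI.

```python
from pathlib import Path


def list_pdfs(folder: str = "pathologybook") -> list[str]:
    """Return the sorted names of PDF files found directly inside the folder."""
    root = Path(folder)
    if not root.is_dir():
        # A missing folder is treated the same as an empty library.
        return []
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_file() and entry.suffix.lower() == ".pdf"
    )
```

An empty list means there is nothing for `pathology-client sync-library` to pick up from that folder yet.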
105134
106- ## Adding Your Own PDFs
135+ ### Option B: Upload from the browser UI
107136
108- The repository defaults to a local untracked library config at `config/library.local.yaml`. You can either:
137+ Open:
109138
110- - Copy PDFs into `pathologybook/` and run `sync-library`.
111- - Upload PDFs from the browser UI.
112- - Create a local config file manually from the example:
139+ ``` text
140+ http://127.0.0.1:8000/ui
141+ ```
142+
143+ Then use the library panel to upload PDFs and trigger a sync.
144+
145+ ### Option C: Use a local config file
113146
114147``` bash
115148cp config/library.example.yaml config/library.local.yaml
116149```
117150
118- Example local config:
151+ Example:
119152
120153``` yaml
121154library_name: my-pathology-library
@@ -127,30 +160,112 @@ documents:
127160 path: GU_Review.pdf
128161```
129162
130- ## Common Commands
163+ ## Enable Model-Backed Answers
164+
165+ OpenAI-backed embeddings and `/ask` are optional.
166+
167+ If you do nothing, the project still works in retrieval-preview mode.
168+
169+ If you want model-backed answers:
170+
171+ 1. Set `OPENAI_API_KEY`
172+ 2. restart the deployment or rerun the sync
173+
174+ Simplest path:
175+
176+ ``` bash
177+ export OPENAI_API_KEY=your_key_here
178+ bash scripts/deploy_portable.sh
179+ ```
180+
181+ If you already deployed without a key:
182+
183+ ``` bash
184+ export OPENAI_API_KEY=your_key_here
185+ bash scripts/deploy_portable.sh --skip-browser
186+ ```
187+
188+ After that you can ask:
131189
132190``` bash
133- pathology-client health
134- pathology-client library
135- pathology-client search --query "ductal carcinoma in situ"
136191pathology-client ask --question "What are the key features of mucinous carcinoma of the prostate?"
137- pathology-client sync-library
138192```
139193
140- ## Docker Services
194+ ## Common Commands
141195
142- The active runtime stack is intentionally small:
196+ ``` bash
197+ make doctor
198+ make test
199+ make ui
200+ make portable-up
201+ make portable-bundle
202+ ```
143203
144- - `postgres`: PostgreSQL with `pgvector`
145- - `api`: FastAPI application
146- - `ingest`: one-shot ingestion container when explicitly invoked
204+ Direct script entrypoints:
147205
148- Start the default stack:
206+ ``` bash
207+ bash scripts/doctor.sh
208+ bash scripts/bootstrap.sh
209+ bash scripts/deploy_portable.sh --allow-no-openai
210+ bash scripts/package_portable.sh
211+ ```
212+
213+ ## Troubleshooting
214+
215+ ### Check your environment
216+
217+ ``` bash
218+ bash scripts/doctor.sh
219+ ```
220+
221+ ### The browser UI does not open
222+
223+ Open it manually:
224+
225+ ``` text
226+ http://127.0.0.1:8000/ui
227+ ```
228+
229+ ### `pathology-client` is not found
230+
231+ Reinstall the local wrappers:
149232
150233``` bash
151- docker compose up --build postgres api
234+ bash scripts/install_openclaw_integration.sh
152235```
153236
237+ ### No answers are generated
238+
239+ That is expected when:
240+
241+ - your library is empty
242+ - your retrieval hits are too weak
243+ - `OPENAI_API_KEY` is not configured
244+
245+ In that case:
246+
247+ - upload or copy more PDFs
248+ - run `pathology-client sync-library`
249+ - verify `OPENAI_API_KEY` if you want synthesized answers
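The three conditions above can be read as a small decision rule. The Python sketch below is illustrative only and is not the project's implementation: the names `AnswerMode` and `choose_answer_mode` and the `MIN_SCORE` threshold are hypothetical, and the sketch assumes that an empty library or weak hits lead to a refusal, while a missing key falls back to a retrieval preview.

```python
from enum import Enum
from typing import Optional, Sequence


class AnswerMode(Enum):
    REFUSE = "refuse"
    RETRIEVAL_PREVIEW = "retrieval_preview"
    SYNTHESIZED = "synthesized"


# Hypothetical cutoff; the project's real threshold is not documented here.
MIN_SCORE = 0.35


def choose_answer_mode(
    document_count: int,
    hit_scores: Sequence[float],
    api_key: Optional[str] = None,
    min_score: float = MIN_SCORE,
) -> AnswerMode:
    """Pick an answer mode from library size, retrieval strength, and key presence."""
    if document_count == 0:
        # Empty library: nothing to ground an answer on.
        return AnswerMode.REFUSE
    if not any(score >= min_score for score in hit_scores):
        # Every hit is below the cutoff: grounding is too weak.
        return AnswerMode.REFUSE
    if not api_key:
        # Evidence exists but no model is configured: show the evidence only.
        return AnswerMode.RETRIEVAL_PREVIEW
    return AnswerMode.SYNTHESIZED
```

Checking the conditions in this order mirrors the fixes listed above: add PDFs first, then sync and retrieve, and only then worry about the key.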
250+
251+ ## What This Repository Contains
252+
253+ - local PDF ingestion and chunking
254+ - `pgvector`-backed retrieval
255+ - a browser UI
256+ - a local CLI client
257+ - upload and sync flows for pathology PDFs
258+ - evidence guardrails for ungrounded medical answers
259+ - optional OpenClaw integration
260+
261+ It does not contain:
262+
263+ - pathology textbooks
264+ - patient data
265+ - PHI
266+ - a clinical-grade decision engine
267+ - commercial-use rights
268+
154269## Project Layout
155270
156271``` text
@@ -172,13 +287,14 @@ tests/ Unit and API tests
172287## Safety and Intended Use
173288
174289- Review all outputs before using them in education, reporting, or research.
175- - Treat citations as aids, not as a final authority.
176- - Keep patient data and licensed materials out of version control and public issue trackers.
290+ - Treat citations as aids, not as final authority.
291+ - Do not use this as a substitute for expert pathology review.
292+ - Do not commit copyrighted textbooks, extracted page text, page previews, embeddings, PHI, or secrets.
177293
178294## Contributing
179295
180- See [CONTRIBUTING.md](CONTRIBUTING.md) for contribution guidelines and repo hygiene rules.
296+ See [CONTRIBUTING.md](CONTRIBUTING.md).
181297
182298## Security
183299
184- See [SECURITY.md](SECURITY.md) for reporting guidance and data-handling expectations.
300+ See [SECURITY.md](SECURITY.md).