Commit d4c3baf

Prepare public noncommercial release

0 parents

70 files changed

Lines changed: 6082 additions & 0 deletions

.dockerignore

Lines changed: 11 additions & 0 deletions
```diff
@@ -0,0 +1,11 @@
+.git
+.gitignore
+.env
+.venv
+.venv-pdf
+__pycache__
+pathologybook
+data/processed
+data/training
+data/evals
+tmp
```

.env.example

Lines changed: 16 additions & 0 deletions
```diff
@@ -0,0 +1,16 @@
+APP_ENV=development
+APP_HOST=0.0.0.0
+APP_PORT=8000
+OPENAI_API_KEY=
+OPENAI_MODEL=gpt-5.4
+OPENAI_EMBEDDING_MODEL=text-embedding-3-large
+PATHOLOGY_PDF_ROOT=pathologybook
+PATHOLOGY_DATA_DIR=data
+PATHOLOGY_LIBRARY_CONFIG=config/library.local.yaml
+PATHOLOGY_CHUNK_TOKENS=900
+PATHOLOGY_CHUNK_OVERLAP=120
+PATHOLOGY_TOP_K=6
+PATHOLOGY_EMBEDDING_DIMENSIONS=3072
+PATHOLOGY_EMBEDDING_BATCH_SIZE=16
+POSTGRES_URL=postgresql+psycopg://pathology:pathology@localhost:5432/pathology_ai
+OPENCLAW_PATHOLOGY_API=http://127.0.0.1:8000
```

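`PATHOLOGY_CHUNK_TOKENS` and `PATHOLOGY_CHUNK_OVERLAP` together describe a sliding window: with the example values, each chunk is 900 tokens long and starts 900 - 120 = 780 tokens after the previous one, so neighbouring chunks share 120 tokens. The sketch below illustrates that arithmetic under stated assumptions: whitespace-split words stand in for model tokens, the ingest worker is assumed to use a plain fixed-size window, and `chunk_tokens` is a hypothetical helper, not the project's API.

```python
import os


def chunk_tokens(tokens, size, overlap):
    """Split `tokens` into windows of `size` items; consecutive windows share `overlap` items.

    Hypothetical helper for illustration only; the real chunker is not part of this listing.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap  # 900 - 120 = 780 with the .env.example values
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):  # last window already reaches the end
            break
    return chunks


# Read the same variable names as .env.example, falling back to its example values.
CHUNK_SIZE = int(os.environ.get("PATHOLOGY_CHUNK_TOKENS", "900"))
CHUNK_OVERLAP = int(os.environ.get("PATHOLOGY_CHUNK_OVERLAP", "120"))

if __name__ == "__main__":
    # Assumption: whitespace words as a stand-in for tokenizer output.
    words = [f"w{i}" for i in range(2000)]
    windows = chunk_tokens(words, CHUNK_SIZE, CHUNK_OVERLAP)
    print(len(windows), [len(w) for w in windows])
```

With the example values, a 2,000-token text yields three windows of 900, 900 and 440 tokens; an empty document yields no chunks rather than an error.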
.gitignore

Lines changed: 31 additions & 0 deletions
```diff
@@ -0,0 +1,31 @@
+.DS_Store
+.env
+.venv
+.venv-pdf
+__pycache__/
+.pytest_cache/
+.mypy_cache/
+.ruff_cache/
+*.pyc
+*.pyo
+*.pyd
+*.sqlite3
+*.log
+*.egg-info/
+
+config/library.local.yaml
+data/raw/*
+!data/raw/.gitkeep
+data/processed/*
+!data/processed/.gitkeep
+data/training/*
+!data/training/.gitkeep
+data/evals/*
+!data/evals/.gitkeep
+tmp/*
+!tmp/.gitkeep
+
+pathologybook/*
+!pathologybook/.gitkeep
+!pathologybook/README.md
+pathologybook/.cache/
```

CONTRIBUTING.md

Lines changed: 33 additions & 0 deletions
```diff
@@ -0,0 +1,33 @@
+# Contributing
+
+Thanks for contributing to Pathology RAG Workbench.
+
+## Before You Open a PR
+
+- Make sure your contribution is compatible with the PolyForm Noncommercial 1.0.0 license.
+- Do not add copyrighted textbooks, extracted page text, screenshots, or embeddings to the repository.
+- Do not add patient-identifiable information or clinical data.
+- Keep changes focused and documented.
+
+## Development Workflow
+
+1. Create a feature branch.
+2. Run the test suite before submitting:
+
+```bash
+pytest
+```
+
+3. If you change ingestion, retrieval, or API behavior, update the relevant docs.
+4. Prefer small PRs with a clear user-facing motivation.
+
+## Coding Guidelines
+
+- Target Python 3.12.
+- Keep defaults safe for an empty local library.
+- Preserve the evidence guardrails for ungrounded medical answers.
+- Avoid checking in generated artifacts from `data/`, `tmp/`, or `pathologybook/`.
+
+## Corpus and Licensing
+
+This repository ships without source PDFs. If you test with your own corpus, you are responsible for ensuring you have the right to use it.
```

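Two of the coding guidelines, keeping defaults safe for an empty local library and preserving evidence guardrails for ungrounded answers, fit together in one pattern: retrieve the top-k chunks and decline to answer when nothing relevant comes back. The sketch below is hypothetical and is not the project's implementation. `Chunk`, `retrieve`, `check_evidence`, the refusal wording and the 0.3 similarity floor are all assumptions; only `top_k=6` mirrors `PATHOLOGY_TOP_K` in `.env.example`.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    # Hypothetical record: one retrievable passage and its embedding.
    text: str
    embedding: list[float]


REFUSAL = "No supporting passages were found in the local library; declining to answer."


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query_vec, library, top_k=6, min_score=0.3):
    """Return up to `top_k` chunks scoring at least `min_score`, best first (assumed threshold)."""
    scored = [(cosine(query_vec, c.embedding), c) for c in library]
    scored = [(s, c) for s, c in scored if s >= min_score]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]


def check_evidence(query_vec, library, top_k=6, min_score=0.3):
    """Return (refusal_or_None, evidence).

    An empty library or only weak matches produce a refusal instead of an exception
    or an unsupported answer.
    """
    evidence = retrieve(query_vec, library, top_k, min_score)
    if not evidence:
        return REFUSAL, []
    return None, evidence
```

A caller would pass the evidence on to the answering model only when the first element is `None`, and otherwise return the refusal unchanged.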
LICENSE

Lines changed: 131 additions & 0 deletions
```diff
@@ -0,0 +1,131 @@
+# PolyForm Noncommercial License 1.0.0
+
+<https://polyformproject.org/licenses/noncommercial/1.0.0>
+
+## Acceptance
+
+In order to get any license under these terms, you must agree
+to them as both strict obligations and conditions to all
+your licenses.
+
+## Copyright License
+
+The licensor grants you a copyright license for the
+software to do everything you might do with the software
+that would otherwise infringe the licensor's copyright
+in it for any permitted purpose. However, you may
+only distribute the software according to [Distribution
+License](#distribution-license) and make changes or new works
+based on the software according to [Changes and New Works
+License](#changes-and-new-works-license).
+
+## Distribution License
+
+The licensor grants you an additional copyright license
+to distribute copies of the software. Your license
+to distribute covers distributing the software with
+changes and new works permitted by [Changes and New Works
+License](#changes-and-new-works-license).
+
+## Notices
+
+You must ensure that anyone who gets a copy of any part of
+the software from you also gets a copy of these terms or the
+URL for them above, as well as copies of any plain-text lines
+beginning with `Required Notice:` that the licensor provided
+with the software. For example:
+
+> Required Notice: Copyright Yoyodyne, Inc. (http://example.com)
+
+## Changes and New Works License
+
+The licensor grants you an additional copyright license to
+make changes and new works based on the software for any
+permitted purpose.
+
+## Patent License
+
+The licensor grants you a patent license for the software that
+covers patent claims the licensor can license, or becomes able
+to license, that you would infringe by using the software.
+
+## Noncommercial Purposes
+
+Any noncommercial purpose is a permitted purpose.
+
+## Personal Uses
+
+Personal use for research, experiment, and testing for
+the benefit of public knowledge, personal study, private
+entertainment, hobby projects, amateur pursuits, or religious
+observance, without any anticipated commercial application,
+is use for a permitted purpose.
+
+## Noncommercial Organizations
+
+Use by any charitable organization, educational institution,
+public research organization, public safety or health
+organization, environmental protection organization,
+or government institution is use for a permitted purpose
+regardless of the source of funding or obligations resulting
+from the funding.
+
+## Fair Use
+
+You may have "fair use" rights for the software under the
+law. These terms do not limit them.
+
+## No Other Rights
+
+These terms do not allow you to sublicense or transfer any of
+your licenses to anyone else, or prevent the licensor from
+granting licenses to anyone else. These terms do not imply
+any other licenses.
+
+## Patent Defense
+
+If you make any written claim that the software infringes or
+contributes to infringement of any patent, your patent license
+for the software granted under these terms ends immediately. If
+your company makes such a claim, your patent license ends
+immediately for work on behalf of your company.
+
+## Violations
+
+The first time you are notified in writing that you have
+violated any of these terms, or done anything with the software
+not covered by your licenses, your licenses can nonetheless
+continue if you come into full compliance with these terms,
+and take practical steps to correct past violations, within
+32 days of receiving notice. Otherwise, all your licenses
+end immediately.
+
+## No Liability
+
+***As far as the law allows, the software comes as is, without
+any warranty or condition, and the licensor will not be liable
+to you for any damages arising out of these terms or the use
+or nature of the software, under any kind of legal claim.***
+
+## Definitions
+
+The **licensor** is the individual or entity offering these
+terms, and the **software** is the software the licensor makes
+available under these terms.
+
+**You** refers to the individual or entity agreeing to these
+terms.
+
+**Your company** is any legal entity, sole proprietorship,
+or other kind of organization that you work for, plus all
+organizations that have control over, are under the control of,
+or are under common control with that organization. **Control**
+means ownership of substantially all the assets of an entity,
+or the power to direct its management and policies by vote,
+contract, or otherwise. Control can be direct or indirect.
+
+**Your licenses** are all the licenses granted to you for the
+software under these terms.
+
+**Use** means anything you do with the software requiring one
+of your licenses.
```

Makefile

Lines changed: 34 additions & 0 deletions
```diff
@@ -0,0 +1,34 @@
+PYTHON := .venv/bin/python
+PIP := .venv/bin/pip
+
+.PHONY: bootstrap doctor ingest api test docker-up docker-down portable-up portable-bundle ui
+
+bootstrap:
+	bash scripts/bootstrap.sh
+
+doctor:
+	bash scripts/doctor.sh
+
+ingest:
+	$(PYTHON) -m apps.ingest_worker.cli build-library
+
+api:
+	$(PYTHON) -m apps.pathology_api.main
+
+ui:
+	$(PYTHON) -m apps.pathology_client.cli ui
+
+test:
+	$(PYTHON) -m pytest
+
+docker-up:
+	docker compose up --build
+
+docker-down:
+	docker compose down
+
+portable-up:
+	bash scripts/deploy_portable.sh
+
+portable-bundle:
+	bash scripts/package_portable.sh
```
