Commit 8369195

docs: add MkDocs site for ReadTheDocs
- .readthedocs.yaml: RTD build config (MkDocs + Python 3.12, installs dev extras)
- mkdocs.yml: Material theme, mkdocstrings for auto API docs, nav structure
- docs/index.md: landing page
- docs/getting-started.md: install + end-to-end walkthrough
- docs/guide/{ingest,preprocess,temporal,split}.md: user guides per module
- docs/api/{ingest,preprocess,temporal,split}.md: API reference via mkdocstrings
- docs/contributing.md: setup, test, lint, and PR checklist

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent ff3764d commit 8369195

13 files changed

Lines changed: 885 additions & 0 deletions

.readthedocs.yaml

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
version: 2

build:
  os: ubuntu-24.04
  tools:
    python: "3.12"

mkdocs:
  configuration: mkdocs.yml

python:
  install:
    - method: pip
      path: .
      extra_requirements:
        - dev

docs/api/ingest.md

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
# clinops.ingest

::: clinops.ingest.mimic_tables.MimicTableLoader

::: clinops.ingest.mimic.MimicLoader

::: clinops.ingest.mimic_iii.MimicIIILoader

::: clinops.ingest.fhir.FHIRLoader

::: clinops.ingest.flat.FlatFileLoader

::: clinops.ingest.schema.ClinicalSchema

::: clinops.ingest.schema.ColumnSpec

::: clinops.ingest.schema.SchemaValidationError

docs/api/preprocess.md

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
# clinops.preprocess

::: clinops.preprocess.outliers.ClinicalOutlierClipper

::: clinops.preprocess.units.UnitNormalizer

::: clinops.preprocess.icd.ICDMapper

docs/api/split.md

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
# clinops.split

::: clinops.split.splitters.TemporalSplitter

::: clinops.split.splitters.PatientSplitter

::: clinops.split.splitters.StratifiedPatientSplitter

::: clinops.split.splitters.SplitResult

docs/api/temporal.md

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
# clinops.temporal

::: clinops.temporal.windower.TemporalWindower

::: clinops.temporal.windower.WindowConfig

::: clinops.temporal.imputation.Imputer

::: clinops.temporal.imputation.ImputationStrategy

::: clinops.temporal.features.LagFeatureBuilder

::: clinops.temporal.cohort.CohortAligner

docs/contributing.md

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
# Contributing

Contributions are welcome: bug reports, documentation improvements, and new features all help.

## Setup

```bash
git clone https://github.com/chaitanyakasaraneni/clinops
cd clinops
pip install -e ".[dev]"
```

## Running tests

```bash
pytest tests/ -v
```

## Linting and formatting

```bash
ruff check clinops/
ruff format clinops/
mypy clinops/ --ignore-missing-imports
```

All three must pass before opening a pull request.

## Building docs locally

```bash
mkdocs serve
```

Then open [http://127.0.0.1:8000](http://127.0.0.1:8000).

## Pull request checklist

- [ ] Tests pass (`pytest tests/`)
- [ ] Ruff lint and format pass
- [ ] mypy passes
- [ ] Docstrings updated for any changed public API
- [ ] Entry added to the relevant guide page if adding a new feature

## Code of Conduct

See [CODE_OF_CONDUCT.md](https://github.com/chaitanyakasaraneni/clinops/blob/main/CODE_OF_CONDUCT.md).

docs/getting-started.md

Lines changed: 142 additions & 0 deletions
@@ -0,0 +1,142 @@
# Getting Started

## Installation

Requires Python **3.12+**.

```bash
pip install clinops
```

Install optional extras for FHIR or cloud support (the quotes keep shells such as zsh from treating the brackets as a glob pattern):

```bash
pip install "clinops[fhir]"   # FHIR R4 loader
pip install "clinops[gcp]"    # Google Cloud Storage / BigQuery
pip install "clinops[aws]"    # AWS S3 / boto3
```

For development (includes tests, linting, and docs):

```bash
git clone https://github.com/chaitanyakasaraneni/clinops
cd clinops
pip install -e ".[dev]"
```

## Verify the install

```python
import clinops
print(clinops.__version__)
```

---

## End-to-end example

This walkthrough mirrors a typical research workflow: load MIMIC-IV data, preprocess it, build temporal windows, and produce a patient-level train/test split without data leakage.

### 1. Load data

```python
from clinops.ingest import MimicTableLoader

tbl = MimicTableLoader("/data/mimic-iv-2.2")

# ICU chartevents: charttime parsed as datetime automatically
charts = tbl.chartevents(subject_ids=list(range(10000032, 10000132)))

# ICU stays: needed to align windows to admission time
stays = tbl.icustays(subject_ids=list(range(10000032, 10000132)))

# Admissions: contains hospital_expire_flag outcome
adm = tbl.admissions(subject_ids=list(range(10000032, 10000132)))
```

### 2. Preprocess

```python
from clinops.preprocess import ClinicalOutlierClipper, UnitNormalizer

# Clip values outside physiological bounds (heart_rate, spo2, sbp, ...)
charts = ClinicalOutlierClipper(action="clip").fit_transform(charts)

# Normalize any mixed-unit columns (e.g. glucose in mmol/L → mg/dL)
if "glucose_unit" in charts.columns:
    charts = UnitNormalizer(
        column_unit_map={"glucose": "glucose_unit"}
    ).transform(charts)
```
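
To make the two preprocessing ideas concrete, here is a minimal plain-Python sketch of clipping to bounds and converting glucose units. It is not how `ClinicalOutlierClipper` or `UnitNormalizer` are implemented: the `BOUNDS` table and the `preprocess` helper are hypothetical, and the bounds are illustrative values rather than the library's defaults. The conversion factor follows from glucose's molar mass (180.16 g/mol).

```python
GLUCOSE_MG_DL_PER_MMOL_L = 18.016  # 1 mmol/L of glucose is about 18.016 mg/dL

# Illustrative bounds only (assumption); not clinops' built-in table
BOUNDS = {"heart_rate": (0, 300), "spo2": (50, 100)}


def clip(value, low, high):
    """Pull a value back inside the closed interval [low, high]."""
    return max(low, min(high, value))


def preprocess(row):
    """Return a copy of row with bounded vitals and glucose in mg/dL."""
    row = dict(row)  # copy so the caller's dict is left untouched
    for col, (low, high) in BOUNDS.items():
        if row.get(col) is not None:
            row[col] = clip(row[col], low, high)
    if row.get("glucose") is not None and row.get("glucose_unit") == "mmol/L":
        row["glucose"] = round(row["glucose"] * GLUCOSE_MG_DL_PER_MMOL_L, 1)
        row["glucose_unit"] = "mg/dL"
    return row


raw = {"heart_rate": 999, "spo2": 97, "glucose": 5.5, "glucose_unit": "mmol/L"}
clean = preprocess(raw)
print(clean)
```

Clipping keeps the row and replaces the implausible value with the nearest bound, which preserves the timestamp for later windowing.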

### 3. Align to ICU admission

```python
from clinops.temporal import CohortAligner

# Keep only measurements from the first 48 hours after ICU admission
aligned = CohortAligner(
    anchor_col="intime",
    max_hours_before=0,
    max_hours_after=48,
).align(events_df=charts, anchor_df=stays)
```
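
The alignment step can be pictured as a per-patient time filter. The sketch below is an assumption-laden illustration, not `CohortAligner`'s actual code: `align_to_anchor` is a hypothetical helper, it works on lists of dicts rather than DataFrames, and it assumes a single anchor time per patient (real data can have several ICU stays per patient).

```python
from datetime import datetime, timedelta


def align_to_anchor(events, anchors, max_hours_before=0, max_hours_after=48):
    """Keep events inside [anchor - before, anchor + after] for each patient.

    events:  list of dicts with "subject_id" and "charttime" (datetime)
    anchors: dict mapping subject_id -> anchor datetime (e.g. ICU intime)
    """
    earliest = -timedelta(hours=max_hours_before)
    latest = timedelta(hours=max_hours_after)
    kept = []
    for event in events:
        anchor = anchors.get(event["subject_id"])
        if anchor is None:
            continue  # no anchor for this patient: drop the event
        offset = event["charttime"] - anchor
        if earliest <= offset <= latest:
            # Record hours since the anchor so later steps can window on it
            kept.append({**event, "hours_from_anchor": offset.total_seconds() / 3600})
    return kept


intime = datetime(2150, 1, 1, 8, 0)
events = [
    {"subject_id": 1, "charttime": intime - timedelta(hours=2), "heart_rate": 88},
    {"subject_id": 1, "charttime": intime + timedelta(hours=5), "heart_rate": 92},
    {"subject_id": 1, "charttime": intime + timedelta(hours=60), "heart_rate": 75},
]
aligned_events = align_to_anchor(events, {1: intime})
print([e["hours_from_anchor"] for e in aligned_events])
```

With `max_hours_before=0`, the measurement taken two hours before admission is dropped along with the one at hour 60.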

### 4. Build temporal windows

```python
from clinops.temporal import TemporalWindower, Imputer, ImputationStrategy

# 24-hour sliding windows, stepped every 6 hours
windower = TemporalWindower(window_hours=24, step_hours=6)
windows = windower.fit_transform(
    df=aligned,
    id_col="subject_id",
    time_col="charttime",
    feature_cols=["heart_rate", "spo2", "resp_rate", "map"],
)

# Gap-aware forward fill: does not propagate across patients or
# across gaps longer than 6 hours
imputer = Imputer(
    ImputationStrategy.FORWARD_FILL,
    max_gap_hours=6,
    time_col="charttime",
    id_col="subject_id",
)
windows = imputer.fit_transform(windows)
```
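
The two ideas in this step, sliding windows and gap-aware forward fill, can be sketched in plain Python. This is only an illustration under stated assumptions, not the `TemporalWindower` or `Imputer` implementation: `sliding_windows` and `gap_aware_ffill` are hypothetical helpers, partial windows at the end are assumed to be dropped, and the gap is measured from the last real observation rather than from a previously filled value.

```python
from datetime import datetime, timedelta


def sliding_windows(max_hours, window_hours=24, step_hours=6):
    """Return (start, end) hour offsets of every full window inside [0, max_hours]."""
    last_start = max_hours - window_hours
    return [(s, s + window_hours) for s in range(0, last_start + 1, step_hours)]


def gap_aware_ffill(rows, value_col, id_col="subject_id",
                    time_col="charttime", max_gap_hours=6):
    """Forward-fill None values per patient, but only across short gaps."""
    max_gap = timedelta(hours=max_gap_hours)
    last_seen = {}  # patient id -> (time, value) of the last real observation
    filled = []
    for row in sorted(rows, key=lambda r: (r[id_col], r[time_col])):
        row = dict(row)  # copy so the input rows are not mutated
        if row[value_col] is not None:
            last_seen[row[id_col]] = (row[time_col], row[value_col])
        elif row[id_col] in last_seen:
            seen_time, seen_value = last_seen[row[id_col]]
            if row[time_col] - seen_time <= max_gap:
                row[value_col] = seen_value
        filled.append(row)
    return filled


t0 = datetime(2150, 1, 1)
rows = [
    {"subject_id": 1, "charttime": t0, "heart_rate": 80},
    {"subject_id": 1, "charttime": t0 + timedelta(hours=4), "heart_rate": None},
    {"subject_id": 1, "charttime": t0 + timedelta(hours=12), "heart_rate": None},
    {"subject_id": 2, "charttime": t0 + timedelta(hours=1), "heart_rate": None},
]
filled = gap_aware_ffill(rows, "heart_rate")
print(sliding_windows(48))
print([r["heart_rate"] for r in filled])
```

In the example, the value at hour 4 is filled from hour 0, the value at hour 12 stays missing because the gap exceeds 6 hours, and patient 2 never inherits a value from patient 1.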

### 5. Add outcome and split

```python
from clinops.split import StratifiedPatientSplitter

# Attach outcome from admissions table
windows = windows.merge(
    adm[["subject_id", "hospital_expire_flag"]],
    on="subject_id",
    how="left",
)

# Stratified patient split: preserves outcome rate, no cross-patient leakage
result = StratifiedPatientSplitter(
    id_col="subject_id",
    outcome_col="hospital_expire_flag",
    test_size=0.2,
).split(windows)

print(result.summary())
train_df = result.train
test_df = result.test
```
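
For readers who want to see why a patient-level split avoids leakage, the following is a rough sketch of one way to do it with the standard library. It is not the `StratifiedPatientSplitter` implementation: `stratified_patient_split` is a hypothetical helper, and labelling a patient as positive when any of their rows is positive is an assumption made for the example.

```python
import random
from collections import defaultdict


def stratified_patient_split(rows, id_col, outcome_col, test_size=0.2, seed=0):
    """Split rows into train/test so that no patient appears in both sets.

    Patients are grouped by outcome and each group is sampled separately,
    which keeps the outcome rate similar in both sets.
    """
    # One label per patient: positive if any of the patient's rows is positive
    label = {}
    for row in rows:
        label[row[id_col]] = max(label.get(row[id_col], 0), row[outcome_col])

    by_label = defaultdict(list)
    for patient, y in sorted(label.items()):
        by_label[y].append(patient)

    rng = random.Random(seed)  # fixed seed makes the split reproducible
    test_ids = set()
    for y in sorted(by_label):
        patients = by_label[y]
        rng.shuffle(patients)
        n_test = round(len(patients) * test_size)
        test_ids.update(patients[:n_test])

    train = [r for r in rows if r[id_col] not in test_ids]
    test = [r for r in rows if r[id_col] in test_ids]
    return train, test


# 20 patients with 3 windows each; every fifth patient has a positive outcome
rows = [
    {"subject_id": p, "window": w, "hospital_expire_flag": int(p % 5 == 0)}
    for p in range(20)
    for w in range(3)
]
train, test = stratified_patient_split(rows, "subject_id", "hospital_expire_flag")
train_ids = {r["subject_id"] for r in train}
test_ids = {r["subject_id"] for r in test}
print(len(train_ids), len(test_ids), train_ids & test_ids)
```

Splitting on patient IDs rather than on rows is what keeps windows from the same patient out of both sets at once.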

---

## Next steps

- [Ingest guide](guide/ingest.md): all loader options including FHIR and flat files
- [Preprocess guide](guide/preprocess.md): outlier bounds, unit conversions, ICD mapping
- [Temporal guide](guide/temporal.md): windowing strategies, imputation, lag features
- [Split guide](guide/split.md): temporal, patient, and stratified splits
- [API Reference](api/ingest.md): full class and method signatures

docs/guide/ingest.md

Lines changed: 137 additions & 0 deletions
@@ -0,0 +1,137 @@
# Ingest

`clinops.ingest` provides loaders for MIMIC-IV, MIMIC-III, FHIR R4, and flat CSV/Parquet files with schema validation built in.

---

## MimicTableLoader

The fastest way to work with MIMIC-IV: pre-built schemas for the five most-used tables, with no `ColumnSpec` definitions required.

```python
from clinops.ingest import MimicTableLoader

tbl = MimicTableLoader("/data/mimic-iv-2.2")
```

### Available tables

```python
# ICU chartevents: charttime parsed as datetime automatically
charts = tbl.chartevents(subject_ids=[10000032, 10000980])

# Lab results
labs = tbl.labevents(subject_ids=[10000032], with_ref_range=True)

# Hospital admissions: includes hospital_expire_flag mortality outcome
adm = tbl.admissions(subject_ids=[10000032])

# ICD-9/10 diagnoses: primary_only keeps only seq_num == 1
dx = tbl.diagnoses_icd(subject_ids=[10000032], primary_only=True)

# ICU stays: with_los_band adds a <1d / 1-3d / 3-7d / >7d length-of-stay column
stays = tbl.icustays(subject_ids=[10000032], with_los_band=True)
```
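
As an illustration of what a length-of-stay band column might contain, here is a small sketch. The `los_band` function is hypothetical, and the handling of the exact boundaries (1, 3, and 7 days) is an assumption; the source only names the four bands.

```python
def los_band(los_days):
    """Map ICU length of stay in days to one of four coarse bands."""
    if los_days < 1:
        return "<1d"
    if los_days < 3:
        return "1-3d"
    if los_days <= 7:
        return "3-7d"
    return ">7d"


bands = [los_band(d) for d in (0.4, 1.0, 2.9, 3.0, 7.0, 9.5)]
print(bands)
```

Banding turns a skewed continuous variable into a categorical one that is convenient for stratification and cohort tables.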

### Audit a new MIMIC download

Check row counts, column counts, and null rates without loading full tables:

```python
tbl.summary()
#          table  rows_sampled  columns  null_rate_pct
#    chartevents         10000       23           8.41
#      labevents         10000       12           4.17
#     admissions         10000       15           6.02
#  diagnoses_icd         10000        5           0.00
#       icustays         10000        8           2.31
```
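
One plausible way to produce such an audit cheaply is to read only the first rows of each file. The sketch below shows that idea with the standard `csv` module; `sample_summary` is a hypothetical helper, and treating empty strings as nulls is an assumption about how missing values appear in the CSV, not a statement about what `tbl.summary()` does internally.

```python
import csv
import io
from itertools import islice


def sample_summary(file_obj, sample_rows=10_000):
    """Summarize the first sample_rows rows of a CSV without reading the rest."""
    reader = csv.DictReader(file_obj)
    rows = list(islice(reader, sample_rows))
    columns = reader.fieldnames or []
    cells = len(rows) * len(columns)
    # Empty strings (and fields missing from short rows) count as nulls
    nulls = sum(1 for row in rows for col in columns if row[col] in ("", None))
    return {
        "rows_sampled": len(rows),
        "columns": len(columns),
        "null_rate_pct": round(100 * nulls / cells, 2) if cells else 0.0,
    }


demo = io.StringIO(
    "subject_id,itemid,valuenum\n"
    "1,220045,88\n"
    "1,220277,\n"
    "2,220045,\n"
    "2,220277,97\n"
)
summary = sample_summary(demo, sample_rows=3)
print(summary)
```

For a compressed table, the same function could be given a handle from `gzip.open(path, "rt", newline="")`.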

---

## MimicLoader

For custom filtering and chunk-based loading of large tables.

```python
from clinops.ingest import MimicLoader

loader = MimicLoader("/data/mimic-iv-2.2")

charts = loader.chartevents(
    subject_ids=[10000032, 10000980],
    start_time="2150-01-01",
    end_time="2150-01-10",
)
labs = loader.labevents(subject_ids=[10000032, 10000980])
stays = loader.icustays(subject_ids=[10000032, 10000980])
```

When `chunk_size` is set, large tables (`chartevents`, `labevents`) are loaded in chunks to limit memory use:

```python
loader = MimicLoader("/data/mimic-iv-2.2", chunk_size=100_000)
charts = loader.chartevents()  # streams in 100k-row chunks internally
```
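
Chunked loading generally means reading a fixed number of rows, filtering them, and discarding the rest before reading more. The following sketch shows that pattern with the standard library; `iter_chunks` and `load_filtered` are hypothetical helpers written for illustration and say nothing about how `MimicLoader` is implemented.

```python
import csv
import io
from itertools import islice


def iter_chunks(file_obj, chunk_size):
    """Yield lists of at most chunk_size row dicts from a CSV file object."""
    reader = csv.DictReader(file_obj)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


def load_filtered(file_obj, subject_ids, chunk_size=100_000):
    """Filter each chunk as it is read, so only matching rows stay in memory."""
    wanted = {str(s) for s in subject_ids}  # CSV fields are read as strings
    kept = []
    for chunk in iter_chunks(file_obj, chunk_size):
        kept.extend(row for row in chunk if row["subject_id"] in wanted)
    return kept


demo = io.StringIO("subject_id,valuenum\n1,88\n2,91\n1,90\n3,70\n1,85\n")
matching = load_filtered(demo, subject_ids=[1], chunk_size=2)
print([r["valuenum"] for r in matching])
```

Peak memory is then bounded by one chunk plus the rows that pass the filter, rather than by the size of the whole table.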

---

## MimicIIILoader

Equivalent loader for MIMIC-III (ICD-9 codes, slightly different schema).

```python
from clinops.ingest import MimicIIILoader

loader = MimicIIILoader("/data/mimic-iii-1.4")
charts = loader.chartevents(subject_ids=[10006])
```

---

## FHIRLoader

Load FHIR R4 resources from a JSON Bundle or NDJSON export.

```python
from clinops.ingest import FHIRLoader

loader = FHIRLoader("/data/fhir_export")
obs = loader.observations(category="vital-signs")
patients = loader.patients()
```

!!! note
    Requires the `fhir` extra: `pip install "clinops[fhir]"`

---

## FlatFileLoader

Load and validate any flat CSV or Parquet file with a custom schema.

```python
from clinops.ingest import FlatFileLoader, ClinicalSchema, ColumnSpec

schema = ClinicalSchema(
    name="vitals",
    columns=[
        ColumnSpec("subject_id", nullable=False),
        ColumnSpec("heart_rate", min_value=0, max_value=300),
        ColumnSpec("spo2", min_value=50, max_value=100),
    ]
)
df = FlatFileLoader("vitals.csv", schema=schema).load()
```

`SchemaValidationError` is raised if any `nullable=False` column contains nulls, or if values fall outside the declared bounds.
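
The two validation rules described above (non-nullable columns and value bounds) can be sketched as follows. `Column`, `ValidationError`, and `validate` are hypothetical stand-ins for illustration; they are not the `ColumnSpec`, `SchemaValidationError`, or loader code from clinops, and skipping bounds checks for missing values is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


class ValidationError(Exception):
    """Hypothetical stand-in for clinops' SchemaValidationError."""


@dataclass
class Column:
    """Hypothetical stand-in for ColumnSpec with the same kinds of rule."""
    name: str
    nullable: bool = True
    min_value: Optional[float] = None
    max_value: Optional[float] = None


def validate(rows, columns):
    """Raise ValidationError listing every rule violation found in rows."""
    problems = []
    for i, row in enumerate(rows):
        for col in columns:
            value = row.get(col.name)
            if value is None:
                if not col.nullable:
                    problems.append(f"row {i}: {col.name} is null")
                continue  # bounds do not apply to missing values
            if col.min_value is not None and value < col.min_value:
                problems.append(f"row {i}: {col.name}={value} below {col.min_value}")
            if col.max_value is not None and value > col.max_value:
                problems.append(f"row {i}: {col.name}={value} above {col.max_value}")
    if problems:
        raise ValidationError("; ".join(problems))


columns = [
    Column("subject_id", nullable=False),
    Column("heart_rate", min_value=0, max_value=300),
    Column("spo2", min_value=50, max_value=100),
]
vitals = [
    {"subject_id": 1, "heart_rate": 82, "spo2": 97},
    {"subject_id": None, "heart_rate": 410, "spo2": None},
]
try:
    validate(vitals, columns)
    message = ""
except ValidationError as exc:
    message = str(exc)
print(message)
```

Collecting all violations before raising makes a single failed load report everything that needs fixing in the file.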

---

## Supported file formats

| Format | Extension |
|---|---|
| CSV | `.csv` |
| Compressed CSV | `.csv.gz` |
| Parquet | `.parquet` |