chaitanyakasaraneni
diff --git a/.readthedocs.yaml b/.readthedocs.yaml
Lines changed: 16 additions & 0 deletions
diff --git a/docs/api/ingest.md b/docs/api/ingest.md
Lines changed: 17 additions & 0 deletions
diff --git a/docs/api/preprocess.md b/docs/api/preprocess.md
Lines changed: 7 additions & 0 deletions
diff --git a/docs/api/split.md b/docs/api/split.md
Lines changed: 9 additions & 0 deletions
diff --git a/docs/api/temporal.md b/docs/api/temporal.md
Lines changed: 13 additions & 0 deletions
diff --git a/docs/contributing.md b/docs/contributing.md
Lines changed: 47 additions & 0 deletions
diff --git a/docs/getting-started.md b/docs/getting-started.md
Lines changed: 142 additions & 0 deletions
diff --git a/docs/guide/ingest.md b/docs/guide/ingest.md
Lines changed: 137 additions & 0 deletions
@@ -0,0 +1,16 @@
+version: 2
+
+build:
+  os: ubuntu-24.04
+  tools:
+    python: "3.12"
+
+mkdocs:
+  configuration: mkdocs.yml
+
+python:
+  install:
+    - method: pip
+      path: .
+      extra_requirements:
+        - dev
@@ -0,0 +1,17 @@
+# clinops.ingest
+
+::: clinops.ingest.mimic_tables.MimicTableLoader
+
+::: clinops.ingest.mimic.MimicLoader
+
+::: clinops.ingest.mimic_iii.MimicIIILoader
+
+::: clinops.ingest.fhir.FHIRLoader
+
+::: clinops.ingest.flat.FlatFileLoader
+
+::: clinops.ingest.schema.ClinicalSchema
+
+::: clinops.ingest.schema.ColumnSpec
+
+::: clinops.ingest.schema.SchemaValidationError
@@ -0,0 +1,7 @@
+# clinops.preprocess
+
+::: clinops.preprocess.outliers.ClinicalOutlierClipper
+
+::: clinops.preprocess.units.UnitNormalizer
+
+::: clinops.preprocess.icd.ICDMapper
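Note on `docs/api/preprocess.md`: the two ideas behind this page, clipping to physiological bounds and normalizing mixed units, can be sketched in plain Python. This is only an illustration under stated assumptions, not the `ClinicalOutlierClipper` or `UnitNormalizer` implementation. The function names and the `BOUNDS` table are hypothetical; the bounds are borrowed from the `FlatFileLoader` schema example in `docs/guide/ingest.md`, and the glucose factor follows from its molar mass (180.16 g/mol).

```python
# Hypothetical bounds for illustration; not clinops' built-in table.
BOUNDS = {"heart_rate": (0.0, 300.0), "spo2": (50.0, 100.0)}

# 1 mmol/L of glucose is 180.16 mg/L, i.e. 18.016 mg/dL.
MMOL_L_TO_MG_DL_GLUCOSE = 18.016


def clip_value(name, value):
    """Clamp value into the [low, high] range registered for name."""
    low, high = BOUNDS[name]
    return min(max(value, low), high)


def glucose_to_mg_dl(value, unit):
    """Convert a glucose reading to mg/dL; mg/dL values pass through."""
    if unit == "mmol/L":
        return value * MMOL_L_TO_MG_DL_GLUCOSE
    if unit == "mg/dL":
        return value
    raise ValueError(f"unsupported glucose unit: {unit!r}")
```

Clipping keeps the row and replaces the implausible value with the nearest bound, which is the behavior the guide's `action="clip"` comment describes; other actions the library may support are not covered here.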
@@ -0,0 +1,9 @@
+# clinops.split
+
+::: clinops.split.splitters.TemporalSplitter
+
+::: clinops.split.splitters.PatientSplitter
+
+::: clinops.split.splitters.StratifiedPatientSplitter
+
+::: clinops.split.splitters.SplitResult
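Note on `docs/api/split.md`: the property that matters for a patient-level splitter is that no patient contributes rows to both sides. A minimal standard-library sketch of that idea follows. It is an assumption-laden illustration, not the `PatientSplitter` code: `patient_split`, its parameters, and the toy `rows` are all hypothetical.

```python
import random


def patient_split(rows, id_key="subject_id", test_size=0.2, seed=0):
    """Split rows so that every patient lands entirely in train or test."""
    ids = sorted({row[id_key] for row in rows})
    rng = random.Random(seed)          # seeded for reproducible splits
    rng.shuffle(ids)
    n_test = max(1, round(len(ids) * test_size))
    test_ids = set(ids[:n_test])
    train = [r for r in rows if r[id_key] not in test_ids]
    test = [r for r in rows if r[id_key] in test_ids]
    return train, test


# Toy data: 10 patients with 3 windows each.
rows = [{"subject_id": i // 3, "window": i % 3} for i in range(30)]
train, test = patient_split(rows)
train_ids = {r["subject_id"] for r in train}
test_ids = {r["subject_id"] for r in test}
```

The split is made over patient IDs rather than rows, so `train_ids` and `test_ids` cannot overlap regardless of how many windows each patient has.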
@@ -0,0 +1,13 @@
+# clinops.temporal
+
+::: clinops.temporal.windower.TemporalWindower
+
+::: clinops.temporal.windower.WindowConfig
+
+::: clinops.temporal.imputation.Imputer
+
+::: clinops.temporal.imputation.ImputationStrategy
+
+::: clinops.temporal.features.LagFeatureBuilder
+
+::: clinops.temporal.cohort.CohortAligner
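Note on `docs/api/temporal.md`: the getting-started page describes the forward fill as gap-aware, meaning values are not carried across patients or across gaps longer than `max_gap_hours`. One way to read that rule is sketched below in plain Python. This is a hypothetical illustration, not the `Imputer` implementation; in particular it assumes the gap is measured from the last real observation, which the source does not specify.

```python
from datetime import datetime, timedelta


def forward_fill(rows, max_gap_hours=6):
    """Fill None values from the same patient's last real observation,
    but only if that observation is at most max_gap_hours old.

    rows: (subject_id, charttime, value) tuples, in time order per patient.
    """
    max_gap = timedelta(hours=max_gap_hours)
    last = {}  # subject_id -> (time, value) of the latest real observation
    filled = []
    for subject_id, t, value in rows:
        if value is not None:
            last[subject_id] = (t, value)
        elif subject_id in last and t - last[subject_id][0] <= max_gap:
            value = last[subject_id][1]
        filled.append((subject_id, t, value))
    return filled


t0 = datetime(2150, 1, 1)
rows = [
    (1, t0, 80.0),
    (1, t0 + timedelta(hours=3), None),   # 3 h gap: filled
    (1, t0 + timedelta(hours=10), None),  # 10 h gap: left missing
    (2, t0 + timedelta(hours=1), None),   # other patient: left missing
]
values = [v for _, _, v in forward_fill(rows)]
```

Keeping the last observation per `subject_id` is what prevents one patient's value from leaking into another patient's rows.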
@@ -0,0 +1,47 @@
+# Contributing
+
+Contributions are welcome — bug reports, documentation improvements, and new features all help.
+
+## Setup
+
+```bash
+git clone https://github.com/chaitanyakasaraneni/clinops
+cd clinops
+pip install -e ".[dev]"
+```
+
+## Running tests
+
+```bash
+pytest tests/ -v
+```
+
+## Linting and formatting
+
+```bash
+ruff check clinops/
+ruff format clinops/
+mypy clinops/ --ignore-missing-imports
+```
+
+All three must pass before opening a pull request.
+
+## Building docs locally
+
+```bash
+mkdocs serve
+```
+
+Then open [http://127.0.0.1:8000](http://127.0.0.1:8000).
+
+## Pull request checklist
+
+- [ ] Tests pass (`pytest tests/`)
+- [ ] Ruff lint and format pass
+- [ ] mypy passes
+- [ ] Docstrings updated for any changed public API
+- [ ] Entry added to the relevant guide page if adding a new feature
+
+## Code of Conduct
+
+See [CODE_OF_CONDUCT.md](https://github.com/chaitanyakasaraneni/clinops/blob/main/CODE_OF_CONDUCT.md).
@@ -0,0 +1,142 @@
+# Getting Started
+
+## Installation
+
+Requires Python **3.12+**.
+
+```bash
+pip install clinops
+```
+
+Install optional extras for FHIR or cloud support:
+
+```bash
+pip install "clinops[fhir]"   # FHIR R4 loader
+pip install "clinops[gcp]"    # Google Cloud Storage / BigQuery
+pip install "clinops[aws]"    # AWS S3 / boto3
+```
+
+For development (includes tests, linting, and docs):
+
+```bash
+git clone https://github.com/chaitanyakasaraneni/clinops
+cd clinops
+pip install -e ".[dev]"
+```
+
+## Verify the install
+
+```python
+import clinops
+print(clinops.__version__)
+```
+
+---
+
+## End-to-end example
+
+This walkthrough mirrors a typical research workflow: load MIMIC-IV data, preprocess it, build temporal windows, and produce a patient-level train/test split without data leakage.
+
+### 1. Load data
+
+```python
+from clinops.ingest import MimicTableLoader
+
+tbl = MimicTableLoader("/data/mimic-iv-2.2")
+
+# ICU chartevents — charttime parsed as datetime automatically
+charts = tbl.chartevents(subject_ids=list(range(10000032, 10000132)))
+
+# ICU stays — needed to align windows to admission time
+stays = tbl.icustays(subject_ids=list(range(10000032, 10000132)))
+
+# Admissions — contains hospital_expire_flag outcome
+adm = tbl.admissions(subject_ids=list(range(10000032, 10000132)))
+```
+
+### 2. Preprocess
+
+```python
+from clinops.preprocess import ClinicalOutlierClipper, UnitNormalizer
+
+# Clip values outside physiological bounds (heart_rate, spo2, sbp, ...)
+charts = ClinicalOutlierClipper(action="clip").fit_transform(charts)
+
+# Normalize any mixed-unit columns (e.g. glucose in mmol/L → mg/dL)
+if "glucose_unit" in charts.columns:
+    charts = UnitNormalizer(
+        column_unit_map={"glucose": "glucose_unit"}
+    ).transform(charts)
+```
+
+### 3. Align to ICU admission
+
+```python
+from clinops.temporal import CohortAligner
+
+# Keep only measurements from the first 48 hours after ICU admission
+aligned = CohortAligner(
+    anchor_col="intime",
+    max_hours_before=0,
+    max_hours_after=48,
+).align(events_df=charts, anchor_df=stays)
+```
+
+### 4. Build temporal windows
+
+```python
+from clinops.temporal import TemporalWindower, Imputer, ImputationStrategy
+
+# 24-hour sliding windows, stepped every 6 hours
+windower = TemporalWindower(window_hours=24, step_hours=6)
+windows = windower.fit_transform(
+    df=aligned,
+    id_col="subject_id",
+    time_col="charttime",
+    feature_cols=["heart_rate", "spo2", "resp_rate", "map"],
+)
+
+# Gap-aware forward fill — does not propagate across patients or
+# across gaps longer than 6 hours
+imputer = Imputer(
+    ImputationStrategy.FORWARD_FILL,
+    max_gap_hours=6,
+    time_col="charttime",
+    id_col="subject_id",
+)
+windows = imputer.fit_transform(windows)
+```
+
+### 5. Add outcome and split
+
+```python
+from clinops.split import StratifiedPatientSplitter
+
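+# Attach a patient-level outcome. A subject can have several admissions, so
+# collapse to one row per subject first; merging admissions directly would
+# duplicate window rows for those subjects.
+# max() flags a subject as positive if any admission ended in death.
+outcome = adm.groupby("subject_id", as_index=False)["hospital_expire_flag"].max()
+windows = windows.merge(outcome, on="subject_id", how="left")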
+
+# Stratified patient split — preserves outcome rate, no cross-patient leakage
+result = StratifiedPatientSplitter(
+    id_col="subject_id",
+    outcome_col="hospital_expire_flag",
+    test_size=0.2,
+).split(windows)
+
+print(result.summary())
+train_df = result.train
+test_df  = result.test
+```
+
+---
+
+## Next steps
+
+- [Ingest guide](guide/ingest.md) — all loader options including FHIR and flat files
+- [Preprocess guide](guide/preprocess.md) — outlier bounds, unit conversions, ICD mapping
+- [Temporal guide](guide/temporal.md) — windowing strategies, imputation, lag features
+- [Split guide](guide/split.md) — temporal, patient, and stratified splits
+- [API Reference](api/ingest.md) — full class and method signatures
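Note on the walkthrough in `docs/getting-started.md`: it helps to know how many windows to expect per stay before attaching outcomes. Assuming only full windows are kept (the source does not say how `TemporalWindower` treats partial windows), the window start offsets reduce to a simple range. The helper name `window_starts` is hypothetical.

```python
def window_starts(span_hours, window_hours=24, step_hours=6):
    """Start offsets (in hours) of every full window that fits in span_hours."""
    return list(range(0, span_hours - window_hours + 1, step_hours))


# 48 hours of aligned data, 24-hour windows stepped every 6 hours.
starts = window_starts(48)
```

Under that assumption, the 48-hour alignment in step 3 yields five overlapping windows per stay, starting at hours 0, 6, 12, 18, and 24. Because those windows overlap, they are highly correlated, which is one more reason the split in step 5 is made per patient and not per row.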
@@ -0,0 +1,137 @@
+# Ingest
+
+`clinops.ingest` provides loaders for MIMIC-IV, MIMIC-III, FHIR R4, and flat CSV/Parquet files with schema validation built in.
+
+---
+
+## MimicTableLoader
+
+The fastest way to work with MIMIC-IV: pre-built schemas for the five most-used tables, with no `ColumnSpec` definitions required.
+
+```python
+from clinops.ingest import MimicTableLoader
+
+tbl = MimicTableLoader("/data/mimic-iv-2.2")
+```
+
+### Available tables
+
+```python
+# ICU chartevents — charttime parsed as datetime automatically
+charts = tbl.chartevents(subject_ids=[10000032, 10000980])
+
+# Lab results
+labs = tbl.labevents(subject_ids=[10000032], with_ref_range=True)
+
+# Hospital admissions — includes hospital_expire_flag mortality outcome
+adm = tbl.admissions(subject_ids=[10000032])
+
+# ICD-9/10 diagnoses — primary_only keeps only seq_num == 1
+dx = tbl.diagnoses_icd(subject_ids=[10000032], primary_only=True)
+
+# ICU stays — with_los_band adds a <1d / 1-3d / 3-7d / >7d length-of-stay column
+stays = tbl.icustays(subject_ids=[10000032], with_los_band=True)
+```
+
+### Audit a new MIMIC download
+
+Check row counts, column counts, and null rates without loading full tables:
+
+```python
+tbl.summary()
+#         table  rows_sampled  columns  null_rate_pct
+#   chartevents         10000       23           8.41
+#     labevents         10000       12           4.17
+#    admissions         10000       15           6.02
+# diagnoses_icd         10000        5           0.00
+#      icustays         10000        8           2.31
+```
+
+---
+
+## MimicLoader
+
+For custom filtering and chunk-based loading of large tables.
+
+```python
+from clinops.ingest import MimicLoader
+
+loader = MimicLoader("/data/mimic-iv-2.2")
+
+charts = loader.chartevents(
+    subject_ids=[10000032, 10000980],
+    start_time="2150-01-01",
+    end_time="2150-01-10",
+)
+labs   = loader.labevents(subject_ids=[10000032, 10000980])
+stays  = loader.icustays(subject_ids=[10000032, 10000980])
+```
+
+When `chunk_size` is set, large tables (`chartevents`, `labevents`) are read in chunks to limit memory use:
+
+```python
+loader = MimicLoader("/data/mimic-iv-2.2", chunk_size=100_000)
+charts = loader.chartevents()   # streams in 100k-row chunks internally
+```
+
+---
+
+## MimicIIILoader
+
+Equivalent loader for MIMIC-III (ICD-9 codes, slightly different schema).
+
+```python
+from clinops.ingest import MimicIIILoader
+
+loader = MimicIIILoader("/data/mimic-iii-1.4")
+charts = loader.chartevents(subject_ids=[10006])
+```
+
+---
+
+## FHIRLoader
+
+Load FHIR R4 resources from a JSON Bundle or NDJSON export.
+
+```python
+from clinops.ingest import FHIRLoader
+
+loader   = FHIRLoader("/data/fhir_export")
+obs      = loader.observations(category="vital-signs")
+patients = loader.patients()
+```
+
+!!! note
+    Requires the `fhir` extra: `pip install "clinops[fhir]"`
+
+---
+
+## FlatFileLoader
+
+Load and validate any flat CSV or Parquet file with a custom schema.
+
+```python
+from clinops.ingest import FlatFileLoader, ClinicalSchema, ColumnSpec
+
+schema = ClinicalSchema(
+    name="vitals",
+    columns=[
+        ColumnSpec("subject_id", nullable=False),
+        ColumnSpec("heart_rate", min_value=0,  max_value=300),
+        ColumnSpec("spo2",       min_value=50, max_value=100),
+    ]
+)
+df = FlatFileLoader("vitals.csv", schema=schema).load()
+```
+
+`SchemaValidationError` is raised if any `nullable=False` column contains nulls, or if values fall outside the declared bounds.
+
+---
+
+## Supported file formats
+
+| Format | Extension |
+|---|---|
+| CSV | `.csv` |
+| Compressed CSV | `.csv.gz` |
+| Parquet | `.parquet` |
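Note on `docs/guide/ingest.md`: the validation rule stated for `FlatFileLoader` (reject nulls in `nullable=False` columns and values outside declared bounds) can be sketched without the library. The code below is a hypothetical stand-in, not the clinops implementation: `SchemaError`, `validate`, and the dict-based `SPECS` format are assumptions that only mirror the `ColumnSpec` fields shown in the guide.

```python
class SchemaError(ValueError):
    """Stand-in error type; not clinops' SchemaValidationError."""


# Mirrors the guide's schema: column -> nullable flag and optional bounds.
SPECS = {
    "subject_id": {"nullable": False},
    "heart_rate": {"min": 0, "max": 300},
    "spo2": {"min": 50, "max": 100},
}


def validate(rows, specs):
    """Return rows unchanged, or raise SchemaError on the first violation."""
    for i, row in enumerate(rows):
        for col, spec in specs.items():
            value = row.get(col)
            if value is None:
                if not spec.get("nullable", True):
                    raise SchemaError(f"row {i}: {col} must not be null")
                continue  # nullable column with a missing value is fine
            low, high = spec.get("min"), spec.get("max")
            if low is not None and value < low:
                raise SchemaError(f"row {i}: {col}={value} is below {low}")
            if high is not None and value > high:
                raise SchemaError(f"row {i}: {col}={value} is above {high}")
    return rows
```

Failing on the first violation keeps the sketch short; a real loader might instead collect all violations and report them together.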