# Getting Started

## Installation

Requires Python **3.12+**.

```bash
pip install clinops
```

Install optional extras for FHIR or cloud support. Quoting the package spec keeps shells such as zsh from treating the square brackets as a glob pattern:

```bash
pip install "clinops[fhir]"  # FHIR R4 loader
pip install "clinops[gcp]"   # Google Cloud Storage / BigQuery
pip install "clinops[aws]"   # AWS S3 / boto3
```

For development (includes tests, linting, and docs):

```bash
git clone https://github.com/chaitanyakasaraneni/clinops
cd clinops
pip install -e ".[dev]"
```

## Verify the install

```python
import clinops
print(clinops.__version__)
```
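
Sometimes you want to check the environment without importing the package, for example in a CI step. The standard library can report both requirements. This is a minimal sketch that assumes the installed distribution is named `clinops`, matching the `pip install` command above. `check_environment` is a hypothetical helper and is not part of clinops.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def check_environment(dist_name="clinops", min_python=(3, 12)):
    """Return (python_ok, installed_version_or_None) without importing the package."""
    # Compare the running interpreter against the documented minimum version
    python_ok = sys.version_info >= min_python
    try:
        # Reads the installed distribution's metadata; no import of the package itself
        installed = version(dist_name)
    except PackageNotFoundError:
        installed = None
    return python_ok, installed


if __name__ == "__main__":
    ok, ver = check_environment()
    print(f"Python >= 3.12: {ok}; clinops installed: {ver or 'no'}")
```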

---

## End-to-end example

This walkthrough mirrors a typical research workflow. It loads MIMIC-IV data, preprocesses it, aligns measurements to ICU admission, builds temporal windows, and produces a patient-level train/test split without data leakage.

### 1. Load data

```python
from clinops.ingest import MimicTableLoader

tbl = MimicTableLoader("/data/mimic-iv-2.2")

# Illustrative cohort: a range of subject IDs shared by all three tables
subject_ids = list(range(10000032, 10000132))

# ICU chartevents: charttime is parsed as datetime automatically
charts = tbl.chartevents(subject_ids=subject_ids)

# ICU stays: needed to align windows to ICU admission time (intime)
stays = tbl.icustays(subject_ids=subject_ids)

# Admissions: contains the hospital_expire_flag outcome
adm = tbl.admissions(subject_ids=subject_ids)
```
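
Later steps select columns such as `heart_rate` and `spo2`, so they expect one column per vital sign. MIMIC-IV's native `chartevents` table is in long format, with one row per `itemid`/`valuenum` measurement. This guide does not say whether `MimicTableLoader.chartevents` already pivots the data. If your frame arrives in long format, a pandas pivot like the one below is one way to get wide columns.

The `ITEMID_TO_FEATURE` mapping and the `long_to_wide` helper are assumptions for illustration. Check the itemids against `d_items` for your MIMIC-IV release.

```python
import pandas as pd

# Hypothetical mapping of commonly used MIMIC-IV ICU itemids to feature names;
# verify against d_items for your release before relying on it.
ITEMID_TO_FEATURE = {
    220045: "heart_rate",  # Heart Rate
    220277: "spo2",        # O2 saturation pulseoxymetry
    220210: "resp_rate",   # Respiratory Rate
    220052: "map",         # Arterial Blood Pressure mean
}


def long_to_wide(chartevents):
    """Pivot long-format chartevents into one column per mapped feature."""
    # Keep only the itemids we know how to name
    df = chartevents[chartevents["itemid"].isin(list(ITEMID_TO_FEATURE))].copy()
    df["feature"] = df["itemid"].map(ITEMID_TO_FEATURE)
    # Average duplicate measurements recorded at the same charttime
    wide = df.pivot_table(
        index=["subject_id", "charttime"],
        columns="feature",
        values="valuenum",
        aggfunc="mean",
    ).reset_index()
    wide.columns.name = None  # drop the leftover "feature" axis label
    return wide
```

Only features that actually appear in the input become columns, so a cohort with no mean arterial pressure readings produces no `map` column.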

### 2. Preprocess

```python
from clinops.preprocess import ClinicalOutlierClipper, UnitNormalizer

# Clip values outside physiological bounds (heart_rate, spo2, sbp, ...)
charts = ClinicalOutlierClipper(action="clip").fit_transform(charts)

# Normalize mixed-unit columns (e.g. glucose in mmol/L → mg/dL)
if "glucose_unit" in charts.columns:
    charts = UnitNormalizer(
        column_unit_map={"glucose": "glucose_unit"}
    ).transform(charts)
```
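
The plain-pandas sketch below shows what the two preprocessing operations amount to. The bounds in `BOUNDS` are hypothetical placeholders, not `ClinicalOutlierClipper`'s defaults, and the helpers illustrate the idea rather than clinops internals. The conversion factor follows from glucose's molar mass of about 180.16 g/mol, which makes 1 mmol/L roughly 18.016 mg/dL.

```python
import pandas as pd

# Hypothetical bounds for illustration only; the library's real defaults may differ.
BOUNDS = {"heart_rate": (20, 300), "spo2": (50, 100)}
MMOL_TO_MGDL = 18.016  # glucose: 1 mmol/L is about 18.016 mg/dL


def clip_to_bounds(df, bounds):
    """Clip each bounded column into [low, high]; other columns are untouched."""
    out = df.copy()
    for col, (low, high) in bounds.items():
        if col in out.columns:
            out[col] = out[col].clip(lower=low, upper=high)
    return out


def glucose_to_mgdl(df, value_col="glucose", unit_col="glucose_unit"):
    """Convert rows recorded in mmol/L to mg/dL and update their unit label."""
    out = df.copy()
    is_mmol = out[unit_col].str.lower().eq("mmol/l")
    out.loc[is_mmol, value_col] = out.loc[is_mmol, value_col] * MMOL_TO_MGDL
    out.loc[is_mmol, unit_col] = "mg/dL"
    return out
```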

### 3. Align to ICU admission

```python
from clinops.temporal import CohortAligner

# Keep only measurements from ICU admission (intime) through 48 hours after it
aligned = CohortAligner(
    anchor_col="intime",
    max_hours_before=0,
    max_hours_after=48,
).align(events_df=charts, anchor_df=stays)
```
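
Conceptually, alignment attaches each patient's anchor time and keeps the events whose offset from it falls in the requested range. The sketch below does that in pandas. It assumes both bounds are inclusive and that events and anchors join on `subject_id`. It is not necessarily how `CohortAligner` handles edge cases.

A patient with several ICU stays gets one copy of each event per stay here, and each copy is judged against that stay's `intime`. `align_to_anchor` is a hypothetical helper.

```python
import pandas as pd


def align_to_anchor(events, anchors, id_col="subject_id", time_col="charttime",
                    anchor_col="intime", hours_before=0, hours_after=48):
    """Keep events within [anchor - hours_before, anchor + hours_after], inclusive."""
    # Attach each patient's anchor time to every one of their events
    merged = events.merge(anchors[[id_col, anchor_col]], on=id_col, how="inner")
    rel_hours = (merged[time_col] - merged[anchor_col]).dt.total_seconds() / 3600
    keep = rel_hours.between(-hours_before, hours_after)  # inclusive on both ends
    out = merged.loc[keep].copy()
    out["hours_since_anchor"] = rel_hours[keep]
    return out
```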

### 4. Build temporal windows

```python
from clinops.temporal import TemporalWindower, Imputer, ImputationStrategy

# 24-hour sliding windows, advanced every 6 hours
windower = TemporalWindower(window_hours=24, step_hours=6)
windows = windower.fit_transform(
    df=aligned,
    id_col="subject_id",
    time_col="charttime",
    feature_cols=["heart_rate", "spo2", "resp_rate", "map"],
)

# Gap-aware forward fill: does not propagate values across patients or
# across gaps longer than 6 hours
imputer = Imputer(
    ImputationStrategy.FORWARD_FILL,
    max_gap_hours=6,
    time_col="charttime",
    id_col="subject_id",
)
windows = imputer.fit_transform(windows)
```
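
Two details of this step are worth spelling out.

The first is the window count. Assume windows start at hour 0 and must fit entirely inside the data span. Under that convention, a 48-hour aligned stay with 24-hour windows and a 6-hour step yields windows starting at hours 0, 6, 12, 18, and 24.

The second is gap-aware forward fill. A value is carried forward only within the same patient, and only while it is at most `max_gap_hours` old.

The helpers below are illustrative sketches of both ideas. `window_starts` and `gap_aware_ffill` are hypothetical names, and `TemporalWindower` and `Imputer` may treat edges differently.

```python
import pandas as pd


def window_starts(span_hours, window_hours=24, step_hours=6):
    """Start offsets (hours) of full windows that fit inside [0, span_hours]."""
    starts, s = [], 0
    while s + window_hours <= span_hours:
        starts.append(s)
        s += step_hours
    return starts


def gap_aware_ffill(df, cols, id_col="subject_id", time_col="charttime", max_gap_hours=6):
    """Forward-fill within each patient, only up to max_gap_hours after the last real value."""
    out = df.sort_values([id_col, time_col]).copy()
    for col in cols:
        # Timestamp of the most recent non-missing value, carried forward per patient
        last_seen = out[time_col].where(out[col].notna()).groupby(out[id_col]).ffill()
        # Plain per-patient forward fill (never crosses into another patient)
        filled = out.groupby(id_col)[col].ffill()
        age_hours = (out[time_col] - last_seen).dt.total_seconds() / 3600
        # Keep a filled value only if it is fresh enough; NaN ages compare False
        out[col] = filled.where(age_hours <= max_gap_hours)
    return out
```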

### 5. Add outcome and split

```python
from clinops.split import StratifiedPatientSplitter

# Attach the outcome. admissions has one row per hospital admission, so
# collapse it to one label per patient first (1 if the patient died in any
# admission); merging the raw table on subject_id would duplicate the
# windows of patients with several admissions.
outcome = adm.groupby("subject_id", as_index=False)["hospital_expire_flag"].max()
windows = windows.merge(outcome, on="subject_id", how="left")

# Stratified patient split: preserves the outcome rate and keeps every
# window of a patient on one side, so no patient appears in both sets
result = StratifiedPatientSplitter(
    id_col="subject_id",
    outcome_col="hospital_expire_flag",
    test_size=0.2,
).split(windows)

print(result.summary())
train_df = result.train
test_df = result.test
```
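
The key property of a patient-level split is that all rows for a patient land on the same side. Here is a minimal sketch of a stratified version using NumPy and pandas: derive one label per patient, then sample the same fraction of patients from each label group.

`stratified_patient_split` is a hypothetical helper, not the clinops implementation. It assumes the outcome is constant within a patient, and patients with a missing label are never assigned to test.

```python
import numpy as np
import pandas as pd


def stratified_patient_split(df, id_col, outcome_col, test_size=0.2, seed=0):
    """Split rows by patient, sampling test patients separately within each outcome class."""
    # One label per patient (max, so any positive row makes the patient positive)
    labels = df.groupby(id_col)[outcome_col].max()
    rng = np.random.default_rng(seed)
    test_ids = []
    for _, group in labels.groupby(labels):
        ids = group.index.to_numpy()
        n_test = int(round(len(ids) * test_size))
        # Sample whole patients, never individual rows
        test_ids.extend(rng.choice(ids, size=n_test, replace=False))
    is_test = df[id_col].isin(test_ids)
    return df[~is_test], df[is_test]
```

Sampling patients rather than rows is what prevents leakage. A row-level shuffle would scatter different windows of the same patient across both sets.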

---

## Next steps

- [Ingest guide](guide/ingest.md): all loader options, including FHIR and flat files
- [Preprocess guide](guide/preprocess.md): outlier bounds, unit conversions, and ICD mapping
- [Temporal guide](guide/temporal.md): windowing strategies, imputation, and lag features
- [Split guide](guide/split.md): temporal, patient, and stratified splits
- [API Reference](api/ingest.md): full class and method signatures
| 142 | +- [API Reference](api/ingest.md) — full class and method signatures |
0 commit comments