README.md (+6 -7: 6 additions & 7 deletions)
@@ -1,5 +1,7 @@
 # CardioSense AI: Clinical Decision Support System

+[](https://github.com/khanz9664/CardioSense-AI/actions/workflows/pipeline.yml)
+
 <p align="center">
   <img src="app/assets/logo.png" width="200" alt="CardioSense AI Banner">
-**The Preprocessing Pipeline**: Utilizes a Scikit-Learn `ColumnTransformer` with `StandardScaler` for vitals and `OneHotEncoder` for categorical clinical markers, ensuring training-inference consistency.
docs/ARCHITECTURE.md (+15 -9: 15 additions & 9 deletions)
@@ -49,19 +49,25 @@ graph TB
 ## 2. The Training & Optimization Pipeline

-We employ **XGBoost** as the primary engine, optimized via **Optuna** to ensure medical-grade accuracy.
+We employ **XGBoost** as the primary engine, optimized via **Optuna** and supported by a **Production Preprocessing Pipeline** to ensure medical-grade accuracy and inference stability.
+
+1. **Robust Feature Engineering**:
+   * **Numerical Normalization**: `StandardScaler` is applied to all continuous vitals (`age`, `trestbps`, `chol`, `thalach`, `oldpeak`) to prevent feature-dominance and ensure gradient stability.
+   * **Categorical Encoding**: `OneHotEncoder(drop='if_binary')` converts clinical categorical markers (`sex`, `cp`, `fbs`, `restecg`, `exang`, `slope`, `ca`, `thal`) into a sparse, machine-readable format.
+2. **Pipeline Orchestration**: The entire transformation is wrapped in a Scikit-Learn `Pipeline`. This ensures that the exact same fitted transformations are applied during real-time inference as were used during training, eliminating training-serving skew.
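The preprocessing and orchestration steps described in this hunk can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper name `build_pipeline`, the step names, and the toy rows are hypothetical, and `LogisticRegression` is only a stand-in so the example runs without XGBoost installed (an `xgboost.XGBClassifier` could be passed in its place).

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Column groups as listed in the diff above.
NUMERIC = ["age", "trestbps", "chol", "thalach", "oldpeak"]
CATEGORICAL = ["sex", "cp", "fbs", "restecg", "exang", "slope", "ca", "thal"]


def build_pipeline(estimator):
    """Hypothetical helper: wrap preprocessing and a model in one Pipeline."""
    preprocessor = ColumnTransformer(
        transformers=[
            # Z-score scaling for continuous vitals.
            ("num", StandardScaler(), NUMERIC),
            # One-hot encoding; binary columns keep a single indicator.
            ("cat", OneHotEncoder(drop="if_binary"), CATEGORICAL),
        ]
    )
    return Pipeline([("preprocess", preprocessor), ("model", estimator)])


# Toy rows invented for illustration only (not real patient data).
toy = pd.DataFrame({
    "age": [63, 45, 58, 39, 67, 51],
    "trestbps": [145, 120, 132, 118, 160, 125],
    "chol": [233, 210, 289, 199, 286, 240],
    "thalach": [150, 172, 140, 180, 108, 160],
    "oldpeak": [2.3, 0.0, 1.4, 0.2, 1.5, 0.6],
    "sex": [1, 0, 1, 0, 1, 0],
    "cp": [3, 1, 2, 0, 0, 1],
    "fbs": [1, 0, 0, 0, 0, 1],
    "restecg": [0, 1, 0, 1, 0, 1],
    "exang": [0, 0, 1, 0, 1, 0],
    "slope": [0, 2, 1, 2, 1, 2],
    "ca": [0, 0, 2, 0, 3, 1],
    "thal": [1, 2, 3, 2, 3, 2],
})
labels = [1, 0, 1, 0, 1, 0]

# Stand-in estimator; the documented design uses XGBoost here.
pipeline = build_pipeline(LogisticRegression(max_iter=1000))
pipeline.fit(toy, labels)          # fits scaler, encoder, and model together
predictions = pipeline.predict(toy)  # reuses the fitted transforms
```

Because `fit` and `predict` both go through the same `Pipeline` object, the scaler statistics and encoder categories learned at training time are the ones applied at inference, which is the skew-avoidance property the text describes.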
docs/PAPER.md (+12 -5: 12 additions & 5 deletions)
@@ -28,7 +28,14 @@ CardioSense AI addresses these gaps by implementing a **Post-Hoc Attribution Lay
 ## 2. Methodology & Mathematical Foundations

-### 2.1 The Core Intelligence Engine (XGBoost)
+### 2.1 Robust Preprocessing Pipeline
+To ensure model stability and training-inference consistency, we implement a **Scikit-Learn Pipeline** architecture:
+1. **Feature Normalization**: Numerical vitals ($x_{num} \in \{\text{age, trestbps, chol, thalach, oldpeak}\}$) are transformed using **Z-score normalization** (StandardScaler):
+   $$z = \frac{x - \mu}{\sigma}$$
+2. **Categorical Encoding**: Nominal features ($x_{cat} \in \{\text{sex, cp, fbs, restecg, exang, slope, ca, thal}\}$) are transformed via **One-Hot Encoding (OHE)** to a sparse binary vector space.
+3. **Pipeline Consistency**: The transformation parameters ($\mu, \sigma$) are fitted exclusively on the training set to prevent data leakage, and the fitted preprocessor is persisted in the `preprocessor.joblib` artifact so that inference applies identical transformations.
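As a rough illustration of steps 1 and 3, the sketch below fits a `StandardScaler` on training values only, checks the result against $z = (x - \mu)/\sigma$, and round-trips the fitted object through joblib. It is a simplification under stated assumptions: the numbers are invented, only a single scaler is persisted (the text describes a full preprocessor), and the temporary directory is a placeholder for wherever `preprocessor.joblib` actually lives.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical values for one numeric vital (e.g. chol); not project data.
train = np.array([[200.0], [240.0], [280.0]])
test = np.array([[260.0]])

# mu and sigma are estimated from the training split only.
scaler = StandardScaler().fit(train)
mu, sigma = scaler.mean_[0], scaler.scale_[0]

# Manual z-score for the held-out value, using the training statistics.
z_manual = (test[0, 0] - mu) / sigma
z_sklearn = scaler.transform(test)[0, 0]

# Persist the fitted transformer and reload it, as an inference service would.
artifact_path = os.path.join(tempfile.mkdtemp(), "preprocessor.joblib")
joblib.dump(scaler, artifact_path)
restored = joblib.load(artifact_path)
z_restored = restored.transform(test)[0, 0]
```

Note that `StandardScaler` uses the population standard deviation (divisor $n$, not $n - 1$), so $\sigma$ here matches `np.std(train)` with its default settings.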
+
+### 2.2 The Core Intelligence Engine (XGBoost)
 We utilize **eXtreme Gradient Boosting (XGBoost)**, which optimizes the following regularized objective function: