+ ---
+ title: Home
+ layout: default
+ nav_order: 1
+ ---
+
  # Welcome to StatClean

  Data preprocessing & outlier detection with formal statistical methods and publication-quality reporting.

  [![PyPI](https://img.shields.io/pypi/v/statclean.svg)](https://pypi.org/project/statclean/)
- [![Build](https://github.com/SubaashNair/StatClean/actions/workflows/pages.yml/badge.svg)](https://github.com/SubaashNair/StatClean/actions)
+ [![Build](https://github.com/SubaashNair/StatClean/actions/workflows/publish.yml/badge.svg)](https://github.com/SubaashNair/StatClean/actions)
  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](../LICENSE)

  > Note: Remover methods return `self`. Access cleaned data via `cleaner.clean_df` and details via `cleaner.outlier_info`.
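The note above describes a fluent pattern: a remover mutates the cleaner's state and returns the cleaner itself, so results are read from attributes instead of from the return value. The sketch below shows that pattern with the standard library only. `MiniCleaner`, `remove_outliers_iqr`, the `factor` parameter, and the layout of the `outlier_info` entries are hypothetical stand-ins chosen for illustration; they are not StatClean's actual implementation or API.

```python
from statistics import quantiles


class MiniCleaner:
    """Hypothetical stand-in for illustration; not StatClean's implementation."""

    def __init__(self, rows):
        self.clean_df = list(rows)  # current (cleaned) rows, as a list of dicts
        self.outlier_info = {}      # per-column details of what was removed

    def remove_outliers_iqr(self, column, factor=1.5):
        """Drop rows whose value lies outside the Tukey fences for `column`."""
        values = [row[column] for row in self.clean_df]
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        spread = q3 - q1
        lower, upper = q1 - factor * spread, q3 + factor * spread
        kept = [row for row in self.clean_df if lower <= row[column] <= upper]
        self.outlier_info[column] = {
            "method": "iqr",
            "bounds": (lower, upper),
            "num_removed": len(self.clean_df) - len(kept),
        }
        self.clean_df = kept
        return self  # returning self is what makes chaining possible


rows = [
    {"income": income, "age": age}
    for income, age in [(40, 30), (42, 31), (41, 29), (43, 35), (39, 33), (400, 32), (44, 95)]
]

# Each call returns the cleaner, so calls chain; results are read from attributes.
cleaner = MiniCleaner(rows).remove_outliers_iqr("income").remove_outliers_iqr("age")
cleaned_rows = cleaner.clean_df
```

Because each remover returns the object and not the data, forgetting to read `clean_df` at the end of a chain leaves you holding the cleaner, not the cleaned rows.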
@@ -37,7 +43,7 @@ cleaned_df = cleaner.clean_df

  ## How It Flows

- ```mermaid
+ <div class="mermaid">
  flowchart LR
      A[DataFrame] --> B[Analyze Distribution]
      B --> C{Recommend Method}
@@ -46,7 +52,12 @@ flowchart LR
      D --> F[Remove / Winsorize]
      E --> F
      F --> G[Report & Plots]
- ```
+ </div>
+
+ <script src="https://cdn.jsdelivr.net/npm/mermaid@10/dist/mermaid.min.js"></script>
+ <script>
+ mermaid.initialize({ startOnLoad: true });
+ </script>

  ## Navigation

@@ -55,6 +66,43 @@ flowchart LR
  - [Statistical Methods](statistical-methods.md)
  - [API Reference](api-reference.md)

+ ## Key Features
+
+ ### 🔬 **Statistical Testing & Analysis**
+ - **Formal Statistical Tests**: Grubbs' test and Dixon's Q-test with p-values
+ - **Distribution Analysis**: Automatic normality testing and method recommendations
+ - **Method Comparison**: Statistical agreement analysis between detection methods
+ - **Publication-Quality Reporting**: P-values, confidence intervals, and effect sizes
+
+ ### 📊 **Detection Methods**
+ - **Univariate**: IQR, Z-score, Modified Z-score (MAD-based)
+ - **Multivariate**: Mahalanobis distance with chi-square thresholds
+ - **Batch Processing**: Multi-column detection with progress tracking
+ - **Automatic Selection**: Based on distribution characteristics
+
+ ### 🛠️ **Treatment Options**
+ - **Removal**: Statistical validation with significance testing
+ - **Winsorizing**: Cap outliers at bounds instead of removal
+ - **Transformations**: Box-Cox, logarithmic, square-root with recommendations
+ - **Method Chaining**: Fluent API for streamlined workflows
+
+ ## Advanced Usage
+
+ ```python
+ # Formal statistical testing
+ result = cleaner.grubbs_test('income', alpha=0.05)
+ print(f"P-value: {result['p_value']:.6f}")
+
+ # Multivariate outlier detection
+ outliers = cleaner.detect_outliers_mahalanobis(['income', 'age'])
+
+ # Method chaining with transformations
+ cleaned = (cleaner
+            .transform_boxcox('income')
+            .remove_outliers_modified_zscore('income')
+            .clean_df)
+ ```
+
  ## Links

  - [GitHub Repository](https://github.com/SubaashNair/StatClean)
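
The "Modified Z-score (MAD-based)" method listed under Detection Methods can be sketched in a few lines. This is a minimal standard-library illustration of the textbook formula, `0.6745 * (x - median) / MAD`, with the commonly cited cutoff of 3.5. The function names and the default threshold are assumptions made for this example and may not match what StatClean does internally.

```python
from statistics import median


def modified_zscores(values):
    """Return the MAD-based modified Z-score for each value."""
    values = list(values)
    med = median(values)
    # Median absolute deviation: the median distance from the median.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero; modified Z-scores are undefined")
    # 0.6745 rescales the MAD so scores are comparable to ordinary Z-scores
    # when the data are normally distributed.
    return [0.6745 * (v - med) / mad for v in values]


def flag_outliers(values, threshold=3.5):
    """Return one boolean per value; True marks a suspected outlier."""
    return [abs(z) > threshold for z in modified_zscores(values)]


incomes = [40, 42, 41, 43, 39, 400, 44]
flags = flag_outliers(incomes)
```

Because both the center and the spread are medians, a single extreme value such as 400 barely moves them, which is why this score tends to be preferred over the plain Z-score on skewed or contaminated data.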