|
1 | 1 | # Welcome to StatClean |
2 | 2 |
|
3 | | -A comprehensive statistical data preprocessing and outlier detection library with formal statistical testing and publication-quality reporting. |
| 3 | +Data preprocessing & outlier detection with formal statistical methods and publication-quality reporting. |
| 4 | + |
| 5 | +[](https://pypi.org/project/statclean/) |
| 6 | +[](https://github.com/SubaashNair/StatClean/actions) |
| 7 | +[](../LICENSE) |
| 8 | + |
| 9 | +> Note: Remover methods return `self`. Access cleaned data via `cleaner.clean_df` and details via `cleaner.outlier_info`. |
4 | 10 |
|
5 | 11 | ## Quick Start |
6 | 12 |
|
7 | | -```python |
| 13 | +```bash |
8 | 14 | pip install statclean |
9 | 15 | ``` |
10 | 16 |
|
11 | 17 | ```python |
12 | 18 | from statclean import StatClean |
13 | 19 | import pandas as pd |
14 | 20 |
|
15 | | -# Your data |
16 | 21 | df = pd.DataFrame({'values': [1, 2, 3, 100, 4, 5]}) |
17 | | - |
18 | | -# Initialize StatClean |
19 | 22 | cleaner = StatClean(df) |
20 | | - |
21 | | -# Detect and remove outliers |
22 | 23 | cleaner.remove_outliers_zscore('values') |
23 | | -cleaned_data = cleaner.clean_df |
| 24 | +cleaned_df = cleaner.clean_df |
24 | 25 | ``` |
25 | 26 |
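Reviewer sketch (not part of the diff): the Quick Start sample has only six points, which matters for a z-score rule. The snippet below is a plain-Python illustration of a textbook z-score filter, not StatClean's implementation. The helper name `zscore_outliers`, the 3.0 default cutoff, and the use of the sample standard deviation are assumptions made for illustration; the source does not state which threshold or convention `remove_outliers_zscore` uses.

```python
import statistics

values = [1, 2, 3, 100, 4, 5]  # same sample as the Quick Start snippet


def zscore_outliers(data, threshold=3.0):
    """Return the items whose absolute z-score exceeds `threshold`.

    Hypothetical helper for illustration; uses the sample standard
    deviation (ddof=1), which is an assumption, not StatClean's documented choice.
    """
    mean = statistics.fmean(data)
    std = statistics.stdev(data)
    return [x for x in data if abs(x - mean) / std > threshold]


# z-score of the obvious extreme value in this sample
z_of_100 = abs(100 - statistics.fmean(values)) / statistics.stdev(values)

flagged_at_3 = zscore_outliers(values)                 # conventional cutoff
flagged_at_2 = zscore_outliers(values, threshold=2.0)  # looser cutoff

print(round(z_of_100, 2), flagged_at_3, flagged_at_2)
```

With the sample standard deviation, no point in a sample of size n can have |z| above (n - 1) / sqrt(n), which is about 2.04 for n = 6. Under these assumptions a cutoff of 3 leaves the sample unchanged, so it may be worth checking `cleaner.outlier_info` after running the Quick Start to confirm what was actually removed.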
|
26 | | -## Features |
27 | | - |
28 | | -- **Formal Statistical Testing**: Grubbs' test, Dixon's Q-test with p-values |
29 | | -- **Multivariate Analysis**: Mahalanobis distance outlier detection |
30 | | -- **Data Transformations**: Box-Cox, logarithmic, square-root transformations |
31 | | -- **Method Chaining**: Fluent API for streamlined workflows |
32 | | -- **Publication-Quality Reporting**: Statistical significance testing |
| 27 | +## Feature Overview |
| 28 | + |
| 29 | +| Feature | Univariate | Multivariate | Formal Test | |
| 30 | +|---|---:|---:|---:| |
| 31 | +| IQR | ✅ | | | |
| 32 | +| Z-score | ✅ | | | |
| 33 | +| Modified Z-score | ✅ | | | |
| 34 | +| Mahalanobis | | ✅ | | |
| 35 | +| Grubbs | ✅ | | ✅ | |
| 36 | +| Dixon Q | ✅ | | ✅ | |
| 37 | + |
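Reviewer sketch (not part of the diff): to make the IQR and Modified Z-score rows of the table concrete, here is a minimal plain-Python version of both rules applied to the Quick Start sample. The helper names, the 1.5 fence multiplier, the 3.5 cutoff, the 0.6745 scaling constant, and the "inclusive" quantile method are common textbook choices assumed for illustration; the source does not say which conventions StatClean uses.

```python
import statistics

values = [1, 2, 3, 100, 4, 5]  # same sample as the Quick Start snippet


def iqr_outliers(data, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR] (hypothetical helper)."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]


def modified_zscore_outliers(data, threshold=3.5):
    """Flag points whose median/MAD-based score exceeds `threshold` (hypothetical helper)."""
    med = statistics.median(data)
    mad = statistics.median([abs(x - med) for x in data])
    if mad == 0:
        # The score is undefined when more than half the points equal the median.
        return []
    return [x for x in data if 0.6745 * abs(x - med) / mad > threshold]


iqr_flagged = iqr_outliers(values)
mz_flagged = modified_zscore_outliers(values)
print(iqr_flagged, mz_flagged)
```

Both rules rely on quartiles or the median rather than the mean and standard deviation, so a single extreme value does not inflate the yardstick used to judge it. On this sample each of them flags `100`.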
| 38 | +## How It Flows |
| 39 | + |
| 40 | +```mermaid |
| 41 | +flowchart LR |
| 42 | + A[DataFrame] --> B[Analyze Distribution] |
| 43 | + B --> C{Recommend Method} |
| 44 | + C --> D[IQR / Z / Modified Z] |
| 45 | + C --> E[Mahalanobis] |
| 46 | + D --> F[Remove / Winsorize] |
| 47 | + E --> F |
| 48 | + F --> G[Report & Plots] |
| 49 | +``` |
33 | 50 |
|
34 | 51 | ## Navigation |
35 | 52 |
|
36 | | -- [API Reference](api-reference.md) |
37 | | -- [Statistical Methods](statistical-methods.md) |
38 | | -- [Examples](examples.md) |
39 | 53 | - [Installation Guide](installation.md) |
| 54 | +- [Quick Start Examples](examples.md) |
| 55 | +- [Statistical Methods](statistical-methods.md) |
| 56 | +- [API Reference](api-reference.md) |
40 | 57 |
|
41 | 58 | ## Links |
42 | 59 |
|
|