Commit 0d7a619

docs(wiki): add badges, quick links grid, feature matrix, mermaid flow, collapsibles, and back-to-top anchors
1 parent 2a419f5 commit 0d7a619

1 file changed (+73 -4 lines changed)

wiki_content.md

Lines changed: 73 additions & 4 deletions
@@ -7,6 +7,48 @@
 StatClean is a comprehensive statistical data preprocessing and outlier detection library with formal statistical testing and publication-quality reporting.
 As of v0.1.3, remover methods return the cleaner instance for chaining; access results via `cleaner.clean_df` and `cleaner.outlier_info`.
 
+[![PyPI](https://img.shields.io/pypi/v/statclean.svg)](https://pypi.org/project/statclean/)
+[![Build](https://github.com/SubaashNair/StatClean/actions/workflows/publish.yml/badge.svg)](https://github.com/SubaashNair/StatClean/actions)
+[![Docs](https://img.shields.io/badge/docs-GitHub%20Pages-blue)](https://subaashnair.github.io/StatClean/)
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
+
+> Note: Remover methods return `self`. Access cleaned data via `cleaner.clean_df` and details via `cleaner.outlier_info`.
+
+### Quick Links
+
+| Getting Started | Learn |
+|---|---|
+| [Installation](Installation-Guide) | [Statistical Methods](Statistical-Methods-Guide) |
+| [Quick Start](Quick-Start-Tutorial) | [API Reference](API-Reference) |
+| [Examples](Advanced-Examples) | [Performance Tips](Performance-Tips) |
+| [Troubleshooting](Troubleshooting) | [Contributing](Contributing) |
+
+### Feature Overview
+
+| Feature | Univariate | Multivariate | Formal Test |
+|---|---:|---:|---:|
+| IQR | ✅ | | |
+| Z-score | ✅ | | |
+| Modified Z-score | ✅ | | |
+| Mahalanobis | | ✅ | |
+| Grubbs | ✅ | | ✅ |
+| Dixon Q | ✅ | | ✅ |
+
+### How It Flows
+
+```mermaid
+flowchart LR
+A[DataFrame] --> B[Analyze Distribution]
+B --> C{Recommend Method}
+C --> D[IQR / Z / Modified Z]
+C --> E[Mahalanobis]
+D --> F[Remove / Winsorize]
+E --> F
+F --> G[Report & Plots]
+```
+
+[Back to top](#welcome-to-statclean)
+
 ## Quick Navigation
 
 - [Installation Guide](Installation-Guide)
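
Review note on the return-`self` contract described in the added note: the sketch below shows why returning the instance enables chaining while results stay on attributes. `MiniCleaner` and its `remove_outliers_iqr` method are hypothetical stand-ins written for this note, not StatClean's code; only the attribute names `clean_df` and `outlier_info` come from the wiki text, and here `clean_df` holds a plain list of rows instead of a DataFrame.

```python
from statistics import quantiles


class MiniCleaner:
    """Hypothetical stand-in for the return-self contract; not StatClean's implementation."""

    def __init__(self, rows):
        self.clean_df = list(rows)  # list of dict rows here; a DataFrame in StatClean
        self.outlier_info = {}

    def remove_outliers_iqr(self, column, factor=1.5):
        values = [row[column] for row in self.clean_df]
        q1, _, q3 = quantiles(values, n=4)  # quartiles of the current data
        low = q1 - factor * (q3 - q1)
        high = q3 + factor * (q3 - q1)
        kept = [row for row in self.clean_df if low <= row[column] <= high]
        self.outlier_info[column] = {
            "removed": len(self.clean_df) - len(kept),
            "bounds": (low, high),
        }
        self.clean_df = kept
        return self  # returning the instance is what allows chaining


pairs = [(10, 50), (12, 55), (11, 52), (13, 54), (12, 53), (11, 400), (12, 51), (95, 53)]
rows = [{"price": p, "area": a} for p, a in pairs]

# Each call returns the same cleaner, so calls can be strung together.
cleaner = MiniCleaner(rows).remove_outliers_iqr("price").remove_outliers_iqr("area")
cleaned_rows = cleaner.clean_df
details = cleaner.outlier_info
```

Because the remover returns the cleaner and not the data, code written as `df = cleaner.remove_...(...)` would bind the cleaner object; the cleaned data has to be read from `clean_df` after the chain.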
@@ -124,7 +166,8 @@ result = (cleaner
 
 # Performance Tips
 
-## Large Datasets
+<details>
+<summary><strong>Large Datasets</strong></summary>
 
 For datasets with >100k rows:
 
@@ -136,7 +179,10 @@ cleaner.clean_columns(columns, show_progress=True)
 cleaner.add_zscore_columns(columns, cache_stats=True)
 ```
 
-## Memory Optimization
+</details>
+
+<details>
+<summary><strong>Memory Optimization</strong></summary>
 
 ```python
 # Process columns individually for memory efficiency
@@ -147,14 +193,21 @@ for col in large_columns:
 cleaner = StatClean(df, preserve_index=False)
 ```
 
-## Multivariate Performance
+</details>
+
+<details>
+<summary><strong>Multivariate Performance</strong></summary>
 
 ```python
 # For many variables, consider dimensionality reduction first
 from sklearn.decomposition import PCA
 pca_data = PCA(n_components=5).fit_transform(df)
 ```
 
+</details>
+
+[Back to top](#performance-tips)
+
 ---
 
 ## Troubleshooting (Troubleshooting.md)
@@ -183,12 +236,20 @@ export MPLBACKEND=Agg
 ```
 
 ### Mahalanobis Threshold and Stability
+<details>
+<summary><strong>Details</strong></summary>
+
 `chi2_threshold` can be percentile (0<val<=1) or absolute chi-square statistic. Covariance inversion uses pseudoinverse when needed; optional shrinkage via scikit-learn's Ledoit–Wolf with `use_shrinkage=True`.
+
 ```python
 # Remove highly correlated variables first if instability persists
 correlation_matrix = df.corr()
 ```
 
+</details>
+
+[Back to top](#troubleshooting)
+
 ## Getting Help
 
 - Check [GitHub Issues](https://github.com/SubaashNair/StatClean/issues)
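
Review note on the Mahalanobis paragraph in this hunk: the sketch below illustrates one way a threshold that is either a percentile or an absolute chi-square statistic could be resolved, and how squared distances can be computed with a pseudoinverse. It is written against NumPy and SciPy only; `resolve_threshold` and `mahalanobis_sq` are hypothetical helper names for this note and are not taken from StatClean, whose internals may differ. Shrinkage is not shown.

```python
import numpy as np
from scipy.stats import chi2


def resolve_threshold(chi2_threshold, n_features):
    """Treat values in (0, 1] as a percentile, anything else as an absolute statistic."""
    if 0 < chi2_threshold <= 1:
        # Squared Mahalanobis distance is compared against a chi-square
        # quantile with one degree of freedom per feature.
        return float(chi2.ppf(chi2_threshold, df=n_features))
    return float(chi2_threshold)


def mahalanobis_sq(X):
    """Squared Mahalanobis distance of every row of X from the column means."""
    centered = X - X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    inv_cov = np.linalg.pinv(cov)  # pseudoinverse tolerates a near-singular covariance
    return np.einsum("ij,jk,ik->i", centered, inv_cov, centered)


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
X[0] = [8.0, -8.0]  # one planted multivariate outlier

cutoff = resolve_threshold(0.975, n_features=X.shape[1])
flags = mahalanobis_sq(X) > cutoff
```

Under this reading a threshold of exactly 1 maps to an infinite cutoff and flags nothing, which is one reason to state clearly in the wiki which side of the boundary `1` falls on.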
@@ -278,6 +339,10 @@ Instructions for setting up GitHub Wiki:
 
 Best practices: drop NaNs before tests where needed; sample large data for Shapiro.
 
+> Warning: Dixon’s Q-test is recommended only for small sample sizes (n < 30).
+
+[Back to top](#statistical-methods-guide)
+
 ---
 
 ## API Reference (API-Reference.md)
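
Review note on the Dixon's Q warning added in this hunk: a minimal sketch of the basic Q statistic (gap between the suspect value and its nearest neighbour, divided by the range) with a guard that mirrors the n < 30 recommendation. `dixon_q` is a hypothetical helper for this note, not StatClean's function, and the comparison against a tabulated critical value is deliberately left to the caller.

```python
def dixon_q(values, max_n=30):
    """Return (Q, suspect) for the most extreme end of a small sample.

    Q = gap / range, where gap is the distance from the suspect value
    to its nearest neighbour in sorted order.
    """
    data = sorted(values)
    n = len(data)
    if n < 3 or n >= max_n:
        raise ValueError(f"Dixon's Q is meant for small samples (3 <= n < {max_n}); got n={n}")
    spread = data[-1] - data[0]
    if spread == 0:
        raise ValueError("all values are identical; Q is undefined")
    low_gap = data[1] - data[0]
    high_gap = data[-1] - data[-2]
    # Test whichever end sits further from its neighbour.
    if low_gap >= high_gap:
        return low_gap / spread, data[0]
    return high_gap / spread, data[-1]


sample = [0.189, 0.167, 0.187, 0.183, 0.186, 0.182, 0.181, 0.184, 0.181, 0.177]
q_stat, suspect = dixon_q(sample)
```

The returned Q would then be compared with a published critical value for the sample size and confidence level in use; the suspect point is rejected only if Q exceeds it.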
@@ -327,6 +392,8 @@ Notes:
 - Remover methods return `self` for chaining; access data via `cleaner.clean_df`.
 - Mahalanobis supports percentile thresholds and shrinkage covariance.
 
+[Back to top](#api-reference)
+
 ---
 
 ## Advanced Examples (Advanced-Examples.md)
@@ -362,4 +429,6 @@ figs = cleaner.plot_outlier_analysis(features)
 outliers = cleaner.detect_outliers_modified_zscore('PRICE')
 cleaner.remove_outliers_modified_zscore('PRICE')
 cleaner.visualize_outliers('PRICE')
-```
+```
+
+[Back to top](#advanced-examples)
