wiki_content.md: 73 additions & 4 deletions (+73 -4)
@@ -7,6 +7,48 @@
StatClean is a comprehensive statistical data preprocessing and outlier detection library with formal statistical testing and publication-quality reporting.
As of v0.1.3, remover methods return the cleaner instance for chaining; access results via `cleaner.clean_df` and `cleaner.outlier_info`.
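
The chaining behavior noted above follows the common return-self pattern. The sketch below illustrates that pattern only: `ToyCleaner` and `remove_above` are hypothetical stand-ins, not StatClean's class or methods, and plain lists of dicts replace DataFrames. Only the attribute names `clean_df` and `outlier_info` are taken from the text.

```python
class ToyCleaner:
    """Hypothetical stand-in (not StatClean's class) showing the return-self pattern."""

    def __init__(self, rows):
        self.clean_df = list(rows)  # current cleaned rows
        self.outlier_info = {}      # per-column summary of what was removed

    def remove_above(self, column, limit):
        """Drop rows whose value in `column` exceeds `limit`, then return self."""
        kept = [row for row in self.clean_df if row[column] <= limit]
        self.outlier_info[column] = {
            "limit": limit,
            "removed": len(self.clean_df) - len(kept),
        }
        self.clean_df = kept
        return self  # returning the instance is what makes chaining possible


rows = [
    {"age": 30, "income": 50},
    {"age": 31, "income": 900},
    {"age": 120, "income": 55},
]

# Each call returns the cleaner, so results are read from attributes afterwards.
cleaner = ToyCleaner(rows).remove_above("income", 200).remove_above("age", 100)
print(cleaner.clean_df)
print(cleaner.outlier_info)
```

Because each method returns the instance rather than the cleaned data, code that previously used a method's return value as a DataFrame would need to read `clean_df` instead.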
```python
# For many variables, consider dimensionality reduction first
from sklearn.decomposition import PCA
pca_data = PCA(n_components=5).fit_transform(df)
```

+</details>
+
+[Back to top](#performance-tips)
+
---
## Troubleshooting (Troubleshooting.md)
@@ -183,12 +236,20 @@ export MPLBACKEND=Agg
```
### Mahalanobis Threshold and Stability
+<details>
+<summary><strong>Details</strong></summary>
+
`chi2_threshold` can be a percentile (0 < val <= 1) or an absolute chi-square statistic. Covariance inversion falls back to the pseudoinverse when needed; optional shrinkage uses scikit-learn's Ledoit–Wolf estimator via `use_shrinkage=True`.
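
As a rough illustration of the behavior described above, the following sketch computes squared Mahalanobis distances with NumPy. It is not StatClean's implementation. The function name `mahalanobis_outlier_mask` is hypothetical, the conversion of a percentile to a chi-square quantile via `scipy.stats.chi2.ppf` is an assumption about how such a threshold might be interpreted, and the pseudoinverse is applied unconditionally for simplicity.

```python
import numpy as np


def mahalanobis_outlier_mask(X, chi2_threshold=0.975, use_shrinkage=False):
    """Return a boolean mask of rows whose squared Mahalanobis distance exceeds the cutoff.

    Hypothetical helper for illustration; parameter names mirror the text above.
    """
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]

    if use_shrinkage:
        # Optional dependency, imported only when shrinkage is requested.
        from sklearn.covariance import LedoitWolf
        cov = LedoitWolf().fit(X).covariance_
    else:
        cov = np.cov(X, rowvar=False)
    cov = np.atleast_2d(cov)

    # The pseudoinverse matches the ordinary inverse for a well-conditioned
    # matrix and stays defined when the covariance is singular.
    cov_inv = np.linalg.pinv(cov)

    centered = X - X.mean(axis=0)
    # Row-wise quadratic form: d2[i] = centered[i] @ cov_inv @ centered[i]
    d2 = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)

    if 0 < chi2_threshold <= 1:
        # Assumed interpretation: a percentile of the chi-square distribution
        # with one degree of freedom per variable.
        from scipy.stats import chi2
        cutoff = chi2.ppf(chi2_threshold, df=n_features)
    else:
        # Otherwise treat the value as an absolute chi-square statistic.
        cutoff = chi2_threshold

    return d2 > cutoff
```

Under this interpretation, with two variables a percentile of 0.975 corresponds to a cutoff of about 7.38, so passing `chi2_threshold=7.38` directly would behave almost identically.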
+
```python
# Remove highly correlated variables first if instability persists
```