Commit 4c441d7

Update documentation links and prepare v0.1.2
- Update setup.py with GitHub Pages documentation links
- Bump version to 0.1.2 for documentation update
- Add comprehensive GitHub Wiki content template
- Link PyPI package to proper documentation URLs
1 parent 351aba3 commit 4c441d7

2 files changed (+257, -4 lines)

setup.py

Lines changed: 6 additions & 4 deletions
```diff
@@ -4,18 +4,20 @@
 
 setup(
     name="statclean",
-    version="0.1.0",
+    version="0.1.2",
     author="Subashanan Nair",
     author_email="[email protected]",
     description="A comprehensive statistical data preprocessing and outlier detection library with formal statistical testing and publication-quality reporting",
     long_description=long_description,
     long_description_content_type="text/markdown",
     url="https://github.com/SubaashNair/StatClean",
-    project_urls={  # <--- ADD THIS SECTION!
-        "Homepage": "https://github.com/SubaashNair/StatClean",
+    project_urls={
+        "Homepage": "https://subaashnair.github.io/StatClean/",
+        "Documentation": "https://subaashnair.github.io/StatClean/",
         "Source": "https://github.com/SubaashNair/StatClean",
         "Tracker": "https://github.com/SubaashNair/StatClean/issues",
-        "Documentation": "https://github.com/SubaashNair/StatClean#readme",
+        "API Reference": "https://subaashnair.github.io/StatClean/api-reference",
+        "Examples": "https://subaashnair.github.io/StatClean/examples",
     },
     packages=find_packages(),
     classifiers=[
```
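
Each `project_urls` entry ends up in the built package's core metadata as a `Project-URL: Label, URL` line, which is what PyPI lists under "Project links". A minimal sketch for reading those links back from an installed copy with the standard library (`importlib.metadata`, Python 3.8+); `parse_project_urls` and `installed_project_urls` are hypothetical helpers, not part of StatClean:

```python
from importlib.metadata import PackageNotFoundError, metadata


def parse_project_urls(entries):
    """Turn 'Label, URL' strings from Project-URL metadata into a dict."""
    urls = {}
    for entry in entries:
        # The label comes before the first comma, the URL after it
        label, _, url = entry.partition(",")
        urls[label.strip()] = url.strip()
    return urls


def installed_project_urls(dist_name):
    """Return the Project-URL links of an installed distribution ({} if not installed)."""
    try:
        entries = metadata(dist_name).get_all("Project-URL") or []
    except PackageNotFoundError:
        return {}
    return parse_project_urls(entries)


# Prints the documentation link if statclean is installed, otherwise None
print(installed_project_urls("statclean").get("Documentation"))
```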

wiki_content.md

Lines changed: 251 additions & 0 deletions

# GitHub Wiki Content for StatClean

## Home Page (Home.md)

# Welcome to StatClean Wiki

StatClean is a comprehensive statistical data preprocessing and outlier detection library with formal statistical testing and publication-quality reporting.

## Quick Navigation

- [Installation Guide](Installation-Guide)
- [Quick Start Tutorial](Quick-Start-Tutorial)
- [Statistical Methods Guide](Statistical-Methods-Guide)
- [API Reference](API-Reference)
- [Advanced Examples](Advanced-Examples)
- [Performance Tips](Performance-Tips)
- [Troubleshooting](Troubleshooting)
- [Contributing](Contributing)

## Key Features

- **Formal Statistical Testing**: Grubbs' test and Dixon's Q-test with p-values
- **Multivariate Analysis**: Mahalanobis distance outlier detection
- **Data Transformations**: Box-Cox, logarithmic, and square-root transformations
- **Method Chaining**: Fluent API for streamlined workflows
- **Publication-Quality Reporting**: Statistical significance testing

## Links

- [GitHub Repository](https://github.com/SubaashNair/StatClean)
- [PyPI Package](https://pypi.org/project/statclean/)
- [Documentation](https://subaashnair.github.io/StatClean/)

---

## Installation Guide (Installation-Guide.md)

# Installation Guide

## Quick Install

```bash
pip install statclean
```

## Requirements

- Python 3.7+
- numpy >= 1.19.0
- pandas >= 1.2.0
- matplotlib >= 3.3.0
- seaborn >= 0.11.0
- scipy >= 1.6.0
- tqdm >= 4.60.0
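
A quick way to confirm that an environment meets the minimum versions listed above. This is a minimal sketch using `importlib.metadata` (Python 3.8+); `MINIMUMS`, `release_tuple`, `meets_minimum`, and `check_requirements` are hypothetical helpers, and the simplified parsing only compares the numeric release part of a version (for full PEP 440 handling, `packaging.version.Version` is the usual choice):

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the Requirements list
MINIMUMS = {
    "numpy": "1.19.0", "pandas": "1.2.0", "matplotlib": "3.3.0",
    "seaborn": "0.11.0", "scipy": "1.6.0", "tqdm": "4.60.0",
}


def release_tuple(v: str) -> tuple:
    """'1.26.4.post1' -> (1, 26, 4); suffixes such as 'rc1' or '.post1' are ignored."""
    m = re.match(r"\d+(?:\.\d+)*", v)
    return tuple(int(p) for p in m.group(0).split(".")) if m else ()


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare release tuples, padding with zeros so '1.2' equals '1.2.0'."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def check_requirements() -> list:
    """Return a list of problems; an empty list means every minimum is met."""
    problems = []
    for name, minimum in MINIMUMS.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if not meets_minimum(installed, minimum):
            problems.append(f"{name} {installed} < {minimum}")
    return problems


print(check_requirements() or "All requirements satisfied")
```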

## Development Install

```bash
git clone https://github.com/SubaashNair/StatClean.git
cd StatClean
pip install -e .
```

## Verification

```python
from statclean import StatClean
print("Installation successful!")
```

---

## Quick Start Tutorial (Quick-Start-Tutorial.md)

# Quick Start Tutorial

## Basic Usage

```python
import pandas as pd
from statclean import StatClean

# Sample data: 20 ordinary values plus one extreme value.
# With only 7 values no point can reach |z| > 2.45, so the z-score
# method would not flag the outlier at a threshold of 3.
df = pd.DataFrame({
    'values': list(range(1, 21)) + [100]  # 100 is an outlier
})

# Initialize StatClean
cleaner = StatClean(df)

# Detect outliers
outliers = cleaner.detect_outliers_zscore('values')
print(f"Outliers detected: {outliers.sum()}")

# Remove outliers
cleaner.remove_outliers_zscore('values')
cleaned_df = cleaner.clean_df
print(f"Cleaned shape: {cleaned_df.shape}")
```
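
Sample size matters for the z-score method because a single extreme value inflates the very standard deviation it is measured against. The sketch below uses plain pandas (not StatClean's internals) to score the seven values `[1, 2, 3, 100, 4, 5, 6]` with a z-score rule and with the 1.5 * IQR rule. `zscore_flags` and `iqr_flags` are hypothetical helpers, and the threshold of 3 and the 1.5 multiplier are common conventions assumed here, not confirmed StatClean defaults:

```python
import pandas as pd


def zscore_flags(s: pd.Series, threshold: float = 3.0) -> pd.Series:
    """Flag values whose |z| exceeds the threshold (sample std, ddof=1)."""
    z = (s - s.mean()) / s.std()
    return z.abs() > threshold


def iqr_flags(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)


small = pd.Series([1, 2, 3, 100, 4, 5, 6])
z = (small - small.mean()) / small.std()
print(round(z.max(), 2))          # 2.27, below a threshold of 3
print(zscore_flags(small).sum())  # 0 values flagged
print(iqr_flags(small).sum())     # 1 value flagged (the 100)
```

Quartiles barely move when one extreme value is added, which is why the IQR rule still works on very small samples.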

## Statistical Testing

```python
# Grubbs' test for a single outlier at significance level alpha = 0.05
result = cleaner.grubbs_test('values', alpha=0.05)
print(f"P-value: {result['p_value']:.6f}")
print(f"Outlier detected: {result['outlier_detected']}")
```
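
For reference, the two-sided Grubbs statistic is `G = max|x_i - mean| / s`, compared against a critical value derived from the t distribution with n - 2 degrees of freedom. The sketch below is a textbook implementation with NumPy and SciPy, not StatClean's code: `grubbs_two_sided` is a hypothetical name, its result keys only loosely mirror `result['p_value']` and `result['outlier_detected']`, it assumes roughly normal data, and its p-value is the usual Bonferroni-style approximation, so it may differ from what `cleaner.grubbs_test` reports:

```python
import numpy as np
from scipy import stats


def grubbs_two_sided(values, alpha=0.05):
    """Two-sided Grubbs' test for the single most extreme value."""
    x = np.asarray(values, dtype=float)
    n = x.size
    if n < 3:
        raise ValueError("Grubbs' test needs at least 3 observations")
    mean, sd = x.mean(), x.std(ddof=1)
    if sd == 0:
        raise ValueError("all values are identical")
    idx = int(np.argmax(np.abs(x - mean)))
    g = abs(x[idx] - mean) / sd
    # Critical value: t quantile at alpha / (2n) with n - 2 degrees of freedom
    t_crit = stats.t.ppf(1 - alpha / (2 * n), n - 2)
    g_crit = (n - 1) / np.sqrt(n) * np.sqrt(t_crit**2 / (n - 2 + t_crit**2))
    # Approximate p-value: invert the G-to-t relation, then apply the 2n factor
    denom = (n - 1) ** 2 - n * g**2
    if denom <= 0:
        p_value = 0.0
    else:
        t_stat = np.sqrt(n * (n - 2) * g**2 / denom)
        p_value = min(1.0, 2 * n * stats.t.sf(t_stat, n - 2))
    return {"G": g, "G_critical": g_crit, "p_value": p_value,
            "outlier_index": idx, "outlier_detected": bool(g > g_crit)}


result = grubbs_two_sided([1, 2, 3, 100, 4, 5, 6])
print(result["outlier_index"], result["outlier_detected"])  # 3 True
```

Grubbs' test checks one value at a time; to look for several outliers it is usually repeated after removing the flagged value.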

## Method Chaining

```python
# Fluent API: each call returns the cleaner, so calls can be chained
result = (cleaner
          .set_thresholds(zscore_threshold=2.5)
          .winsorize_outliers_iqr('values')
          .clean_df)
```
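
Chaining works because each method returns the object it was called on. Here is a minimal sketch of that pattern with a hypothetical `MiniCleaner` class (not StatClean's implementation); its `iqr_multiplier` parameter is an assumption, and "winsorize" here means capping values at the Q1 - 1.5 * IQR and Q3 + 1.5 * IQR fences instead of dropping rows:

```python
import pandas as pd


class MiniCleaner:
    """Hypothetical minimal cleaner illustrating a fluent (chainable) API."""

    def __init__(self, df: pd.DataFrame):
        self.clean_df = df.copy()  # never modify the caller's DataFrame
        self.iqr_multiplier = 1.5

    def set_thresholds(self, iqr_multiplier: float = 1.5) -> "MiniCleaner":
        self.iqr_multiplier = iqr_multiplier
        return self  # returning self is what makes chaining possible

    def winsorize_outliers_iqr(self, column: str) -> "MiniCleaner":
        s = self.clean_df[column]
        q1, q3 = s.quantile(0.25), s.quantile(0.75)
        k = self.iqr_multiplier * (q3 - q1)
        # Cap values at the IQR fences instead of removing rows
        self.clean_df[column] = s.clip(lower=q1 - k, upper=q3 + k)
        return self


df = pd.DataFrame({"values": [1.0, 2.0, 3.0, 100.0, 4.0, 5.0, 6.0]})
result = (MiniCleaner(df)
          .set_thresholds(iqr_multiplier=1.5)
          .winsorize_outliers_iqr("values")
          .clean_df)
print(result["values"].tolist())  # [1.0, 2.0, 3.0, 10.0, 4.0, 5.0, 6.0]
```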

---

## Performance Tips (Performance-Tips.md)

# Performance Tips

## Large Datasets

For datasets with more than 100,000 rows:

```python
columns = ['col_a', 'col_b']  # placeholder: the numeric columns you want to clean

# Use batch processing
cleaner.clean_columns(columns, show_progress=True)

# Cache statistics for repeated operations
cleaner.add_zscore_columns(columns, cache_stats=True)
```
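
The idea behind caching statistics is to compute each column's mean and standard deviation once, then reuse them for every later z-score or threshold. A plain pandas sketch of that idea, not StatClean's `cache_stats` implementation; the column names and the helpers `column_stats` and `zscores` are hypothetical:

```python
import numpy as np
import pandas as pd


def column_stats(df: pd.DataFrame, columns) -> pd.DataFrame:
    """Compute mean and sample std once per column (the cached statistics)."""
    return df[columns].agg(["mean", "std"])


def zscores(df: pd.DataFrame, col_stats: pd.DataFrame) -> pd.DataFrame:
    """Vectorised z-scores for every cached column, reusing the precomputed stats."""
    cols = list(col_stats.columns)
    return (df[cols] - col_stats.loc["mean"]) / col_stats.loc["std"]


rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100_000, 2)), columns=["col_a", "col_b"])
col_stats = column_stats(df, ["col_a", "col_b"])  # one pass over the data
z = zscores(df, col_stats)                        # reused for any threshold below
flags_3 = (z.abs() > 3).sum()
flags_25 = (z.abs() > 2.5).sum()
print(flags_3.to_dict(), flags_25.to_dict())
```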

## Memory Optimization

```python
from statclean import StatClean

# Process columns individually for memory efficiency
large_columns = ['col_a', 'col_b']  # placeholder: the columns you want to clean
for col in large_columns:
    cleaner.remove_outliers_zscore(col)

# Use in-place operations when possible
cleaner = StatClean(df, preserve_index=False)
```
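
## Multivariate Performance

```python
# For many variables, consider dimensionality reduction first
# (scikit-learn is not a StatClean requirement: pip install scikit-learn)
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
numeric = df.select_dtypes('number')  # PCA needs numeric columns (at least 5 here)
pca_data = PCA(n_components=5).fit_transform(StandardScaler().fit_transform(numeric))
```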
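
If you keep all variables, the Mahalanobis distance itself is cheap to compute with NumPy for a moderate number of columns. This is a sketch, not StatClean's implementation. The helper names are hypothetical, and `np.linalg.pinv` (pseudo-inverse) is used so a singular covariance matrix does not raise. The chi-square cutoff assumes the data are roughly multivariate normal, and `alpha=0.001` is an arbitrary choice:

```python
import numpy as np
from scipy import stats


def mahalanobis_sq(X) -> np.ndarray:
    """Squared Mahalanobis distance of each row from the column means."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    # pinv instead of inv: still works if the covariance matrix is singular
    inv_cov = np.linalg.pinv(cov)
    # Row-wise quadratic form: centered[i] @ inv_cov @ centered[i]
    return np.einsum("ij,jk,ik->i", centered, inv_cov, centered)


def mahalanobis_outliers(X, alpha=0.001) -> np.ndarray:
    """Flag rows whose squared distance exceeds the chi-square (1 - alpha) quantile."""
    d2 = mahalanobis_sq(X)
    cutoff = stats.chi2.ppf(1 - alpha, df=np.asarray(X).shape[1])
    return d2 > cutoff


rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
X[0] = [8.0, 8.0, 8.0]  # planted outlier
flags = mahalanobis_outliers(X)
print(np.flatnonzero(flags))
```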

---

## Troubleshooting (Troubleshooting.md)

# Troubleshooting

## Common Issues

### ImportError
```bash
# Install or upgrade into the same interpreter you run your code with
python -m pip install --upgrade statclean
```

### Memory Issues
```python
import pandas as pd
from statclean import StatClean

# Process in chunks
for chunk in pd.read_csv('large_file.csv', chunksize=10000):
    cleaner = StatClean(chunk)
    # Process chunk
```
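
One caveat with chunking: each `StatClean(chunk)` only sees its own chunk, so means, standard deviations, and therefore z-scores differ from chunk to chunk. If you need one consistent cutoff for the whole file, a two-pass approach works: first accumulate global statistics, then filter. This is a pandas-only sketch (hypothetical helpers, not a StatClean API) that merges per-chunk statistics with the pairwise update of Chan et al. and runs on a small in-memory CSV:

```python
import io

import numpy as np
import pandas as pd


def global_stats(make_reader, columns):
    """First pass: merge per-chunk means and squared deviations across chunks."""
    n = 0
    mean = pd.Series(0.0, index=columns)
    m2 = pd.Series(0.0, index=columns)  # running sum of squared deviations
    for chunk in make_reader():
        x = chunk[columns].astype(float)
        if x.empty:
            continue
        n_b, mean_b = len(x), x.mean()
        m2_b = ((x - mean_b) ** 2).sum()
        delta = mean_b - mean
        total = n + n_b
        mean = mean + delta * n_b / total
        m2 = m2 + m2_b + delta**2 * n * n_b / total
        n = total
    return mean, np.sqrt(m2 / (n - 1))  # sample standard deviation


def filter_chunks(make_reader, columns, threshold=3.0):
    """Second pass: yield each chunk without rows whose global |z| exceeds threshold."""
    mean, std = global_stats(make_reader, columns)
    for chunk in make_reader():
        z = (chunk[columns] - mean) / std
        yield chunk[(z.abs() <= threshold).all(axis=1)]


# Small in-memory demo: 20 ordinary values plus one extreme value in column "a"
values = list(range(1, 21)) + [100]
csv_text = "a,b\n" + "\n".join(f"{v},{i}" for i, v in enumerate(values))


def make_reader():
    # Each call returns a fresh chunk iterator, because the data is read twice
    return pd.read_csv(io.StringIO(csv_text), chunksize=4)


cleaned = pd.concat(filter_chunks(make_reader, ["a", "b"]))
print(len(cleaned))  # 20
```

For a real file, `make_reader` would return `pd.read_csv('large_file.csv', chunksize=10000)` instead.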

### Visualization Problems
```bash
# For headless servers (no display): use matplotlib's non-interactive Agg backend
export MPLBACKEND=Agg
```
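
The backend can also be selected from Python, which helps when you cannot set environment variables (for example in a scheduled job). A minimal sketch with standard matplotlib calls; the output file name is arbitrary:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # select the non-interactive backend before importing pyplot
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [1, 4, 9])
out_path = os.path.join(tempfile.gettempdir(), "statclean_plot.png")
fig.savefig(out_path)  # write to a file instead of calling plt.show()
plt.close(fig)
```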
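
### Singular Matrix Error
This occurs when the covariance matrix used for the Mahalanobis distance cannot be inverted, typically because some variables are perfectly (or almost perfectly) correlated:
```python
import numpy as np
# Remove highly correlated variables first
corr = df.select_dtypes('number').corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))  # each pair once
to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]  # |r| > 0.95
df = df.drop(columns=to_drop)
```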
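
Before dropping columns, you can check whether the covariance matrix really is singular or just badly conditioned. A NumPy sketch with a hypothetical `covariance_diagnostics` helper; it inspects only numeric columns and uses the sample covariance from `np.cov`, which may not match StatClean's internals exactly:

```python
import numpy as np
import pandas as pd


def covariance_diagnostics(df: pd.DataFrame) -> dict:
    """Report whether the covariance matrix of the numeric columns is (near-)singular."""
    X = df.select_dtypes("number").to_numpy(dtype=float)
    cov = np.cov(X, rowvar=False)
    rank = int(np.linalg.matrix_rank(cov))
    return {
        "n_variables": cov.shape[0],
        "rank": rank,
        "singular": rank < cov.shape[0],
        "condition_number": float(np.linalg.cond(cov)),  # very large means near-singular
    }


rng = np.random.default_rng(1)
a = rng.normal(size=100)
# "b" is a rescaled copy of "a", so the two are perfectly correlated
demo = pd.DataFrame({"a": a, "b": 2 * a, "c": rng.normal(size=100)})
print(covariance_diagnostics(demo)["singular"])  # True
```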

## Getting Help

- Check [GitHub Issues](https://github.com/SubaashNair/StatClean/issues)
- Read [Documentation](https://subaashnair.github.io/StatClean/)
- Review [Examples](https://subaashnair.github.io/StatClean/examples)

---

## Contributing (Contributing.md)

# Contributing to StatClean

## Development Setup

```bash
git clone https://github.com/SubaashNair/StatClean.git
cd StatClean
pip install -e .
pip install pytest
```

## Running Tests

```bash
pytest tests/
```

## Code Style

- Follow PEP 8
- Use type hints
- Add docstrings to all functions
- No Claude references in commits

## Pull Request Process

1. Fork the repository
2. Create a feature branch: `git checkout -b feature-name`
3. Make your changes and add tests for them
4. Run the test suite: `pytest`
5. Submit a pull request

## Areas for Contribution

- Additional statistical tests
- Performance optimizations
- New visualization methods
- Documentation improvements
- Bug fixes

---

Instructions for setting up the GitHub Wiki:

1. Go to your GitHub repository: https://github.com/SubaashNair/StatClean
2. Click the "Wiki" tab.
3. Click "Create the first page".
4. Copy the content above for each page.
5. Create pages with the exact names shown in parentheses, without the `.md` extension (for example `Installation-Guide`).
6. Name the first page "Home"; GitHub uses the Home page as the wiki's landing page.
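
Copying pages by hand is error-prone. Once the first page exists, GitHub also exposes the wiki as a Git repository (`https://github.com/SubaashNair/StatClean.wiki.git`), so the pages can be generated from this template instead. This is a minimal sketch. It assumes the `## Title (File.md)` headers and `---` separators stay exactly as they are in this file, and `split_wiki_template` and `write_pages` are hypothetical helpers:

```python
import re
from pathlib import Path

# Page headers in wiki_content.md look like "## Home Page (Home.md)"
PAGE_HEADER = re.compile(r"^## .+ \((?P<file>[\w-]+\.md)\)[ \t]*$", re.MULTILINE)


def split_wiki_template(text: str) -> dict:
    """Map each file name in a '## Title (File.md)' header to that page's body."""
    matches = list(PAGE_HEADER.finditer(text))
    pages = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        # Each page ends at the next '---' separator, which also drops the
        # setup instructions that follow the last page
        body = text[match.end():end].split("\n---\n")[0]
        pages[match.group("file")] = body.strip() + "\n"
    return pages


def write_pages(pages: dict, out_dir: str) -> None:
    """Write one Markdown file per wiki page into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for file_name, body in pages.items():
        (out / file_name).write_text(body, encoding="utf-8")


# Usage, e.g. inside a clone of the wiki repository:
# pages = split_wiki_template(Path("wiki_content.md").read_text(encoding="utf-8"))
# write_pages(pages, ".")
```

After writing the files, commit and push them from the wiki clone. Three links on the Home page (`Statistical-Methods-Guide`, `API-Reference`, `Advanced-Examples`) have no page in this template, so they will point to missing pages until those are written.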
