Update documentation links and prepare v0.1.2

SubaashNair · SubaashNair · commit 4c441d74da35 · 2025-08-06T20:09:37.000+08:00
- Update setup.py with GitHub Pages documentation links
- Bump version to 0.1.2 for documentation update
- Add comprehensive GitHub Wiki content template
- Link PyPI package to proper documentation URLs
diff --git a/setup.py b/setup.py
@@ -4,18 +4,20 @@
 
 setup(
     name="statclean",
-    version="0.1.0",
+    version="0.1.2",
     author="Subashanan Nair",
     author_email="subaashnair12@gmail.com",
     description="A comprehensive statistical data preprocessing and outlier detection library with formal statistical testing and publication-quality reporting",
     long_description=long_description,
     long_description_content_type="text/markdown",
     url="https://github.com/SubaashNair/StatClean",
-    project_urls={  # <--- ADD THIS SECTION!
-        "Homepage": "https://github.com/SubaashNair/StatClean",
+    project_urls={
+        "Homepage": "https://subaashnair.github.io/StatClean/",
+        "Documentation": "https://subaashnair.github.io/StatClean/",
         "Source": "https://github.com/SubaashNair/StatClean",
         "Tracker": "https://github.com/SubaashNair/StatClean/issues",
-        "Documentation": "https://github.com/SubaashNair/StatClean#readme",
+        "API Reference": "https://subaashnair.github.io/StatClean/api-reference",
+        "Examples": "https://subaashnair.github.io/StatClean/examples",
     },
     packages=find_packages(),
     classifiers=[
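The `project_urls` mapping above ends up as `Project-URL` fields in the built package metadata, which is where PyPI gets its sidebar links. A minimal sketch for checking those fields after installation, using only the standard library (Python 3.8+). `parse_project_urls` and `installed_project_urls` are hypothetical helper names for illustration and are not part of StatClean:

```python
from importlib.metadata import PackageNotFoundError, metadata


def parse_project_urls(entries):
    """Turn 'Label, URL' metadata entries into a {label: url} dict."""
    urls = {}
    for entry in entries:
        label, _, url = entry.partition(",")
        urls[label.strip()] = url.strip()
    return urls


def installed_project_urls(dist_name):
    """Read Project-URL entries for an installed distribution, or {} if absent."""
    try:
        entries = metadata(dist_name).get_all("Project-URL") or []
    except PackageNotFoundError:
        return {}
    return parse_project_urls(entries)


# Entries in the 'Label, URL' form used by core metadata, taken from this commit
sample = [
    "Homepage, https://subaashnair.github.io/StatClean/",
    "Documentation, https://subaashnair.github.io/StatClean/",
    "API Reference, https://subaashnair.github.io/StatClean/api-reference",
]
print(parse_project_urls(sample))
print(installed_project_urls("statclean"))  # {} unless statclean is installed
```

Running this against an installed 0.1.2 build is one way to confirm that the new documentation links were picked up before publishing.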
diff --git a/wiki_content.md b/wiki_content.md
@@ -0,0 +1,251 @@
+# GitHub Wiki Content for StatClean
+
+## Home Page (Home.md)
+
+# Welcome to StatClean Wiki
+
+StatClean is a comprehensive statistical data preprocessing and outlier detection library with formal statistical testing and publication-quality reporting.
+
+## Quick Navigation
+
+- [Installation Guide](Installation-Guide)
+- [Quick Start Tutorial](Quick-Start-Tutorial)
+- [Statistical Methods Guide](Statistical-Methods-Guide)
+- [API Reference](API-Reference)
+- [Advanced Examples](Advanced-Examples)
+- [Performance Tips](Performance-Tips)
+- [Troubleshooting](Troubleshooting)
+- [Contributing](Contributing)
+
+## Key Features
+
+- **Formal Statistical Testing**: Grubbs' test and Dixon's Q-test, both reporting p-values
+- **Multivariate Analysis**: Mahalanobis distance outlier detection
+- **Data Transformations**: Box-Cox, logarithmic, square-root transformations
+- **Method Chaining**: Fluent API for streamlined workflows
+- **Publication-Quality Reporting**: Statistical significance testing
+
+## Links
+
+- [GitHub Repository](https://github.com/SubaashNair/StatClean)
+- [PyPI Package](https://pypi.org/project/statclean/)
+- [Documentation](https://subaashnair.github.io/StatClean/)
+
+---
+
+## Installation Guide (Installation-Guide.md)
+
+# Installation Guide
+
+## Quick Install
+
+```bash
+pip install statclean
+```
+
+## Requirements
+
+- Python 3.7+
+- numpy >= 1.19.0
+- pandas >= 1.2.0
+- matplotlib >= 3.3.0
+- seaborn >= 0.11.0
+- scipy >= 1.6.0
+- tqdm >= 4.60.0
+
+## Development Install
+
+```bash
+git clone https://github.com/SubaashNair/StatClean.git
+cd StatClean
+pip install -e .
+```
+
+## Verification
+
+```python
+from statclean import StatClean
+print("Installation successful!")
+```
+
+---
+
+## Quick Start Tutorial (Quick-Start-Tutorial.md)
+
+# Quick Start Tutorial
+
+## Basic Usage
+
+```python
+import pandas as pd
+from statclean import StatClean
+
+# Sample data
+df = pd.DataFrame({
+    'values': [1, 2, 3, 100, 4, 5, 6]  # 100 is an outlier
+})
+
+# Initialize StatClean
+cleaner = StatClean(df)
+
+# Detect outliers
+outliers = cleaner.detect_outliers_zscore('values')
+print(f"Outliers detected: {outliers.sum()}")
+
+# Remove outliers
+cleaner.remove_outliers_zscore('values')
+cleaned_df = cleaner.clean_df
+print(f"Cleaned shape: {cleaned_df.shape}")
+```
+
+## Statistical Testing
+
+```python
+# Formal statistical test
+result = cleaner.grubbs_test('values', alpha=0.05)
+print(f"P-value: {result['p_value']:.6f}")
+print(f"Outlier detected: {result['outlier_detected']}")
+```
+
+## Method Chaining
+
+```python
+# Fluent API
+result = (cleaner
+          .set_thresholds(zscore_threshold=2.5)
+          .winsorize_outliers_iqr('values')
+          .clean_df)
+```
+
+---
+
+## Performance Tips (Performance-Tips.md)
+
+# Performance Tips
+
+## Large Datasets
+
+For datasets with >100k rows:
+
+```python
+# Use batch processing
+cleaner.clean_columns(columns, show_progress=True)
+
+# Cache statistics for repeated operations
+cleaner.add_zscore_columns(columns, cache_stats=True)
+```
+
+## Memory Optimization
+
+```python
+# Process columns individually for memory efficiency
+for col in large_columns:
+    cleaner.remove_outliers_zscore(col)
+
+# Use in-place operations when possible
+cleaner = StatClean(df, preserve_index=False)
+```
+
+## Multivariate Performance
+
+```python
+# For many variables, consider dimensionality reduction first (requires scikit-learn, which is not in the requirements list)
+from sklearn.decomposition import PCA
+pca_data = PCA(n_components=5).fit_transform(df)
+```
+
+---
+
+## Troubleshooting (Troubleshooting.md)
+
+# Troubleshooting
+
+## Common Issues
+
+### ImportError
+```bash
+pip install --upgrade statclean
+```
+
+### Memory Issues
+```python
+# Process in chunks
+for chunk in pd.read_csv('large_file.csv', chunksize=10000):
+    cleaner = StatClean(chunk)
+    # Process chunk
+```
+
+### Visualization Problems
+```bash
+# For headless servers
+export MPLBACKEND=Agg
+```
+
+### Singular Matrix Error
+This occurs when Mahalanobis distance is computed on perfectly correlated variables, because their covariance matrix cannot be inverted:
+```python
+# Remove highly correlated variables first
+correlation_matrix = df.corr().abs()  # absolute value, so strong negative correlations count too
+# Remove one variable from each pair with absolute correlation > 0.95
+```
+
+## Getting Help
+
+- Check [GitHub Issues](https://github.com/SubaashNair/StatClean/issues)
+- Read [Documentation](https://subaashnair.github.io/StatClean/)
+- Review [Examples](https://subaashnair.github.io/StatClean/examples)
+
+---
+
+## Contributing (Contributing.md)
+
+# Contributing to StatClean
+
+## Development Setup
+
+```bash
+git clone https://github.com/SubaashNair/StatClean.git
+cd StatClean
+pip install -e .
+pip install pytest
+```
+
+## Running Tests
+
+```bash
+pytest tests/
+```
+
+## Code Style
+
+- Follow PEP 8
+- Use type hints
+- Add docstrings to all functions
+- No Claude references in commits
+
+## Pull Request Process
+
+1. Fork the repository
+2. Create feature branch: `git checkout -b feature-name`
+3. Make changes with tests
+4. Run test suite: `pytest`
+5. Submit pull request
+
+## Areas for Contribution
+
+- Additional statistical tests
+- Performance optimizations
+- New visualization methods
+- Documentation improvements
+- Bug fixes
+
+---
+
+Instructions for setting up GitHub Wiki:
+
+1. Go to your GitHub repository: https://github.com/SubaashNair/StatClean
+2. Click the "Wiki" tab
+3. Click "Create the first page"
+4. Copy the content above for each page
+5. Create pages with the names shown in parentheses, omitting the `.md` extension (the wiki editor adds it when saving)
+6. Set "Home" as the main wiki page
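
A note on the Quick Start example in `wiki_content.md`: whether `100` is flagged by a z-score rule depends on the threshold, because in a sample of only seven values no point can have a sample z-score above (n - 1)/sqrt(n), roughly 2.27. The sketch below checks this with plain NumPy. It does not call StatClean, and the cutoff of 3.0 is an assumed conventional value, not a confirmed StatClean default:

```python
import numpy as np

# Same sample as the Quick Start example
values = np.array([1, 2, 3, 100, 4, 5, 6], dtype=float)


def zscores(x, ddof=1):
    """Return z-scores using the sample (ddof=1) or population (ddof=0) std."""
    return (x - x.mean()) / x.std(ddof=ddof)


z_sample = zscores(values, ddof=1)
z_population = zscores(values, ddof=0)

# Upper bound on |z| for any sample of size n when ddof=1
n = len(values)
max_possible = (n - 1) / np.sqrt(n)

assumed_threshold = 3.0  # assumption: a conventional cutoff, not read from StatClean

print(f"z-score of 100 (ddof=1): {z_sample[3]:.3f}")
print(f"z-score of 100 (ddof=0): {z_population[3]:.3f}")
print(f"largest possible |z| for n={n}: {max_possible:.3f}")
flagged = bool((np.abs(z_sample) > assumed_threshold).any())
print(f"any value flagged at {assumed_threshold}: {flagged}")
```

If the library's default cutoff is near 3.0, the tutorial would report zero outliers for this sample. A larger sample, a lower cutoff, or an IQR-based rule with the usual 1.5 x IQR fences would flag `100`.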