| 1 | +# DeepEthogram Repository Cleanup Plan |
| 2 | + |
| 3 | +## Executive Summary |
| 4 | + |
| 5 | +This document outlines a comprehensive cleanup plan for the DeepEthogram repository to modernize the codebase, improve maintainability, and enhance user experience. The cleanup is organized into phases to ensure systematic improvement without breaking existing functionality. |
| 6 | + |
| 7 | +**Last Updated**: January 2025 |
| 8 | +**Current Branch**: cleanup (partially implemented) |
| 9 | + |
| 10 | +## Progress Status |
| 11 | + |
| 12 | +### ✅ Already Completed on Cleanup Branch |
| 13 | +- Migrated from setup.py to pyproject.toml |
| 14 | +- Consolidated dependencies (removed requirements.txt) |
| 15 | +- Simplified installation process |
| 16 | +- Added Docker build and test script |
| 17 | +- Updated CI/CD workflows |
| 18 | +- Added UV package manager support (beta) |
| 19 | + |
| 20 | +## Phase 1: PyTorch Lightning & Installation Fixes (1 week) 🚨 |
| 21 | + |
| 22 | +### 1.1 PyTorch Lightning Compatibility (HIGHEST PRIORITY) |
| 23 | +**Status**: ❌ Not Started - **Fixes Issues #163, #145, #158** |
| 24 | +**Note**: Stay on Python 3.7 to avoid PySide complications |
| 25 | +- [ ] Create compatibility layer for Lightning 1.6.5 → 2.x |
| 26 | +- [ ] Option A: Pin to intermediate version (1.9.x) that works with Python 3.7 |
| 27 | +- [ ] Option B: Add compatibility shims: |
| 28 | + - [ ] Detect Lightning version and use appropriate API calls |
| 29 | + - [ ] Wrap trainer instantiation with version checks |
| 30 | + - [ ] Fix `reload_dataloaders_every_epoch` parameter issue |
| 31 | + - [ ] Fix `progress_bar_refresh_rate` parameter issue |
| 32 | + - [ ] Fix `gpus` vs `accelerator` parameter |
| 33 | + - [ ] Fix FPSCallback `dataloader_idx` parameter |
| 34 | +- [ ] Test all training pipelines |
| 35 | +- [ ] Document Lightning version requirements |
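
The Option B shims above could start from a small translation function for `Trainer` keyword arguments. The sketch below is illustrative only: `parse_version` and `translate_trainer_kwargs` are hypothetical helper names, not part of DeepEthogram or Lightning, and the version cutoffs in the comments reflect the Lightning changelog as best understood here, so they should be checked against the release actually pinned.

```python
import re
from typing import Any, Dict, Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '1.9.5'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def translate_trainer_kwargs(kwargs: Dict[str, Any], lightning_version: str) -> Dict[str, Any]:
    """Return a copy of `kwargs` with legacy Trainer arguments renamed.

    Hypothetical helper: the cutoffs below are assumptions to verify.
    """
    version = parse_version(lightning_version)
    out = dict(kwargs)

    # Removed in 1.6: the boolean flag became an integer epoch interval.
    if version >= (1, 6) and "reload_dataloaders_every_epoch" in out:
        every_epoch = out.pop("reload_dataloaders_every_epoch")
        out["reload_dataloaders_every_n_epochs"] = 1 if every_epoch else 0

    # Removed in 1.7: a rate of 0 disabled the bar, None meant "use the default".
    if version >= (1, 7) and "progress_bar_refresh_rate" in out:
        rate = out.pop("progress_bar_refresh_rate")
        out["enable_progress_bar"] = rate is None or rate > 0

    # Deprecated in 1.7, removed in 2.0: `gpus` became accelerator + devices.
    if version >= (1, 7) and "gpus" in out:
        gpus = out.pop("gpus")
        if gpus:
            out["accelerator"] = "gpu"
            out["devices"] = gpus
        else:
            out["accelerator"] = "cpu"
    return out


legacy = {"gpus": 1, "progress_bar_refresh_rate": 0, "reload_dataloaders_every_epoch": True}
print(translate_trainer_kwargs(legacy, "1.9.5"))
```

At a call site this would be used roughly as `pl.Trainer(**translate_trainer_kwargs(kwargs, pl.__version__))`. The sketch drops the numeric refresh rate; if that value matters, it would need to be carried over through a progress bar callback instead.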
| 36 | + |
| 37 | +### 1.2 NumPy Compatibility Fix |
| 38 | +**Status**: ❌ Not Started - **Fixes Issue #155** |
| 39 | +- [ ] Replace all `np.float` with `float` or `np.float64` |
| 40 | +- [ ] Replace all `np.int` with `int` or `np.int64` |
| 41 | +- [ ] Add numpy version constraint compatible with Python 3.7 |
| 42 | +- [ ] Test with numpy 1.21.x (last to support Python 3.7 well) |
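
The alias replacement above can be done mechanically. The following is a minimal sketch, assuming the code refers to NumPy as `np` or `numpy`; the function names are hypothetical. It also covers `np.bool`, `np.object`, and `np.complex`, which go beyond the two aliases listed in this plan but were deprecated alongside them. Sized types such as `np.float32` and underscore variants such as `np.float_` are left untouched.

```python
import re
from typing import List, Tuple

# Aliases deprecated in NumPy 1.20 and removed in 1.24.
# Each one was simply another name for the Python builtin of the same name.
_ALIAS = re.compile(r"\b(?:np|numpy)\.(float|int|bool|object|complex)\b")


def find_deprecated_aliases(source: str) -> List[Tuple[int, str]]:
    """Return (line number, line text) for every line using a deprecated alias."""
    return [
        (number, line)
        for number, line in enumerate(source.splitlines(), start=1)
        if _ALIAS.search(line)
    ]


def replace_deprecated_aliases(source: str) -> str:
    """Rewrite e.g. `np.float` to `float`, preserving the original behavior."""
    return _ALIAS.sub(lambda match: match.group(1), source)


example = (
    "a = np.zeros(3, dtype=np.float)\n"
    "b = a.astype(np.int)\n"
    "c = np.float32(1) + np.int64(2)\n"
)
for number, line in find_deprecated_aliases(example):
    print(number, line)
print(replace_deprecated_aliases(example))
```

Because this is a plain text substitution, it also rewrites matches inside comments and string literals, so the output of `find_deprecated_aliases` is worth reviewing before applying the replacement across the repository.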
| 43 | + |
| 44 | +### 1.3 Hydra/OmegaConf Conflict Resolution |
| 45 | +**Status**: ❌ Not Started - **Fixes Issue #144** |
| 46 | +- [ ] Fix hydra detection in `__init__.py` |
| 47 | +- [ ] Ensure omegaconf version compatibility |
| 48 | +- [ ] Add clear error message if hydra-core is installed |
| 49 | +- [ ] Test installation from clean environment |
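
One possible shape for the detection and error message above is sketched here. It is not the existing check in `__init__.py`; `ensure_hydra_absent` is a hypothetical name, and the approach assumes that finding an importable top-level `hydra` module is a good enough signal that hydra-core is installed. `importlib.util.find_spec` is used because `importlib.metadata` is not available on Python 3.7.

```python
import importlib.util
from typing import Callable, Optional


def ensure_hydra_absent(
    find_spec: Callable[[str], Optional[object]] = importlib.util.find_spec,
) -> None:
    """Raise a descriptive ImportError if a top-level `hydra` module is importable.

    `find_spec` is injectable so the check can be exercised in tests
    without installing or uninstalling anything.
    """
    spec = find_spec("hydra")
    if spec is None:
        return
    location = getattr(spec, "origin", None) or "unknown location"
    raise ImportError(
        "hydra-core appears to be installed (found at {}). "
        "Uninstall it with 'pip uninstall hydra-core' before importing "
        "this package.".format(location)
    )


# Simulate a clean environment: no exception is raised.
ensure_hydra_absent(find_spec=lambda name: None)
```

Any other installed package that exposes a module named `hydra` would also trigger this error, so the check may need tightening if false positives turn out to be part of the reported issue.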
| 50 | + |
| 51 | +### 1.4 Installation & Dependency Fixes (Python 3.7 compatible) |
| 52 | +**Status**: ⚠️ Build system ready, dependencies not updated |
| 53 | +- [ ] Fix scikit-learn 1.0.2 installation for Python 3.7 |
| 54 | +- [ ] Update dependencies that work with Python 3.7: |
| 55 | + - [ ] pandas to highest version supporting 3.7 (1.3.5) |
| 56 | + - [ ] scikit-learn to 1.0.2 (with proper build deps) |
| 57 | + - [ ] scipy to highest 3.7-compatible version |
| 58 | +- [ ] Create requirements-colab.txt for Colab-specific deps |
| 59 | +- [ ] Test installation on fresh systems |
| 60 | + |
| 61 | +### 1.5 Critical GUI Fixes (without PySide upgrade) |
| 62 | +- [ ] Fix dropdown menu issue (#166) - pretrained weights not selectable |
| 63 | +- [ ] Add debugging for Qt platform issues |
| 64 | +- [ ] Create platform-specific installation guides |
| 65 | +- [ ] Add GUI error recovery mechanisms |
| 66 | + |
| 67 | +## Phase 2: Stabilization and Testing (1-2 weeks) |
| 68 | + |
| 69 | +### 2.1 Test Suite Fixes |
| 70 | +**Status**: ⚠️ Docker test script added, but tests may fail |
| 71 | +- [ ] Fix all tests broken by dependency updates |
| 72 | +- [ ] Add compatibility shims for PyTorch Lightning changes |
| 73 | +- [ ] Mock GPU tests for CI/CD without GPU |
| 74 | +- [ ] Ensure Docker tests pass for all images |
| 75 | +- [ ] Add regression tests for fixed issues |
| 76 | + |
| 77 | +### 2.2 Documentation Updates |
| 78 | +**Status**: ⚠️ Some docs added, critical gaps remain |
| 79 | +- [ ] Complete CLI documentation (#169, #170) |
| 80 | +- [ ] Fill in empty `model_performance.md` |
| 81 | +- [ ] Create troubleshooting guide for common issues: |
| 82 | + - [ ] Installation failures by OS |
| 83 | + - [ ] GPU detection problems |
| 84 | + - [ ] Qt/GUI issues |
| 85 | + - [ ] Dependency conflicts |
| 86 | +- [ ] Update README with new Python version requirements |
| 87 | + |
| 88 | +### 2.3 Installation Verification |
| 89 | +- [ ] Test installation on fresh systems: |
| 90 | + - [ ] Ubuntu 20.04, 22.04 |
| 91 | + - [ ] Windows 10, 11 |
| 92 | + - [ ] macOS 12, 13, 14 |
| 93 | +- [ ] Verify Colab notebook works (#173) |
| 94 | +- [ ] Test conda environment creation |
| 95 | +- [ ] Verify UV installation method |
| 96 | + |
| 97 | +## Phase 3: Python Version Upgrade (2-3 weeks) |
| 98 | + |
| 99 | +### 3.1 Python 3.8+ Migration Planning |
| 100 | +**Status**: ❌ Deferred until core issues fixed |
| 101 | +**Dependencies**: Requires PySide2 → PySide6 migration |
| 102 | +- [ ] Create detailed migration plan for PySide2 → PySide6 |
| 103 | +- [ ] Identify all Qt-dependent code sections |
| 104 | +- [ ] Plan phased migration approach |
| 105 | +- [ ] Test PySide6 compatibility on all platforms |
| 106 | + |
| 107 | +### 3.2 Python Version Update |
| 108 | +**After PySide6 migration is complete** |
| 109 | +- [ ] Update Python constraint to `>=3.8,<3.12` |
| 110 | +- [ ] Update all Docker base images |
| 111 | +- [ ] Update conda environment |
| 112 | +- [ ] Test on Python 3.8, 3.9, 3.10, 3.11 |
| 113 | + |
| 114 | +### 3.3 Modern Dependency Updates |
| 115 | +**Only after Python 3.8+ is working** |
| 116 | +- [ ] Update to latest compatible versions: |
| 117 | + - [ ] pytorch_lightning to 2.x |
| 118 | + - [ ] pandas to 2.x |
| 119 | + - [ ] numpy to 1.24+ |
| 120 | + - [ ] scikit-learn to 1.3+ |
| 121 | + - [ ] scipy to 1.11+ |
| 122 | + |
| 123 | +## Phase 4: Long-term Improvements (3-4 weeks) |
| 124 | + |
| 125 | +### 4.1 Complete PySide6 Migration |
| 126 | +**Status**: ❌ Planning needed |
| 127 | +- [ ] Create migration plan from PySide2 to PySide6 |
| 128 | +- [ ] Update all Qt imports and API calls |
| 129 | +- [ ] Test on all platforms |
| 130 | +- [ ] Update Docker images with new Qt |
| 131 | +- [ ] Document any breaking changes |
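
During the import update above, a transitional shim that resolves the Qt binding at runtime can keep the GUI importable while both PySide2 and PySide6 are still in use. This is a sketch under that assumption; the plan itself does not commit to supporting both bindings at once, and `pick_qt_binding` and `import_qt_module` are hypothetical names.

```python
import importlib
import importlib.util
from typing import Callable, Optional, Sequence


def pick_qt_binding(
    candidates: Sequence[str] = ("PySide6", "PySide2"),
    find_spec: Callable[[str], Optional[object]] = importlib.util.find_spec,
) -> Optional[str]:
    """Return the first installed binding from `candidates`, or None."""
    for name in candidates:
        if find_spec(name) is not None:
            return name
    return None


def import_qt_module(submodule: str, binding: Optional[str] = None):
    """Import e.g. `QtWidgets` from whichever binding is available."""
    binding = binding or pick_qt_binding()
    if binding is None:
        raise ImportError("Neither PySide6 nor PySide2 is installed")
    return importlib.import_module(f"{binding}.{submodule}")


# Reports which binding would be used on this machine (None if neither is installed).
print(pick_qt_binding())
```

A shim like this only hides the package name. API moves still need handling at each call site; for example, `QAction` is in `QtWidgets` under PySide2 but in `QtGui` under PySide6.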
| 132 | + |
| 133 | +### 4.2 Feature Requests Implementation |
| 134 | +- [ ] Batch video selection for inference (#143) |
| 135 | +- [ ] Resume training from checkpoint (#149) |
| 136 | +- [ ] Better error messages for missing weights |
| 137 | +- [ ] Improved model selection UI |
| 138 | +- [ ] Add progress bars for long operations |
| 139 | + |
| 140 | +### 4.3 Code Quality and Refactoring |
| 141 | +- [ ] Address remaining TODO items |
| 142 | +- [ ] Add type hints throughout codebase |
| 143 | +- [ ] Improve error handling |
| 144 | +- [ ] Refactor configuration system (#1168) |
| 145 | +- [ ] Remove redundant parameters (#94) |
| 146 | + |
| 147 | +## Phase 5: Architecture Refactoring (3-4 weeks) |
| 148 | + |
| 149 | +### 5.1 Code Structure Improvements |
| 150 | +- [ ] Separate GUI logic from core functionality |
| 151 | +- [ ] Create clear API boundaries |
| 152 | +- [ ] Implement dependency injection where appropriate |
| 153 | +- [ ] Refactor configuration system for clarity |
| 154 | + |
| 155 | +### 5.2 Model Architecture Updates |
| 156 | +- [ ] Update model implementations to use latest PyTorch features |
| 157 | +- [ ] Implement model registry pattern |
| 158 | +- [ ] Add support for custom model architectures |
| 159 | +- [ ] Create model zoo with pretrained weights |
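
As a rough illustration of the registry pattern mentioned above, builders can register themselves under a string key so that configuration files refer to models by name. All names here (`MODEL_REGISTRY`, `register_model`, `build_model`, `toy_classifier`) are hypothetical, and the builder returns a plain dict as a stand-in for a real `torch.nn.Module`.

```python
from typing import Any, Callable, Dict

# Maps a model name to the function that constructs it.
MODEL_REGISTRY: Dict[str, Callable[..., Any]] = {}


def register_model(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that records a builder function under `name`."""
    def decorator(builder: Callable[..., Any]) -> Callable[..., Any]:
        if name in MODEL_REGISTRY:
            raise KeyError("Model {!r} is already registered".format(name))
        MODEL_REGISTRY[name] = builder
        return builder
    return decorator


def build_model(name: str, **kwargs: Any) -> Any:
    """Look up a builder by name and call it with the given keyword arguments."""
    if name not in MODEL_REGISTRY:
        raise KeyError(
            "Unknown model {!r}; available: {}".format(name, sorted(MODEL_REGISTRY))
        )
    return MODEL_REGISTRY[name](**kwargs)


@register_model("toy_classifier")
def build_toy_classifier(num_classes: int = 2) -> Dict[str, Any]:
    # Placeholder: a real builder would return a torch.nn.Module here.
    return {"arch": "toy_classifier", "num_classes": num_classes}


print(build_model("toy_classifier", num_classes=5))
```

Custom architectures would then only need to import the decorator and register a builder, without changes to the code that selects models from configuration.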
| 160 | + |
| 161 | +### 5.3 Plugin System |
| 162 | +- [ ] Design plugin architecture for extensions |
| 163 | +- [ ] Create plugin API |
| 164 | +- [ ] Implement example plugins |
| 165 | +- [ ] Document plugin development |
| 166 | + |
| 167 | +## Phase 6: Advanced Features (4-6 weeks) |
| 168 | + |
| 169 | +### 6.1 Workflow Automation |
| 170 | +- [ ] Create CLI for batch processing |
| 171 | +- [ ] Add experiment tracking (MLflow/W&B integration) |
| 172 | +- [ ] Implement automatic hyperparameter tuning |
| 173 | +- [ ] Add continuous learning pipeline |
| 174 | + |
| 175 | +### 6.2 Cloud and Deployment |
| 176 | +- [ ] Create Docker images for different use cases |
| 177 | +- [ ] Add Kubernetes deployment configurations |
| 178 | +- [ ] Implement REST API for remote inference |
| 179 | +- [ ] Create cloud-friendly storage backends |
| 180 | + |
| 181 | +### 6.3 Extended Functionality |
| 182 | +- [ ] Add multi-animal tracking support |
| 183 | +- [ ] Implement real-time inference mode |
| 184 | +- [ ] Add support for additional video formats |
| 185 | +- [ ] Create behavior analysis tools |
| 186 | + |
| 187 | +## Critical Path and Priority Order |
| 188 | + |
| 189 | +### 🔴 MUST DO FIRST (Stay on Python 3.7): |
| 190 | +1. **PyTorch Lightning compatibility** - Add shims/version detection |
| 191 | +2. **NumPy deprecations** - Fix np.float/np.int usage |
| 192 | +3. **Installation fixes** - Hydra conflicts, scikit-learn builds |
| 193 | +4. **GUI dropdown fix** - Unblocks workflow |
| 194 | + |
| 195 | +### 🟡 THEN FIX (Still Python 3.7): |
| 196 | +1. Colab notebook compatibility |
| 197 | +2. Qt platform fixes (workarounds) |
| 198 | +3. Documentation completion |
| 199 | +4. Testing improvements |
| 200 | + |
| 201 | +### 🟢 FINALLY UPGRADE (Requires planning): |
| 202 | +1. PySide2 → PySide6 migration |
| 203 | +2. Python 3.8+ support |
| 204 | +3. Modern dependency versions |
| 205 | +4. Performance optimizations |
| 206 | + |
| 207 | +## Implementation Guidelines |
| 208 | + |
| 209 | +### Quick Wins First |
| 210 | +Start with changes that: |
| 211 | +- Have minimal risk |
| 212 | +- Fix the most user-reported issues |
| 213 | +- Can be tested easily |
| 214 | +- Don't require major refactoring |
| 215 | + |
| 216 | +### Version Control Strategy |
| 217 | +1. **Current branch (cleanup)**: Already has build improvements |
| 218 | +2. Create sub-branches for each critical fix |
| 219 | +3. Test each fix independently |
| 220 | +4. Merge incrementally with thorough testing |
| 221 | +5. Tag pre-release versions for testing |
| 222 | + |
| 223 | +### Testing Requirements |
| 224 | +For EACH change: |
| 225 | +1. Run existing test suite |
| 226 | +2. Test on at least 2 OS platforms |
| 227 | +3. Verify GUI still works |
| 228 | +4. Test training pipeline end-to-end |
| 229 | +5. Check Colab compatibility |
| 230 | + |
| 231 | +### Breaking Changes Communication |
| 232 | +1. Create migration guide for Lightning 2.x |
| 233 | +2. Document Python version requirements clearly |
| 234 | +3. Provide compatibility shims where possible |
| 235 | +4. Give users warning before major releases |
| 236 | + |
| 237 | +## Success Metrics |
| 238 | + |
| 239 | +### Immediate Success Criteria (Phase 1) |
| 240 | +- [ ] Installation works on Python 3.7 (Python 3.8+ support follows in Phase 3) |
| 241 | +- [ ] Colab notebook functional |
| 242 | +- [ ] Training runs without Lightning errors |
| 243 | +- [ ] GUI dropdowns work |
| 244 | +- [ ] 90% of open issues addressed or have workarounds |
| 245 | + |
| 246 | +### Overall Project Health |
| 247 | +- [ ] Test coverage > 80% |
| 248 | +- [ ] CI/CD passes on all platforms |
| 249 | +- [ ] Documentation complete for all features |
| 250 | +- [ ] <5 critical bugs reported per month |
| 251 | +- [ ] Installation success rate > 95% |
| 252 | + |
| 253 | +## Risk Mitigation |
| 254 | + |
| 255 | +### Potential Risks |
| 256 | +1. **Breaking Changes**: Maintain compatibility layer |
| 257 | +2. **Performance Regression**: Benchmark before/after |
| 258 | +3. **User Disruption**: Provide migration guides |
| 259 | +4. **Dependency Conflicts**: Test thoroughly |
| 260 | +5. **Data Loss**: Implement backup mechanisms |
| 261 | + |
| 262 | +### Mitigation Strategies |
| 263 | +1. Comprehensive testing at each phase |
| 264 | +2. Gradual rollout with beta testing |
| 265 | +3. Maintain stable branch during development |
| 266 | +4. Document all changes thoroughly |
| 267 | +5. Provide rollback procedures |
| 268 | + |
| 269 | +## Revised Timeline Based on Current Status |
| 270 | + |
| 271 | +### Already Completed (on cleanup branch) |
| 272 | +- ✅ Build system modernization (setup.py → pyproject.toml) |
| 273 | +- ✅ Docker improvements |
| 274 | +- ✅ Installation simplification |
| 275 | + |
| 276 | +### Immediate Actions (Week 1) - Stay on Python 3.7 |
| 277 | +- 🔴 Fix PyTorch Lightning compatibility with shims |
| 278 | +- 🔴 Fix NumPy deprecations |
| 279 | +- 🔴 Fix installation issues (Hydra, scikit-learn) |
| 280 | +- 🔴 Fix GUI dropdown bug |
| 281 | + |
| 282 | +### Short Term (Weeks 2-3) - Still Python 3.7 |
| 283 | +- 🟡 Stabilize all installations |
| 284 | +- 🟡 Fix Colab notebook |
| 285 | +- 🟡 Complete documentation |
| 286 | +- 🟡 Platform-specific fixes |
| 287 | + |
| 288 | +### Medium Term (Weeks 4-6) - Python upgrade |
| 289 | +- 🟢 PySide2 → PySide6 migration |
| 290 | +- 🟢 Python 3.8+ upgrade |
| 291 | +- 🟢 Modern dependency updates |
| 292 | + |
| 293 | +### Long Term (Weeks 7-10) |
| 294 | +- 🔵 Performance optimizations |
| 295 | +- 🔵 Feature additions |
| 296 | +- 🔵 Architecture improvements |
| 297 | + |
| 298 | +### Total: 2 months for critical fixes, 4 months for full modernization |
| 299 | + |
| 300 | +## Immediate Next Steps |
| 301 | + |
| 302 | +1. **Test current cleanup branch thoroughly** |
| 303 | + ```bash |
| 304 | + ./docker/build_and_test.sh # Already available! |
| 305 | + ``` |
| 306 | + |
| 307 | +2. **Create Lightning compatibility branch (Python 3.7)** |
| 308 | + ```bash |
| 309 | + git checkout -b lightning-compat-py37 |
| 310 | + # Add version detection in base.py |
| 311 | + # Create compatibility shims |
| 312 | + # Test with Lightning 1.6.5 and 1.9.x |
| 313 | + ``` |
| 314 | + |
| 315 | +3. **Fix NumPy and installation issues** |
| 316 | + ```bash |
| 317 | + git checkout -b fix-numpy-install |
| 318 | + # Replace np.float/np.int |
| 319 | + # Fix Hydra detection |
| 320 | + # Test fresh installations |
| 321 | + ``` |
| 322 | + |
| 323 | +4. **Fix GUI dropdown bug** |
| 324 | + ```bash |
| 325 | + git checkout -b fix-gui-dropdown |
| 326 | + # Debug pretrained weight selection |
| 327 | + # Test on multiple platforms |
| 328 | + ``` |
| 329 | + |
| 330 | +5. **Only after above are stable:** |
| 331 | + ```bash |
| 332 | + git checkout -b pyside6-python38 |
| 333 | + # Plan PySide migration first |
| 334 | + # Then upgrade Python version |
| 335 | + ``` |
| 336 | + |
| 337 | +## GitHub Issues Resolution Map |
| 338 | + |
| 339 | +| Issue | Fix Location | Priority | Phase | Python Upgrade Required | |
| 340 | +|-------|-------------|----------|-------|------------------------| |
| 341 | +| #163 (Flow generator) | `base.py` - Lightning shims | 🔴 Critical | 1 | No |
| 342 | +| #155 (NumPy) | Throughout - `np.float` | 🔴 Critical | 1 | No |
| 343 | +| #144 (Hydra) | `__init__.py` | 🔴 Critical | 1 | No |
| 344 | +| #166 (Dropdowns) | `gui/main.py` | 🔴 Critical | 1 | No |
| 345 | +| #173 (Colab install) | scikit-learn deps | 🟡 High | 2 | Partial |
| 346 | +| #171 (macOS GUI) | Qt workarounds | 🟡 High | 2 | No |
| 347 | +| #164 (Windows Qt) | Platform guide | 🟡 High | 2 | No |
| 348 | +| #172 (Training) | Documentation | 🟢 Medium | 2 | No |
| 349 | + |
| 350 | +## Key Strategy Change |
| 351 | + |
| 352 | +### Why Stay on Python 3.7 Initially? |
| 353 | +- **PySide2 → PySide6 is a MAJOR migration** requiring: |
| 354 | + - Rewriting all Qt imports and many API calls |
| 355 | + - Extensive GUI testing on all platforms |
| 356 | + - Potentially breaking changes for users |
| 357 | +- **Most critical issues can be fixed WITHOUT Python upgrade**: |
| 358 | + - PyTorch Lightning: Use compatibility shims |
| 359 | + - NumPy: Simple find/replace of deprecated calls |
| 360 | + - Installation: Fix dependencies within Python 3.7 constraints |
| 361 | + |
| 362 | +### Phased Approach Benefits |
| 363 | +1. **Phase 1**: Fix critical blockers while maintaining stability |
| 364 | +2. **Phase 2**: Stabilize and document workarounds |
| 365 | +3. **Phase 3**: Plan and execute PySide6 + Python upgrade together |
| 366 | +4. **Phase 4**: Modernize with latest dependencies |
| 367 | + |
| 368 | +This approach gets users unblocked FAST while planning the bigger migration carefully. |
| 369 | + |
| 370 | +## Notes |
| 371 | + |
| 372 | +- **Good News**: Build system already modernized on cleanup branch |
| 373 | +- **New Priority**: Fix issues WITHOUT Python upgrade first |
| 374 | +- **Testing**: Use new Docker test script for validation |
| 375 | +- **Communication**: Be clear about phased approach to users |