Commit d340108

docs
1 parent 7477964 commit d340108

2 files changed, +538 -0 lines changed

docs/cleanup_plan.md

Lines changed: 375 additions & 0 deletions
@@ -0,0 +1,375 @@
# DeepEthogram Repository Cleanup Plan

## Executive Summary

This document outlines a comprehensive cleanup plan for the DeepEthogram repository to modernize the codebase, improve maintainability, and enhance user experience. The cleanup is organized into phases to ensure systematic improvement without breaking existing functionality.

**Last Updated**: January 2025
**Current Branch**: cleanup (partially implemented)

## Progress Status

### ✅ Already Completed on Cleanup Branch
- Migrated from setup.py to pyproject.toml
- Consolidated dependencies (removed requirements.txt)
- Simplified installation process
- Added Docker build and test script
- Updated CI/CD workflows
- Added UV package manager support (beta)

## Phase 1: PyTorch Lightning & Installation Fixes (1 week) 🚨

### 1.1 PyTorch Lightning Compatibility (HIGHEST PRIORITY)
**Status**: ❌ Not Started - **Fixes Issues #163, #145, #158**
**Note**: Stay on Python 3.7 to avoid PySide complications
- [ ] Create compatibility layer for Lightning 1.6.5 → 2.x
  - [ ] Option A: Pin to intermediate version (1.9.x) that works with Python 3.7
  - [ ] Option B: Add compatibility shims:
    - [ ] Detect Lightning version and use appropriate API calls
    - [ ] Wrap trainer instantiation with version checks
- [ ] Fix `reload_dataloaders_every_epoch` parameter issue
- [ ] Fix `progress_bar_refresh_rate` parameter issue
- [ ] Fix `gpus` vs `accelerator` parameter
- [ ] Fix FPSCallback `dataloader_idx` parameter
- [ ] Test all training pipelines
- [ ] Document Lightning version requirements
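
The Option B shim could look roughly like the sketch below. It is a pure function so that it can be tested without Lightning installed. The names `parse_version` and `adapt_trainer_kwargs` are hypothetical, not DeepEthogram code, and the version cutoffs are assumptions that should be checked against the Lightning changelog before use.

```python
from typing import Any, Dict, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Convert '1.9.5' or '2.0.0rc1' into a tuple of leading integers, e.g. (1, 9, 5)."""
    parts = []
    for token in version.split(".")[:3]:
        digits = ""
        for char in token:
            if not char.isdigit():
                break
            digits += char
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def adapt_trainer_kwargs(kwargs: Dict[str, Any], lightning_version: str) -> Dict[str, Any]:
    """Return a copy of `kwargs` with legacy Trainer arguments translated.

    The version cutoffs below are assumptions; verify them against the changelog.
    """
    version = parse_version(lightning_version)
    adapted = dict(kwargs)

    # Boolean flag replaced by an integer "every n epochs" argument.
    if version >= (1, 6) and "reload_dataloaders_every_epoch" in adapted:
        every_epoch = adapted.pop("reload_dataloaders_every_epoch")
        adapted["reload_dataloaders_every_n_epochs"] = 1 if every_epoch else 0

    # Only the on/off state is carried over here; a progress-bar callback
    # would be needed to preserve the exact refresh rate.
    if version >= (1, 7) and "progress_bar_refresh_rate" in adapted:
        rate = adapted.pop("progress_bar_refresh_rate")
        adapted["enable_progress_bar"] = rate is None or rate > 0

    # `gpus=N` expressed as accelerator/devices instead.
    if version >= (1, 7) and "gpus" in adapted:
        gpus = adapted.pop("gpus")
        if gpus:
            adapted["accelerator"] = "gpu"
            adapted["devices"] = gpus
        else:
            adapted["accelerator"] = "cpu"

    return adapted
```

Under these assumptions, trainer construction would become something like `Trainer(**adapt_trainer_kwargs(trainer_kwargs, pytorch_lightning.__version__))`, which keeps the version checks in one place instead of scattering them across training scripts.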

### 1.2 NumPy Compatibility Fix
**Status**: ❌ Not Started - **Fixes Issue #155**
- [ ] Replace all `np.float` with `float` or `np.float64`
- [ ] Replace all `np.int` with `int` or `np.int64`
- [ ] Add numpy version constraint compatible with Python 3.7
- [ ] Test with numpy 1.21.x (the last release series that supports Python 3.7)
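
The alias replacement can be sketched as a small regex pass. The helper names below are hypothetical. The pattern only covers the `np.` prefix, so code that writes `numpy.float` would need a second pattern, and matches inside strings or comments are rewritten too.

```python
import re

# Matches `np.float` and `np.int` but not `np.float64`, `np.int_`, or `np.integer`,
# because the trailing \b requires a non-word character (or end of text) next.
DEPRECATED_ALIAS = re.compile(r"\bnp\.(float|int)\b")


def replace_deprecated_aliases(source: str) -> str:
    """Rewrite `np.float` -> `float` and `np.int` -> `int` in a source string."""
    return DEPRECATED_ALIAS.sub(lambda match: match.group(1), source)


def find_deprecated_aliases(source: str):
    """Return (line_number, line) pairs that still use a deprecated alias."""
    return [
        (number, line)
        for number, line in enumerate(source.splitlines(), start=1)
        if DEPRECATED_ALIAS.search(line)
    ]
```

Running `find_deprecated_aliases` first gives a reviewable list of affected lines before anything is rewritten.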

### 1.3 Hydra/OmegaConf Conflict Resolution
**Status**: ❌ Not Started - **Fixes Issue #144**
- [ ] Fix hydra detection in `__init__.py`
- [ ] Ensure omegaconf version compatibility
- [ ] Add clear error message if hydra-core is installed
- [ ] Test installation from clean environment
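
One possible shape for the hydra-core check is sketched below. The function name `assert_not_installed` and the error wording are assumptions, not existing DeepEthogram code. It uses `importlib.util.find_spec`, which is available on Python 3.7, and accepts the lookup function as a parameter so the check can be tested without changing the environment.

```python
import importlib.util
from typing import Callable, Optional


def assert_not_installed(
    module_name: str,
    package_name: str,
    find_spec: Callable[[str], Optional[object]] = importlib.util.find_spec,
) -> None:
    """Raise a descriptive ImportError if `module_name` can be imported."""
    if find_spec(module_name) is not None:
        raise ImportError(
            f"'{package_name}' is installed and conflicts with this package. "
            f"Uninstall it with `pip uninstall {package_name}` and retry."
        )
```

A call such as `assert_not_installed("hydra", "hydra-core")` near the top of `__init__.py` would fail early with an actionable message instead of an obscure error later.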

### 1.4 Installation & Dependency Fixes (Python 3.7 compatible)
**Status**: ⚠️ Build system ready, dependencies not updated
- [ ] Fix scikit-learn 1.0.2 installation for Python 3.7
- [ ] Update dependencies that work with Python 3.7:
  - [ ] pandas to highest version supporting 3.7 (1.3.5)
  - [ ] scikit-learn to 1.0.2 (with proper build deps)
  - [ ] scipy to highest 3.7-compatible version
- [ ] Create requirements-colab.txt for Colab-specific deps
- [ ] Test installation on fresh systems

### 1.5 Critical GUI Fixes (without PySide upgrade)
- [ ] Fix dropdown menu issue (#166) - pretrained weights not selectable
- [ ] Add debugging for Qt platform issues
- [ ] Create platform-specific installation guides
- [ ] Add GUI error recovery mechanisms

## Phase 2: Stabilization and Testing (1-2 weeks)

### 2.1 Test Suite Fixes
**Status**: ⚠️ Docker test script added, but tests may fail
- [ ] Fix all tests broken by dependency updates
- [ ] Add compatibility shims for PyTorch Lightning changes
- [ ] Mock GPU tests for CI/CD without GPU
- [ ] Ensure Docker tests pass for all images
- [ ] Add regression tests for fixed issues
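
Mocking the GPU probe is one way to run device-dependent tests on CI machines without a GPU. In this sketch, `select_device` is a hypothetical helper. In real code the probe passed in would presumably be `torch.cuda.is_available`, which the tests replace with a `unittest.mock.Mock`.

```python
from typing import Callable
from unittest import mock


def select_device(cuda_is_available: Callable[[], bool]) -> str:
    """Return 'cuda' when the probe reports a GPU, otherwise 'cpu'."""
    return "cuda" if cuda_is_available() else "cpu"


def test_select_device_without_gpu() -> None:
    # Simulates a CI runner that has no GPU.
    probe = mock.Mock(return_value=False)
    assert select_device(probe) == "cpu"
    probe.assert_called_once_with()


def test_select_device_with_mocked_gpu() -> None:
    # Simulates a GPU machine without needing the hardware.
    probe = mock.Mock(return_value=True)
    assert select_device(probe) == "cuda"
```

Passing the probe in as an argument keeps the tests independent of whether PyTorch or CUDA is present on the runner.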

### 2.2 Documentation Updates
**Status**: ⚠️ Some docs added, critical gaps remain
- [ ] Complete CLI documentation (#169, #170)
- [ ] Fill in empty `model_performance.md`
- [ ] Create troubleshooting guide for common issues:
  - [ ] Installation failures by OS
  - [ ] GPU detection problems
  - [ ] Qt/GUI issues
  - [ ] Dependency conflicts
- [ ] Update README with new Python version requirements

### 2.3 Installation Verification
- [ ] Test installation on fresh systems:
  - [ ] Ubuntu 20.04, 22.04
  - [ ] Windows 10, 11
  - [ ] macOS 12, 13, 14
- [ ] Verify Colab notebook works (#173)
- [ ] Test conda environment creation
- [ ] Verify UV installation method

## Phase 3: Python Version Upgrade (2-3 weeks)

### 3.1 Python 3.8+ Migration Planning
**Status**: ❌ Deferred until core issues fixed
**Dependencies**: Requires PySide2 → PySide6 migration
- [ ] Create detailed migration plan for PySide2 → PySide6
- [ ] Identify all Qt-dependent code sections
- [ ] Plan phased migration approach
- [ ] Test PySide6 compatibility on all platforms

### 3.2 Python Version Update
**After PySide6 migration is complete**
- [ ] Update Python constraint to `>=3.8,<3.12`
- [ ] Update all Docker base images
- [ ] Update conda environment
- [ ] Test on Python 3.8, 3.9, 3.10, 3.11

### 3.3 Modern Dependency Updates
**Only after Python 3.8+ is working**
- [ ] Update to latest compatible versions:
  - [ ] pytorch_lightning to 2.x
  - [ ] pandas to 2.x
  - [ ] numpy to 1.24+
  - [ ] scikit-learn to 1.3+
  - [ ] scipy to 1.11+

## Phase 4: Long-term Improvements (3-4 weeks)

### 4.1 Complete PySide6 Migration
**Status**: ❌ Planning needed
- [ ] Create migration plan from PySide2 to PySide6
- [ ] Update all Qt imports and API calls
- [ ] Test on all platforms
- [ ] Update Docker images with new Qt
- [ ] Document any breaking changes

### 4.2 Feature Requests Implementation
- [ ] Batch video selection for inference (#143)
- [ ] Resume training from checkpoint (#149)
- [ ] Better error messages for missing weights
- [ ] Improved model selection UI
- [ ] Add progress bars for long operations

### 4.3 Code Quality and Refactoring
- [ ] Address remaining TODO items
- [ ] Add type hints throughout codebase
- [ ] Improve error handling
- [ ] Refactor configuration system (#1168)
- [ ] Remove redundant parameters (#94)

## Phase 5: Architecture Refactoring (3-4 weeks)

### 5.1 Code Structure Improvements
- [ ] Separate GUI logic from core functionality
- [ ] Create clear API boundaries
- [ ] Implement dependency injection where appropriate
- [ ] Refactor configuration system for clarity

### 5.2 Model Architecture Updates
- [ ] Update model implementations to use latest PyTorch features
- [ ] Implement model registry pattern
- [ ] Add support for custom model architectures
- [ ] Create model zoo with pretrained weights
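
A model registry is typically a mapping from names to constructors, filled in by a decorator. The sketch below shows the general pattern only. `MODEL_REGISTRY`, `register_model`, `build_model`, and `ToyClassifier` are hypothetical names and do not reflect the current DeepEthogram model code.

```python
from typing import Callable, Dict

# Maps a model name to the callable that constructs it.
MODEL_REGISTRY: Dict[str, Callable[..., object]] = {}


def register_model(name: str) -> Callable[[Callable[..., object]], Callable[..., object]]:
    """Decorator that records a model constructor under `name`."""

    def decorator(constructor: Callable[..., object]) -> Callable[..., object]:
        if name in MODEL_REGISTRY:
            raise ValueError(f"Model '{name}' is already registered")
        MODEL_REGISTRY[name] = constructor
        return constructor

    return decorator


def build_model(name: str, **kwargs) -> object:
    """Instantiate a registered model, listing the valid names on failure."""
    if name not in MODEL_REGISTRY:
        available = ", ".join(sorted(MODEL_REGISTRY))
        raise KeyError(f"Unknown model '{name}'. Available: {available}")
    return MODEL_REGISTRY[name](**kwargs)


@register_model("toy_classifier")
class ToyClassifier:
    """Stand-in for a real network; it only stores its configuration."""

    def __init__(self, num_classes: int = 2) -> None:
        self.num_classes = num_classes
```

With this pattern, custom architectures can be added by decorating a class in user code, and a configuration file only needs to carry the model name and its keyword arguments.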

### 5.3 Plugin System
- [ ] Design plugin architecture for extensions
- [ ] Create plugin API
- [ ] Implement example plugins
- [ ] Document plugin development

## Phase 6: Advanced Features (4-6 weeks)

### 6.1 Workflow Automation
- [ ] Create CLI for batch processing
- [ ] Add experiment tracking (MLflow/W&B integration)
- [ ] Implement automatic hyperparameter tuning
- [ ] Add continuous learning pipeline

### 6.2 Cloud and Deployment
- [ ] Create Docker images for different use cases
- [ ] Add Kubernetes deployment configurations
- [ ] Implement REST API for remote inference
- [ ] Create cloud-friendly storage backends

### 6.3 Extended Functionality
- [ ] Add multi-animal tracking support
- [ ] Implement real-time inference mode
- [ ] Add support for additional video formats
- [ ] Create behavior analysis tools

## Critical Path and Priority Order

### 🔴 MUST DO FIRST (Stay on Python 3.7):
1. **PyTorch Lightning compatibility** - Add shims/version detection
2. **NumPy deprecations** - Fix np.float/np.int usage
3. **Installation fixes** - Hydra conflicts, scikit-learn builds
4. **GUI dropdown fix** - Unblocks workflow

### 🟡 THEN FIX (Still Python 3.7):
1. Colab notebook compatibility
2. Qt platform fixes (workarounds)
3. Documentation completion
4. Testing improvements

### 🟢 FINALLY UPGRADE (Requires planning):
1. PySide2 → PySide6 migration
2. Python 3.8+ support
3. Modern dependency versions
4. Performance optimizations

## Implementation Guidelines

### Quick Wins First
Start with changes that:
- Have minimal risk
- Fix the most user-reported issues
- Can be tested easily
- Don't require major refactoring

### Version Control Strategy
1. **Current branch (cleanup)**: Already has build improvements
2. Create sub-branches for each critical fix
3. Test each fix independently
4. Merge incrementally with thorough testing
5. Tag pre-release versions for testing

### Testing Requirements
For EACH change:
1. Run existing test suite
2. Test on at least 2 OS platforms
3. Verify GUI still works
4. Test training pipeline end-to-end
5. Check Colab compatibility

### Breaking Changes Communication
1. Create migration guide for Lightning 2.x
2. Document Python version requirements clearly
3. Provide compatibility shims where possible
4. Give users warning before major releases

## Success Metrics

### Immediate Success Criteria (Phase 1)
- [ ] Installation works on Python 3.7 (Python 3.8+ support is deferred to Phase 3)
- [ ] Colab notebook functional
- [ ] Training runs without Lightning errors
- [ ] GUI dropdowns work
- [ ] 90% of open issues addressed or have workarounds

### Overall Project Health
- [ ] Test coverage > 80%
- [ ] CI/CD passes on all platforms
- [ ] Documentation complete for all features
- [ ] <5 critical bugs reported per month
- [ ] Installation success rate > 95%

## Risk Mitigation

### Potential Risks
1. **Breaking Changes**: Maintain compatibility layer
2. **Performance Regression**: Benchmark before/after
3. **User Disruption**: Provide migration guides
4. **Dependency Conflicts**: Test thoroughly
5. **Data Loss**: Implement backup mechanisms

### Mitigation Strategies
1. Comprehensive testing at each phase
2. Gradual rollout with beta testing
3. Maintain stable branch during development
4. Document all changes thoroughly
5. Provide rollback procedures

## Revised Timeline Based on Current Status

### Already Completed (on cleanup branch)
- ✅ Build system modernization (setup.py → pyproject.toml)
- ✅ Docker improvements
- ✅ Installation simplification

### Immediate Actions (Week 1) - Stay on Python 3.7
- 🔴 Fix PyTorch Lightning compatibility with shims
- 🔴 Fix NumPy deprecations
- 🔴 Fix installation issues (Hydra, scikit-learn)
- 🔴 Fix GUI dropdown bug

### Short Term (Weeks 2-3) - Still Python 3.7
- 🟡 Stabilize all installations
- 🟡 Fix Colab notebook
- 🟡 Complete documentation
- 🟡 Platform-specific fixes

### Medium Term (Weeks 4-6) - Python upgrade
- 🟢 PySide2 → PySide6 migration
- 🟢 Python 3.8+ upgrade
- 🟢 Modern dependency updates

### Long Term (Weeks 7-10)
- 🔵 Performance optimizations
- 🔵 Feature additions
- 🔵 Architecture improvements

### Total: 2 months for critical fixes, 4 months for full modernization

## Immediate Next Steps

1. **Test current cleanup branch thoroughly**
   ```bash
   ./docker/build_and_test.sh  # Already available!
   ```

2. **Create Lightning compatibility branch (Python 3.7)**
   ```bash
   git checkout -b lightning-compat-py37
   # Add version detection in base.py
   # Create compatibility shims
   # Test with Lightning 1.6.5 and 1.9.x
   ```

3. **Fix NumPy and installation issues**
   ```bash
   git checkout -b fix-numpy-install
   # Replace np.float/np.int
   # Fix Hydra detection
   # Test fresh installations
   ```

4. **Fix GUI dropdown bug**
   ```bash
   git checkout -b fix-gui-dropdown
   # Debug pretrained weight selection
   # Test on multiple platforms
   ```

5. **Only after the above are stable:**
   ```bash
   git checkout -b pyside6-python38
   # Plan PySide migration first
   # Then upgrade Python version
   ```

## GitHub Issues Resolution Map

| Issue | Fix Location | Priority | Phase | Python Upgrade Required |
|-------|-------------|----------|-------|------------------------|
| #163 (Flow generator) | `base.py` - Lightning shims | 🔴 Critical | 1 | No |
| #155 (NumPy) | Throughout - `np.float` | 🔴 Critical | 1 | No |
| #144 (Hydra) | `__init__.py` | 🔴 Critical | 1 | No |
| #166 (Dropdowns) | `gui/main.py` | 🔴 Critical | 1 | No |
| #173 (Colab install) | scikit-learn deps | 🟡 High | 2 | Partial |
| #171 (macOS GUI) | Qt workarounds | 🟡 High | 2 | No |
| #164 (Windows Qt) | Platform guide | 🟡 High | 2 | No |
| #172 (Training) | Documentation | 🟢 Medium | 2 | No |

## Key Strategy Change

### Why Stay on Python 3.7 Initially?
- **PySide2 → PySide6 is a MAJOR migration** requiring:
  - Rewriting all Qt imports and many API calls
  - Extensive GUI testing on all platforms
  - Potentially breaking changes for users
- **Most critical issues can be fixed WITHOUT Python upgrade**:
  - PyTorch Lightning: Use compatibility shims
  - NumPy: Simple find/replace of deprecated calls
  - Installation: Fix dependencies within Python 3.7 constraints

### Phased Approach Benefits
1. **Phase 1**: Fix critical blockers while maintaining stability
2. **Phase 2**: Stabilize and document workarounds
3. **Phase 3**: Plan and execute PySide6 + Python upgrade together
4. **Phase 4**: Modernize with latest dependencies

This approach gets users unblocked FAST while planning the bigger migration carefully.

## Notes

- **Good News**: Build system already modernized on cleanup branch
- **New Priority**: Fix issues WITHOUT Python upgrade first
- **Testing**: Use new Docker test script for validation
- **Communication**: Be clear about phased approach to users
