Merged
- Add Jupyter notebook for interactive actinemys analysis
- Add holdout cross-validation example script
- Add parallel processing example for multi-sample analysis

These examples demonstrate various analysis workflows using the actinemys turtle dataset, including k-fold cross-validation and window-based genomic analysis.
Fixed a critical bug where the parallel k-fold implementation was only using half the genetic data due to incorrect genotype array serialization.

Key changes:
- Changed genotype serialization from to_n_alt() to values in parallel_analysis.py
  - to_n_alt() converts the 3D GenotypeArray to 2D, losing critical information
  - Now saves the full genotype.values to preserve the 3D structure (SNPs × samples × 2)
- Fixed genotype reconstruction in the Ray worker to handle the full 3D array
- Added critical fixes for sample ordering consistency:
  - Store _sample_data_df in the worker config
  - Set locator.samples before training
- Fixed k-fold seed consistency in analysis.py for reproducible splits

Results:
- Parallel predictions now match sequential (spread ratio improved from 0.56 to 1.02)
- Models train with the full genetic data instead of half
- ~6x speedup on 4 GPUs with accurate results

This resolves the issue where parallel k-fold was producing poor predictions due to models being trained on incomplete genetic data.
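The shape loss described above can be illustrated with a minimal numpy sketch (the data below is invented for illustration; scikit-allel's actual to_n_alt() does more than a plain sum, but the dimensionality collapse is the same):

```python
import numpy as np

# Hypothetical genotype block: 4 SNPs x 3 samples x 2 alleles (diploid),
# mirroring scikit-allel's GenotypeArray layout. 0 = ref allele, 1 = alt.
genotypes = np.array([
    [[0, 1], [1, 1], [0, 0]],
    [[0, 0], [0, 1], [1, 1]],
    [[1, 0], [0, 0], [0, 1]],
    [[1, 1], [1, 0], [0, 0]],
])

# What to_n_alt() effectively produces: the count of alt alleles per call,
# collapsing the allele axis. The result is 2D (SNPs x samples), so the
# per-allele detail needed to rebuild the 3D array is gone.
n_alt = genotypes.sum(axis=2)

print(genotypes.shape)  # (4, 3, 2)
print(n_alt.shape)      # (4, 3)
```

Serializing `genotypes` itself (the analogue of `genotype.values`) preserves the full 3D structure across the Ray worker boundary.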
- Add holdout_sample_ids parameter to all holdout methods
  - run_holdouts() and parallel_holdouts()
  - run_windows_holdouts() and parallel_windows_holdouts()
  - Allows users to specify holdout samples by ID instead of index
  - More intuitive and reproducible than numerical indices
  - Supports both a single list (same for all reps) and a list of lists
  - Clear error messages for missing sample IDs
- Fix numpy array compatibility issue
  - Handle both list and numpy array sample inputs
  - Convert to list before using the index() method
  - Ensures compatibility with all data loading methods
- Add comprehensive documentation and demo script
  - example/holdout_sample_ids_demo.py shows usage patterns
  - SAMPLE_ID_IMPLEMENTATION_SUMMARY.md documents the changes

This makes holdout analysis more user-friendly by allowing direct specification of sample names rather than requiring users to figure out array indices.
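The ID-to-index resolution and the list/ndarray compatibility fix can be sketched as follows (this is a hypothetical helper written for illustration, not the library's actual implementation; the sample names are invented):

```python
import numpy as np

def resolve_holdout_indices(samples, holdout_sample_ids):
    """Map sample IDs to array indices.

    Accepts `samples` as either a list or a numpy array, mirroring the
    compatibility fix described above."""
    # Convert to a plain list so .index() is available regardless of input type
    sample_list = list(samples)
    missing = [s for s in holdout_sample_ids if s not in sample_list]
    if missing:
        # Clear error message naming the IDs that were not found
        raise ValueError(f"Sample IDs not found in dataset: {missing}")
    return [sample_list.index(s) for s in holdout_sample_ids]

samples = np.array(["turtle_01", "turtle_02", "turtle_03"])
print(resolve_holdout_indices(samples, ["turtle_03", "turtle_01"]))  # [2, 0]
```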
Feat/shared memory multi gpu
Fix major performance bottlenecks in the tf.data pipeline that caused 92s of overhead from process forking and excessive parallelization.

Performance issues identified from profiling:
- 256 fork() calls taking 92.1s (39% of total time)
- 3000 h5py dataset recreations taking 8.8s
- Excessive parallelization overhead from tf.data.AUTOTUNE

Fixes implemented:
1. Set a fixed thread pool size (4) instead of 0 to prevent process forking
2. Disable map_parallelization to reduce overhead
3. Use a fixed num_parallel_calls (4) instead of tf.data.AUTOTUNE
4. Limit inter-op parallelism to 1 to reduce thread contention

These changes maintain functionality while eliminating the expensive fork() calls that dominated execution time. Expected performance improvement: 2-3x faster training with ~90% reduction in forking overhead.

Changes in locator/data/tf_dataset.py:
- options.threading.private_threadpool_size: 0 → 4
- options.experimental_optimization.map_parallelization: True → False
- options.threading.max_intra_op_parallelism: (new) → 1
- All map() num_parallel_calls: tf.data.AUTOTUNE → 4
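A minimal sketch of these settings applied to a toy dataset (the actual pipeline in locator/data/tf_dataset.py reads HDF5 data and differs in detail):

```python
import tensorflow as tf

# Fixed num_parallel_calls instead of tf.data.AUTOTUNE
dataset = tf.data.Dataset.range(100).map(
    lambda x: x * 2, num_parallel_calls=4
)

options = tf.data.Options()
options.threading.private_threadpool_size = 4          # was 0 (unbounded)
options.experimental_optimization.map_parallelization = False
options.threading.max_intra_op_parallelism = 1
dataset = dataset.with_options(options)

print(list(dataset.take(3).as_numpy_iterator()))  # [0, 2, 4]
```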
Add holdout_no_intermediate_saves option to skip ModelCheckpoint during train_holdout, reducing file I/O overhead for k-fold cross-validation workflows.

When enabled, this option:
- Skips the ModelCheckpoint callback during training
- Saves model weights only once at the end
- Reduces HDF5 operations from ~100+ to just 1
- Maintains the existing checkpoint behavior by default

Performance improvement from profiling:
- HDF5 I/O overhead: 13.7s → ~2s expected
- Particularly beneficial for k-fold CV with many folds

To enable: config['holdout_no_intermediate_saves'] = True

This complements the tf.data pipeline optimization (commit 4c6a300) for significant performance gains in cross-validation workflows.
Add TensorFlow threading configuration to prevent excessive process forking during k-fold cross-validation and other repeated training workflows.

Performance improvements:
- Process forking: 126.3s → 5.9s (95% reduction)
- Total k-fold CV time: 257.9s → 145.7s (43% improvement)

Changes:
- Add _configure_tensorflow_optimization() method to Locator
- Set inter-op threads to 1 to prevent forking
- Keep intra-op threads at 4 for parallelism within ops
- Add optimize_tf_parallelism config option (default: True)
- Disable tf.data experimental slack to reduce overhead

This complements previous optimizations:
- tf.data pipeline optimization (commit 4c6a300)
- HDF5 I/O reduction (commit 5190ec1)

Together, these optimizations provide significant performance gains for cross-validation workflows without affecting accuracy.
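The thread settings above can be sketched with TensorFlow's public threading API; this is an illustrative stand-in, not the Locator class's actual _configure_tensorflow_optimization():

```python
import tensorflow as tf

def configure_tensorflow_threading(inter_op=1, intra_op=4):
    """Limit inter-op parallelism to curb forking while keeping
    intra-op parallelism for work inside individual ops."""
    # Must run before TensorFlow executes its first op; afterwards TF
    # raises RuntimeError because the runtime is already initialized.
    tf.config.threading.set_inter_op_parallelism_threads(inter_op)
    tf.config.threading.set_intra_op_parallelism_threads(intra_op)

configure_tensorflow_threading()
print(tf.config.threading.get_inter_op_parallelism_threads())  # 1
print(tf.config.threading.get_intra_op_parallelism_threads())  # 4
```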
Add two new config options to control the verbosity of common operations that can clutter output during repeated runs like k-fold CV.

New config options:
- verbose_splits (default: False): Show train/val/test/holdout split sizes
- verbose_batch_size (default: False): Show batch size optimization details

When verbose_splits is enabled, displays:
- Number and percentage of samples in each split
- Total samples and SNPs being used
- Works for both train() and train_holdout() methods

When verbose_batch_size is enabled, displays:
- GPU memory estimation details
- Batch size optimization process
- Final optimized batch size
- Only relevant when gpu_batch_size='auto'

Benefits:
- Cleaner output for k-fold CV and other repeated training workflows
- Still provides detailed info when debugging or on a first run
- Backward compatible - defaults preserve the existing quiet behavior

Changes:
- locator/core.py: Add default config values
- locator/training.py: Add split reporting and batch size verbosity control
- locator/gpu_optimizer.py: Add verbose parameter to get_optimal_batch_size()
- tests/test_verbosity_control.py: Comprehensive test suite
Reduce excessive array copying and improve vectorization to address performance bottlenecks identified in profiling.

Expected performance improvements from profiling:
- numpy.array operations: 34.4s → ~10s (70% reduction expected)
- array.copy operations: 25.2s → ~5s (80% reduction expected)
- Overall train_holdout: 129s → ~50s (60% improvement)

Optimizations:
1. filter_snps: Cache allele counts to avoid recomputation
   - Count alleles once instead of twice
   - Combine biallelic and MAC filters efficiently
2. Location normalization: Use vectorized operations
   - Replace list comprehension with direct array operations
   - Eliminates a slow Python loop over samples
3. Holdout data storage: Avoid transpose copy
   - Use efficient array ordering to minimize memory copies
   - Ensures C-contiguous arrays for better performance
4. normalize_locs: Create a new array instead of modifying a copy
   - More efficient memory usage
   - Clearer intent in the code

These changes significantly improve performance for train_holdout and k-fold cross-validation workflows without affecting accuracy.
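Optimization 2 can be illustrated with a small numpy sketch (the coordinates are invented and this is not the library's normalize_locs, but it shows the loop-to-broadcasting change):

```python
import numpy as np

# Hypothetical sample locations: rows are samples, columns are lat/lon
locs = np.array([[34.1, -118.2], [36.7, -119.4], [32.9, -117.1]])
mean, std = locs.mean(axis=0), locs.std(axis=0)

# Slow pattern: a Python-level loop over samples
looped = np.array([(row - mean) / std for row in locs])

# Vectorized replacement: broadcasting handles all samples at once and
# produces a new array rather than mutating a copy
vectorized = (locs - mean) / std

print(np.allclose(looped, vectorized))  # True
```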
Change the default value of holdout_no_intermediate_saves from False to True to provide better performance out of the box for k-fold CV and leave-one-out workflows.

Benefits:
- Reduces HDF5 I/O from N model saves to just 1
- Particularly beneficial for leave-one-out CV with many samples
- No impact on model quality or final results
- Users can still set it to False if they need intermediate checkpoints

This is especially helpful for:
- run_k_fold_holdouts() with large k values
- run_leave_one_out(), which uses k = n_samples
- Any repeated holdout analysis workflows

The option only affects train_holdout() behavior, not regular train().
- Add test workflow with parallel execution using pytest-xdist
- Support multi-Python-version testing (3.9, 3.10, 3.11)
- Configure CPU-only mode to prevent GPU conflicts in CI
- Add coverage reporting with pytest-cov and Codecov integration
- Include code quality checks (black, isort, flake8)
- Add documentation building workflow
- Add PyPI publishing workflow for releases
- Add manual test trigger workflow for debugging
- Configure pytest settings in pyproject.toml
- Fix test_verbose_batch_size_auto to work with parallel execution
- Add CUDA_VISIBLE_DEVICES=-1 to prevent GPU conflicts
- Add Dependabot configuration for automated updates
- Add issue and PR templates

All tests now pass with parallel execution enabled.
- Add .pre-commit-config.yaml with black, isort, and flake8 hooks
- Configure black with an 89-character line limit to match the existing style
- Configure isort to be compatible with black
- Format the entire codebase with black and isort
- Fix trailing whitespace and missing newlines
- Update CI workflows to use consistent linting
- Add pre-commit workflow for automated checks
- Add scripts for easy formatting and setup
- Update documentation with pre-commit instructions

All Python files now follow consistent formatting standards.
- Fix boolean comparisons (E712) - use 'is' instead of '=='
- Remove or comment out unused imports (F401)
- Convert f-strings without placeholders to regular strings (F541)
- Replace bare except with specific exceptions (E722)
- Fix unused variables with comments or underscore (F841, B007)
- Replace lambda assignments with def (E731)
- Fix undefined names in __all__ (F822)
- Add docstrings for missing __init__ methods (D107)
- Fix block comment formatting (E265)
- Fix blank line formatting and whitespace (E306, W293, D202)
- Add per-file-ignores for complex functions (C901)
- Apply black formatting to ensure consistency

All 69 flake8 errors have been resolved; the code is now fully compliant.
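A few of these fixes, shown as hypothetical before/after pairs (the identifiers are invented for illustration):

```python
# E712: boolean comparison
flag = True
# before: if flag == True:
if flag is True:
    result = "enabled"

# F541: f-string without placeholders
# before: message = f"done"
message = "done"

# E731: lambda assigned to a name
# before: double = lambda x: x * 2
def double(x):
    return x * 2

print(result, message, double(21))  # enabled done 42
```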
- Remove trailing whitespace from YAML and markdown files
- Fix line endings and formatting inconsistencies
- Apply black formatting to remaining Python files
- Clean up imports and whitespace

These changes were automatically applied by pre-commit hooks.
The pyproject.toml already contains coverage configuration in addopts, so passing duplicate --cov arguments was causing conflicts.
- Add pytest-cov and pytest-xdist to dev dependencies
- Remove coverage configuration from pyproject.toml to avoid conflicts
- Keep coverage arguments only in the GitHub Actions workflow
- Add windows module exports to locator/data/__init__.py
- Export generate_genomic_windows function for window analysis tests
- Fix numpy boolean comparison in test_model_persistence.py
- Convert numpy.True_ to a Python bool before comparison
- Add tensorflow import for GPU detection
- Update GPU config test to handle the case when no GPU is available
- In CI with CUDA_VISIBLE_DEVICES=-1, mixed precision is correctly disabled
- Change 'data/' to '/data/' and './data/' in .gitignore so that only top-level data directories are excluded
- Add windows.py to git tracking (it was previously excluded)
- This fixes ModuleNotFoundError for locator.data.windows in CI
- The WindowGenerator class does not exist in windows.py; only the generate_genomic_windows function is available
- This fixes ImportError in all test files
- Make parallel module imports optional when Ray is not installed
- Add stub functions with helpful error messages for parallel methods
- Fix duplicate object description warning for PlottingMixin.plot_history
- This allows docs to build successfully without the Ray dependency
- Prevent accidentally committing large output files from examples
- This directory contains generated plots and CSVs from demo runs
- Filter out harmless protobuf version warnings from TensorFlow
- Fix module path for parallel functions in api.rst (use locator.parallel, not locator.parallel.parallel_analysis)
- The protobuf warnings come from TensorFlow/Google libraries and cannot be fixed on our side
- Documentation should now build successfully
Add CI/CD pipeline
- Created EnsembleMixin with modern patterns (IndexSet, tf.data pipeline)
- Added k_fold_split method to IndexSet for efficient fold creation
- Refactored EnsembleLocator as a legacy compatibility wrapper
- Added comprehensive tests for ensemble functionality
- Integrated EnsembleMixin into the core Locator class

Key improvements:
- Memory-efficient data handling without array copies
- Consistent NA handling with the na_action parameter
- Integration with the modern tf.data pipeline
- Backward compatibility through the legacy wrapper
- Uses the standard normalize_locs function instead of manual normalization
- Uses the NormalizationParams class for denormalization
- Reduced cyclomatic complexity by extracting helper methods

BREAKING CHANGE: EnsembleLocator is now deprecated in favor of Locator's ensemble methods (train_ensemble, predict_ensemble). The old API still works but shows deprecation warnings.
Phase 1: Create EnsembleMixin with modern patterns
- Created EnsembleMixin with modern patterns (IndexSet, tf.data pipeline)
- Added k_fold_split method to IndexSet for efficient fold creation
- Refactored EnsembleLocator as a legacy compatibility wrapper
- Uses the standard normalize_locs function instead of manual normalization
- Uses the NormalizationParams class for denormalization
- Reduced cyclomatic complexity by extracting helper methods

Phase 2: Memory efficiency and model management
- Implemented _train_single_fold method to avoid creating separate Locator instances
- Created EnsembleModelManager for efficient model storage and lazy loading
- Fixed _create_model signature to use the input_shape parameter
- Fixed save_fold_models parameter passing through the method chain
- Made JSON serialization robust by filtering DataFrames out of the config

Test consolidation:
- Consolidated test_ensemble_mixin.py and test_ensemble_phase2.py into test_ensemble.py
- All 12 tests pass with comprehensive coverage of both phases

Key improvements:
- Memory-efficient data handling without array copies
- Consistent NA handling with the na_action parameter
- Integration with the modern tf.data pipeline
- Backward compatibility through the legacy wrapper
- Efficient model management with lazy-loading support
- Comprehensive test coverage for both phases

BREAKING CHANGE: EnsembleLocator is now deprecated in favor of Locator's ensemble methods (train_ensemble, predict_ensemble). The old API still works but shows deprecation warnings.
Implemented comprehensive ensemble functionality for Locator with k-fold cross-validation and advanced training optimizations.

Phase 1 - Core Ensemble Functionality:
- Added EnsembleMixin with train_ensemble() and predict_ensemble() methods
- Implemented memory-efficient k-fold splitting using IndexSet
- Support for NA sample handling during ensemble training
- Proper normalization parameter averaging across folds

Phase 2 - Model Persistence:
- Created EnsembleModelManager for efficient model storage/loading
- Memory-efficient prediction without loading all models at once
- Metadata tracking for ensemble configuration and fold information
- Support for on-demand model loading during prediction

Phase 4 - Training Improvements:
- Mixed precision training support via GPUOptimizer integration
- Automatic batch size optimization for each fold
- Enhanced callbacks with a patience multiplier for ensemble training
- Per-fold learning rate variation for improved diversity
- Memory clearing between folds to prevent OOM errors

Architecture:
- All functionality consolidated in ensemble_mixin.py for maintainability
- Reuses existing Locator infrastructure (tf.data pipeline, GPU optimizer)
- Maintains backward compatibility with the standard Locator interface
- Comprehensive test suite with 15 tests covering all functionality

Performance:
- Memory-efficient training without creating separate Locator instances
- GPU optimizations automatically applied when available
- Efficient prediction pipeline with on-demand model loading
- Proper memory management between fold training

This refactoring enables robust ensemble predictions while maintaining code clarity and performance efficiency.
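The k-fold index splitting at the heart of the ensemble can be sketched with numpy (a hypothetical stand-in for the k_fold_split method added to IndexSet, not the actual implementation):

```python
import numpy as np

def k_fold_split(indices, k, seed=42):
    """Yield (train, val) index pairs for k folds.

    A fixed seed keeps the shuffle reproducible across runs."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(indices)
    folds = np.array_split(shuffled, k)   # index views, no data copies
    for i in range(k):
        val = folds[i]
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train, val

# With 10 samples and k=5, each fold holds out 2 samples, and the
# validation folds together cover every sample exactly once
for train_idx, val_idx in k_fold_split(np.arange(10), k=5):
    assert len(val_idx) == 2 and len(train_idx) == 8
```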
Implemented comprehensive ensemble functionality for Locator with k-fold cross-validation, advanced training optimizations, and parallel GPU execution.
Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 4 to 5.

- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md)
- [Commits](codecov/codecov-action@v4...v5)

---
updated-dependencies:
- dependency-name: codecov/codecov-action
  dependency-version: '5'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
- Added a module-level skip decorator for when Ray is not installed
- Fixed mock patches to use 'ray' directly instead of the module path
- Added checks for stub functions in signature tests
- Fixed unused variable warnings
- Removed a duplicate test file
feat: complete ensemble refactoring with parallel training support
…v/codecov-action-5
- Added a function to plot an interactive error map
- Added dataframe output to the error summary plot
feat: interactive error map and summary plot updates