Feature/merge #35
Pull request overview
Adds automated and manual shard-merge support to MMIRAGE so users can merge shard_* outputs after successful runs (or on-demand), with clearer logging and documentation.
Changes:
- Introduces the `execution_params.merge` config flag and wires it into `run` to trigger post-run merging.
- Refactors merge logic into reusable functions (`merge_dataset_dir`, `merge_input_dir`, `merge_from_config`) and adds per-dataset `MergeReport` summaries.
- Extends the CLI and README with new `merge` and `merge-dir` commands and usage guidance.
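To make the refactor concrete, here is a minimal sketch of what a per-dataset merge with a report could look like. It is not the code from this PR: the `MergeReport` fields, the function signature, and the shard layout (one `data.jsonl` file inside each `shard_*` directory) are all assumptions chosen for illustration.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass
class MergeReport:
    """Hypothetical per-dataset summary; the real MergeReport may differ."""
    dataset_dir: str
    shards_merged: int
    rows_written: int
    output_path: str


def merge_dataset_dir(dataset_dir: Path, output_path: Path) -> MergeReport:
    """Concatenate data.jsonl from every shard_* subdirectory into one file.

    Assumed layout: <dataset_dir>/shard_<n>/data.jsonl (not confirmed by the PR).
    Shards are visited in lexicographic order of their directory names.
    """
    shard_dirs = sorted(p for p in dataset_dir.glob("shard_*") if p.is_dir())
    shards_merged = 0
    rows_written = 0
    output_path.parent.mkdir(parents=True, exist_ok=True)
    with output_path.open("w", encoding="utf-8") as out:
        for shard in shard_dirs:
            shard_file = shard / "data.jsonl"
            if not shard_file.exists():
                continue  # skip shards that produced no output
            with shard_file.open("r", encoding="utf-8") as src:
                for line in src:
                    if line.strip():
                        json.loads(line)  # fail early on a corrupt row
                        out.write(line.rstrip("\n") + "\n")
                        rows_written += 1
            shards_merged += 1
    return MergeReport(
        dataset_dir=str(dataset_dir),
        shards_merged=shards_merged,
        rows_written=rows_written,
        output_path=str(output_path),
    )
```

Returning a report object rather than printing lets the caller decide how to log the result, which fits the PR's move to structured logging.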
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `src/mmirage/shard_utils.py` | Adds `MergeReport` and shared shard/dataset directory discovery helpers for merge operations. |
| `src/mmirage/merge_shards.py` | Refactors merging into callable APIs, adds config-based merging, and switches output to structured logging. |
| `src/mmirage/config/config.py` | Adds `execution_params.merge` to the configuration model. |
| `src/mmirage/cli.py` | Adds `merge` / `merge-dir` commands and triggers auto-merge after a successful run when enabled. |
| `README.md` | Documents the merge flag and new CLI commands with examples and output behavior. |
| `configs/config_mock.yaml` | Enables merge in the mock config example. |
| `configs/config_mock_vision.yaml` | Enables merge in the mock vision config example. |
| `configs/config_comprehensive.yaml` | Documents the merge flag with defaults and explanation. |
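As an illustration of the shard discovery helpers mentioned for `shard_utils.py`, the sketch below finds `shard_*` directories and orders them numerically. The function name `discover_shard_dirs` and the assumption that shard names end in an integer index are hypothetical; the PR only states that outputs are named `shard_*`.

```python
import re
from pathlib import Path
from typing import List

# Assumed naming scheme: shard_<integer>. Other suffixes are ignored here.
_SHARD_NAME = re.compile(r"^shard_(\d+)$")


def discover_shard_dirs(dataset_dir: Path) -> List[Path]:
    """Return shard directories under dataset_dir, sorted by numeric index."""
    indexed = []
    for child in dataset_dir.iterdir():
        match = _SHARD_NAME.match(child.name)
        if match and child.is_dir():
            indexed.append((int(match.group(1)), child))
    # Sort on the integer index so shard_10 comes after shard_2,
    # which a plain lexicographic sort would get wrong.
    indexed.sort(key=lambda pair: pair[0])
    return [path for _, path in indexed]
```

Sorting on the parsed index keeps merged rows in shard order once there are ten or more shards.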
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: fabnemEPFL <117652591+fabnemEPFL@users.noreply.github.com>
This pull request adds automated merging of shard outputs to MMIRAGE after a successful pipeline run, along with CLI commands for manual merging. The changes update the configuration, documentation, and CLI to support merging both via config and via direct commands, and add logging and reporting for merge operations.
New merging functionality:
- Adds a `merge` option to `execution_params` in the configuration files (`configs/config_comprehensive.yaml`, `configs/config_mock.yaml`, `configs/config_mock_vision.yaml`) and updates the `ExecutionParams` class to support it. When enabled, MMIRAGE automatically merges shard outputs after a successful run.
- Adds the CLI commands `merge` (merges datasets listed in config) and `merge-dir` (merges from a specified directory), with support for custom output locations and improved logging of merge results.
Pipeline and CLI integration:
- When `execution_params.merge` is true, merging is triggered automatically after a successful run, in both local and SLURM modes.
Documentation improvements:
- Updates `README.md` to document the new `merge` option and CLI commands, including usage examples and explanations of merge output behavior for single and multiple datasets.
Refactoring and support code:
- Refactors `merge_shards.py` to support merging from config or directories, and to provide detailed merge reports.
These changes make it easier to manage and combine dataset shards in MMIRAGE, both automatically and manually, improving workflow efficiency and reproducibility.
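The automatic post-run merge described here comes down to a simple gate: merge only if the run succeeded and the flag is set. The sketch below shows that control flow under stated assumptions. The `ExecutionParams` stand-in, its default of `False`, and the `run_then_maybe_merge` helper are hypothetical and are not taken from the PR's `cli.py`.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ExecutionParams:
    """Stand-in for the real config class; only the merge flag is modelled."""
    merge: bool = False  # assumed default, not confirmed by the PR


def run_then_maybe_merge(
    params: ExecutionParams,
    run_pipeline: Callable[[], bool],
    merge: Callable[[], None],
) -> bool:
    """Run the pipeline, then merge shards only on success with the flag set.

    Returns True if the merge step was executed.
    """
    succeeded = run_pipeline()
    if succeeded and params.merge:
        merge()
        return True
    # A failed run leaves the shards untouched so they can be inspected
    # or merged manually later.
    return False
```

Passing the run and merge steps in as callables keeps the gate identical for local and SLURM execution, since only `run_pipeline` needs to know how the job was launched.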