Feature/merge #35

Merged
fabnemEPFL merged 16 commits into main from feature/merge on Apr 14, 2026

@qchapp (Member) commented on Apr 1, 2026

This pull request introduces a new feature to MMIRAGE: automated merging of shard outputs after a successful pipeline run, along with CLI commands for manual merging. The changes update the configuration, documentation, and CLI so that merging can be driven either by the config or by direct CLI commands, and they add logging and reporting for merge operations.

New merging functionality:

  • Adds a new merge option to execution_params in the configuration files (configs/config_comprehensive.yaml, configs/config_mock.yaml, configs/config_mock_vision.yaml) and updates the ExecutionParams class to support it. When the option is enabled, MMIRAGE automatically merges shard outputs after a successful run.
  • Implements two new CLI commands: merge (merges the datasets listed in a config) and merge-dir (merges shards found under a specified directory), with support for custom output locations and improved logging of merge results.
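For illustration only, the two subcommands could be declared with the standard library's argparse as below. The real cli.py may use a different CLI framework, and the argument names (config, input_dir, --output-dir) are hypothetical placeholders, not the PR's actual flags.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing `merge` and `merge-dir` subcommands."""
    parser = argparse.ArgumentParser(prog="mmirage")
    sub = parser.add_subparsers(dest="command", required=True)

    # merge: read the datasets to merge from a run config.
    merge = sub.add_parser("merge", help="Merge shard outputs for every dataset in a config.")
    merge.add_argument("config", help="Path to the YAML config used for the run.")
    merge.add_argument("--output-dir", default=None, help="Optional custom output location.")

    # merge-dir: merge whatever shard_* directories live under a given path.
    merge_dir = sub.add_parser("merge-dir", help="Merge shard_* directories under a path.")
    merge_dir.add_argument("input_dir", help="Directory containing shard_* subdirectories.")
    merge_dir.add_argument("--output-dir", default=None, help="Optional custom output location.")

    return parser
```

In this sketch both subcommands share an optional --output-dir to mirror the "custom output locations" mentioned above; when it is omitted, the merge function is left to pick a default location.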

Pipeline and CLI integration:

  • Updates the pipeline execution logic so that, when execution_params.merge is true, merging is triggered automatically after a successful run in both local and SLURM modes.
  • Adds logging and reporting for merge operations, including summaries of merged rows, skipped directories, and output locations.
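A minimal sketch of the post-run trigger, assuming the pipeline reports success as a boolean and that merging returns one summary per dataset. ExecutionParams here is a stripped-down stand-in modeling only the merge flag (its False default is an assumption), and run_with_auto_merge, run_pipeline and merge_outputs are hypothetical names. How the SLURM path defers the merge until all jobs have finished is not shown.

```python
from __future__ import annotations

import logging
from dataclasses import dataclass
from typing import Callable

logger = logging.getLogger("mmirage.auto_merge_sketch")


@dataclass
class ExecutionParams:
    """Hypothetical stand-in for the config model; only the merge flag is modeled."""

    merge: bool = False  # assumed default: merging is opt-in


def run_with_auto_merge(
    params: ExecutionParams,
    run_pipeline: Callable[[], bool],
    merge_outputs: Callable[[], list[dict]],
) -> list[dict]:
    """Run the pipeline, then merge shards only if it succeeded and merge is enabled."""
    if not run_pipeline():
        logger.error("Pipeline run failed; shards are left unmerged.")
        return []
    if not params.merge:
        logger.info("execution_params.merge is false; skipping automatic merge.")
        return []
    reports = merge_outputs()
    for report in reports:
        # Each hypothetical report carries merged rows, skipped dirs and output location.
        logger.info(
            "Merged %d rows into %s (skipped directories: %s)",
            report["rows"],
            report["output"],
            report["skipped"],
        )
    return reports
```

Gating on success first means a failed run never produces a partially merged dataset, which matches the "after a successful run" wording above.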

Documentation improvements:

  • Updates README.md to document the new merge option and CLI commands, including usage examples and an explanation of how merge output behaves for a single dataset versus multiple datasets.

Refactoring and support code:

  • Refactors merge_shards.py to support merging either from a config or from directories, and to produce detailed merge reports.
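As a rough sketch of directory-based merging with a per-dataset report: it assumes each shard_* directory holds a single JSON Lines file named data.jsonl and writes merged.jsonl next to the shards. The storage format, file names, and MergeReport fields are assumptions for illustration; only the shard_* naming and the reported quantities (merged rows, skipped directories, output location) come from the description above.

```python
from __future__ import annotations

import logging
from dataclasses import dataclass, field
from pathlib import Path

logger = logging.getLogger("mmirage.merge_sketch")

SHARD_FILE = "data.jsonl"  # hypothetical per-shard file name


@dataclass
class MergeReport:
    """Hypothetical per-dataset summary of one merge operation."""

    dataset_dir: Path
    output_path: Path
    rows_merged: int = 0
    shards_merged: int = 0
    skipped_dirs: list[Path] = field(default_factory=list)


def find_shard_dirs(dataset_dir: Path) -> list[Path]:
    """Return shard_* subdirectories, ordered by numeric suffix (shard_2 before shard_10)."""

    def order(p: Path) -> tuple[int, str]:
        suffix = p.name[len("shard_"):]
        # Non-numeric suffixes sort after numeric ones, alphabetically.
        return (int(suffix), "") if suffix.isdigit() else (1 << 62, suffix)

    return sorted((p for p in dataset_dir.glob("shard_*") if p.is_dir()), key=order)


def merge_dataset_dir(dataset_dir: Path, output_path: Path | None = None) -> MergeReport:
    """Concatenate every shard's JSONL rows into one file and report what happened."""
    dataset_dir = Path(dataset_dir)
    output_path = Path(output_path) if output_path else dataset_dir / "merged.jsonl"
    output_path.parent.mkdir(parents=True, exist_ok=True)
    report = MergeReport(dataset_dir=dataset_dir, output_path=output_path)

    with output_path.open("w", encoding="utf-8") as out:
        for shard in find_shard_dirs(dataset_dir):
            shard_file = shard / SHARD_FILE
            if not shard_file.is_file():
                # Incomplete or failed shard: record it instead of aborting the merge.
                report.skipped_dirs.append(shard)
                logger.warning("Skipping %s: no %s found", shard, SHARD_FILE)
                continue
            with shard_file.open(encoding="utf-8") as src:
                for line in src:
                    if line.strip():  # ignore blank lines
                        out.write(line if line.endswith("\n") else line + "\n")
                        report.rows_merged += 1
            report.shards_merged += 1

    logger.info(
        "Merged %d rows from %d shards into %s (%d skipped)",
        report.rows_merged,
        report.shards_merged,
        output_path,
        len(report.skipped_dirs),
    )
    return report
```

Sorting by the numeric suffix keeps rows in shard order even past shard_9, which a plain lexicographic sort would not.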

These changes make it easier to manage and combine dataset shards in MMIRAGE, both automatically and manually, improving workflow efficiency and reproducibility.

Copilot AI review requested due to automatic review settings April 1, 2026 17:35
@qchapp linked an issue on Apr 1, 2026 that may be closed by this pull request
Copilot AI (Contributor) left a comment

Pull request overview

Adds automated and manual shard-merge support to MMIRAGE so users can merge shard_* outputs after successful runs (or on demand), with clearer logging and documentation.

Changes:

  • Introduces execution_params.merge config flag and wires it into run to trigger post-run merging.
  • Refactors merge logic into reusable functions (merge_dataset_dir, merge_input_dir, merge_from_config) and adds per-dataset MergeReport summaries.
  • Extends CLI and README with new merge and merge-dir commands and usage guidance.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/mmirage/shard_utils.py | Adds MergeReport and shared shard/dataset directory discovery helpers for merge operations. |
| src/mmirage/merge_shards.py | Refactors merging into callable APIs, adds config-based merging, and switches output to structured logging. |
| src/mmirage/config/config.py | Adds execution_params.merge to the configuration model. |
| src/mmirage/cli.py | Adds merge / merge-dir commands and triggers auto-merge after a successful run when enabled. |
| README.md | Documents the merge flag and new CLI commands with examples and output behavior. |
| configs/config_mock.yaml | Enables merge in the mock config example. |
| configs/config_mock_vision.yaml | Enables merge in the mock vision config example. |
| configs/config_comprehensive.yaml | Documents the merge flag with its default and an explanation. |
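The dataset-directory discovery attributed to shard_utils.py might look roughly like this, under the assumption that a "dataset directory" is any directory that directly contains shard_* subdirectories. find_dataset_dirs and has_shards are hypothetical names, not necessarily the helpers the PR adds.

```python
from __future__ import annotations

from pathlib import Path


def has_shards(directory: Path) -> bool:
    """True if the directory directly contains at least one shard_* subdirectory."""
    return any(p.is_dir() for p in directory.glob("shard_*"))


def find_dataset_dirs(root: Path) -> tuple[list[Path], list[Path]]:
    """Split a run's output directory into dataset dirs and dirs that will be skipped."""
    root = Path(root)
    # A root that directly holds shards is treated as a single dataset.
    if has_shards(root):
        return [root], []
    datasets: list[Path] = []
    skipped: list[Path] = []
    for child in sorted(p for p in root.iterdir() if p.is_dir()):
        (datasets if has_shards(child) else skipped).append(child)
    return datasets, skipped
```

Returning the skipped directories alongside the datasets lets a caller log them in the merge summary instead of silently ignoring them.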

Comment thread src/mmirage/merge_shards.py Outdated
Comment thread src/mmirage/merge_shards.py
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Comment thread src/mmirage/shard_utils.py
Comment thread src/mmirage/merge_shards.py
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Comment thread src/mmirage/shard_utils.py
Comment thread src/mmirage/merge_shards.py
Comment thread src/mmirage/merge_shards.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: fabnemEPFL <117652591+fabnemEPFL@users.noreply.github.com>
Co-authored-by: fabnemEPFL <117652591+fabnemEPFL@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Comment thread src/mmirage/merge_shards.py
Comment thread src/mmirage/shard_utils.py
Comment thread src/mmirage/merge_shards.py
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Comment thread src/mmirage/merge_shards.py Outdated
Comment thread src/mmirage/merge_shards.py Outdated
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@fabnemEPFL merged commit 0772a65 into main on Apr 14, 2026
0 of 2 checks passed
@fabnemEPFL deleted the feature/merge branch on April 14, 2026 11:04
Development

Successfully merging this pull request may close these issues.

Add the possibility to merge automatically in the config

3 participants