
Feature/generation model #41

Open
qchapp wants to merge 18 commits into main from feature/generation-model

Conversation

@qchapp
Member

@qchapp qchapp commented Apr 26, 2026

This pull request adds support for text-to-image generation using Diffusers models in the MMIRAGE library. It introduces a new image_gen processor type with its configuration, output variable definition, and documentation. The changes also add a sample configuration file, an optional dependency group, and processor-registry updates so that image generation can be configured like any other processor.

Image generation support:

  • Added a new image_gen processor, including its configuration (DiffusersImageGenConfig), output variable definition (ImageGenOutputVar), and registration in the processor registry. This enables text-to-image generation with Diffusers pipelines, with runtime options such as device placement and chunked parallel inference, and two output modes (saved image paths or in-memory PIL images). [1] [2] [3]
  • Updated the processor registry and config utilities to import the new image generation processor lazily, so its optional dependencies are loaded only when an image_gen processor is actually configured. [1] [2]

Configuration and documentation:

  • Added a sample YAML configuration (configs/config_mock_image_gen.yaml) demonstrating how to use the new image_gen processor for text-to-image generation, including parallel inference and output customization.
  • Expanded the README.md to document support for image generation models, provide configuration examples, and explain the new processor type and its parameters. [1] [2] [3]

Dependency management:

  • Added an optional image_gen dependency group to pyproject.toml for installing required libraries (diffusers, accelerate, safetensors).

Core pipeline updates:

  • Updated the Mapper class to accept the shard_id parameter and forward it to processors, so that processors know which shard they are processing. [1] [2]

@qchapp qchapp self-assigned this Apr 26, 2026
Copilot AI review requested due to automatic review settings April 26, 2026 21:30
@qchapp qchapp linked an issue Apr 26, 2026 that may be closed by this pull request
Contributor

Copilot AI left a comment

Pull request overview

Adds a new image_gen processor to MMIRAGE to enable text-to-image generation via Diffusers, plus the supporting config/docs and pipeline wiring needed to run it in shard processing.

Changes:

  • Introduces image_gen processor implementation + config/output-var types and registers it for lazy loading.
  • Adds optional dependency group ([image_gen]) and sample config/data for running an image generation pipeline.
  • Updates shard processing + mapper to support sharding context (shard_id) and to cast generated image-path columns to HF Image.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.

Summary per file:

  • tests/mock_data_image_gen/data.jsonl: Adds mock prompt data for image generation examples/tests.
  • src/mmirage/shard_process.py: Forwards shard_id into the mapper and casts image-path outputs to HF Image.
  • src/mmirage/core/process/processors/image_gen/image_gen_processor.py: Implements a Diffusers-backed image generation processor with path/PIL output modes.
  • src/mmirage/core/process/processors/image_gen/config.py: Adds DiffusersImageGenConfig and ImageGenOutputVar with template validation.
  • src/mmirage/core/process/processors/image_gen/__init__.py: Creates the new processor module package.
  • src/mmirage/core/process/mapper.py: Extends the mapper to accept/forward shard_id into processors.
  • src/mmirage/core/process/base.py: Registers image_gen for lazy processor import.
  • src/mmirage/config/utils.py: Ensures image_gen config types are registered at config-load time.
  • pyproject.toml: Adds an optional dependency group for Diffusers-based image generation.
  • configs/config_mock_image_gen.yaml: Provides a runnable example config for the new processor.
  • README.md: Documents image generation support, a config example, and the optional install.

Comment thread src/mmirage/core/process/processors/image_gen/image_gen_processor.py Outdated
Comment thread src/mmirage/core/process/processors/image_gen/image_gen_processor.py Outdated
Comment thread src/mmirage/core/process/mapper.py
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.



Comment thread src/mmirage/core/process/processors/image_gen/image_gen_processor.py Outdated
Comment thread src/mmirage/shard_process.py Outdated
Comment thread src/mmirage/core/process/processors/image_gen/config.py
…sor.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.



Comment thread src/mmirage/core/process/base.py Outdated
Comment thread src/mmirage/shard_process.py Outdated
logger.info(f"✅ Successfully loaded processor of type {config.type}")

-self.processors[config.type] = processor_cls(config)
+self.processors[config.type] = processor_cls(config, shard_id=shard_id)
Collaborator

The shard_id is currently ignored by LLMProcessor; maybe make it use it as well? It seems to be used only for computing the render filename.
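
A minimal sketch of the suggestion, assuming shard_id is accepted by a shared base class so that LLMProcessor would receive it as well. BaseProcessorSketch, image_filename, and the "shardNNNNN_NNNNNNNN.png" naming pattern are hypothetical and not taken from the PR.

```python
import os
from typing import Any, Optional


class BaseProcessorSketch:
    """Hypothetical base class: every processor (LLM or image_gen) receives shard_id."""

    def __init__(self, config: Any, shard_id: Optional[int] = None) -> None:
        self.config = config
        self.shard_id = shard_id


def image_filename(output_dir: str, var_name: str, sample_index: int,
                   shard_id: Optional[int] = None) -> str:
    """Build a per-sample image path; the naming scheme is an assumption."""
    # Prefixing with the shard id keeps files from different shards distinct when
    # they share one output_dir and sample indices are local to each shard.
    shard_part = f"shard{shard_id:05d}" if shard_id is not None else "noshard"
    return os.path.join(output_dir, var_name, f"{shard_part}_{sample_index:08d}.png")
```

Putting shard_id on the base class means the Mapper can pass it uniformly, and each processor decides whether it needs it.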

Comment on lines +79 to +85
for col in cols:
    if col in ds.column_names:
        ds = ds.map(
            _normalise_col, batched=True, fn_kwargs={"col": col}, desc=f"Normalising {col}",
            load_from_cache_file=False,
        )
        ds = ds.cast_column(col, HFImage())
Collaborator

This could be a helper function that is also called for each split when ds is a DatasetDict, which would avoid code duplication.
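
One way to realise this is sketched below, assuming the loop above is moved into a function that handles a single Dataset. The names normalise_batch, cast_image_path_cols, and apply_per_split are hypothetical. The sketch relies on datasets.DatasetDict being a dict subclass keyed by split name, and imports datasets lazily so the pure-Python helpers can be used without it.

```python
from typing import Any, Callable, Dict, List


def normalise_batch(batch: Dict[str, List[Any]], col: str) -> Dict[str, List[Any]]:
    # Map falsy entries ("" or None) to None so empty paths become missing values.
    return {col: [v if v else None for v in batch[col]]}


def cast_image_path_cols(ds: Any, cols: List[str]) -> Any:
    """Normalise and cast image-path columns of a single datasets.Dataset."""
    from datasets import Image as HFImage  # imported lazily; only needed when casting

    for col in cols:
        if col in ds.column_names:
            ds = ds.map(
                normalise_batch, batched=True, fn_kwargs={"col": col},
                desc=f"Normalising {col}", load_from_cache_file=False,
            )
            ds = ds.cast_column(col, HFImage())
    return ds


def apply_per_split(ds: Any, fn: Callable[[Any], Any]) -> Any:
    """Apply fn to a Dataset, or to every split of a DatasetDict."""
    # DatasetDict subclasses dict, so this branch covers it and rebuilds the same type.
    if isinstance(ds, dict):
        return type(ds)({split: fn(d) for split, d in ds.items()})
    return fn(ds)
```

The call site then becomes a single line, e.g. ds = apply_per_split(ds, lambda d: cast_image_path_cols(d, cols)), regardless of whether ds has splits.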

default_sampling_params: Dict[str, Any] = field(default_factory=dict)
parallel_inference: bool = True
parallel_chunk_size: Optional[int] = 4
output_dir: str = ".mmirage/generated_images"
Collaborator

Does this create a new .mmirage folder at the root of the local repository?


def __post_init__(self) -> None:
    """Validate optional parallelism settings."""
    if self.parallel_chunk_size is not None and self.parallel_chunk_size <= 0:
Collaborator

It would be better to raise an error here: a nonpositive value should not be silently interpreted as None.
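
A minimal sketch of the suggested validation, using a hypothetical ImageGenParallelismSketch dataclass that carries only the two parallelism fields shown above (the real DiffusersImageGenConfig has more fields). None stays a valid "no chunking" value; zero or negative values fail fast.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImageGenParallelismSketch:
    """Hypothetical stand-in for the parallelism fields of DiffusersImageGenConfig."""

    parallel_inference: bool = True
    parallel_chunk_size: Optional[int] = 4

    def __post_init__(self) -> None:
        # Reject nonpositive chunk sizes explicitly instead of coercing them to None.
        if self.parallel_chunk_size is not None and self.parallel_chunk_size <= 0:
            raise ValueError(
                "parallel_chunk_size must be a positive integer or None, "
                f"got {self.parallel_chunk_size}"
            )
```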


def get_output_dir(self) -> str:
    """Get normalized absolute output directory path."""
    return os.path.abspath(os.path.expanduser(self.output_dir))
Collaborator

Why not put it in the cache folder?

Collaborator

Like DEFAULT_STATE_DIR = "~/.cache/MMIRAGE/state_dir" in src/mmore/config/loading.py.
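
A minimal sketch of the suggested default, assuming a hypothetical DEFAULT_IMAGE_OUTPUT_DIR constant modelled on the DEFAULT_STATE_DIR convention quoted in that comment. It also shows why the current relative default ".mmirage/generated_images" lands wherever the process is started: os.path.abspath resolves relative paths against the current working directory, not the repository root.

```python
import os
from dataclasses import dataclass

# Hypothetical default, following the ~/.cache/MMIRAGE convention.
DEFAULT_IMAGE_OUTPUT_DIR = "~/.cache/MMIRAGE/generated_images"


@dataclass
class OutputDirSketch:
    output_dir: str = DEFAULT_IMAGE_OUTPUT_DIR

    def get_output_dir(self) -> str:
        """Return the normalized absolute output directory path."""
        # expanduser resolves "~"; abspath resolves a relative path against the
        # current working directory, not the repository root.
        return os.path.abspath(os.path.expanduser(self.output_dir))
```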

    os.unlink(tmp_path)
except OSError:
    pass
raise
Collaborator

Maybe raise a more specific error here.
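
A minimal sketch of a more specific error. It assumes the processor writes each image to a temporary file and renames it into place, which the os.unlink(tmp_path) cleanup above suggests. ImageSaveError and save_bytes_atomically are hypothetical names, and raw bytes are written instead of a PIL image to keep the sketch dependency-free.

```python
import os
import tempfile


class ImageSaveError(RuntimeError):
    """Hypothetical error raised when a generated image cannot be written."""


def save_bytes_atomically(data: bytes, final_path: str) -> str:
    directory = os.path.dirname(os.path.abspath(final_path))
    os.makedirs(directory, exist_ok=True)
    # Temp file in the same directory, so os.replace is a same-filesystem rename.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        os.replace(tmp_path, final_path)
    except OSError as exc:
        # Best-effort cleanup of the partial temp file.
        try:
            os.unlink(tmp_path)
        except OSError:
            pass
        # Chaining keeps the original OSError visible in the traceback.
        raise ImageSaveError(f"Failed to save image to {final_path}") from exc
    return final_path
```

Callers can then catch ImageSaveError specifically without also swallowing unrelated OSErrors from elsewhere in the pipeline.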

updated: List[VariableEnvironment] = []
for local_index, (env, image) in enumerate(zip(chunk, images)):
    sample_index = start_index + local_index
    if output_var.output_mode == "pil":
Collaborator

An enum for the output mode would make sense.
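
A minimal sketch of such an enum. The values "path" and "pil" come from this PR; the member names PATH and PIL and the helper parse_output_mode are assumptions. Mixing in str is a design choice that keeps existing comparisons such as output_mode == "pil" working during a migration.

```python
from enum import Enum


class ImageOutputMode(str, Enum):
    """Output modes for image_gen variables; member names are assumed."""

    PATH = "path"  # save the image to disk and store its file path
    PIL = "pil"    # keep the in-memory PIL image


def parse_output_mode(value: object) -> ImageOutputMode:
    # Accept an existing member or its string value (e.g. from YAML);
    # unknown strings raise ValueError instead of being silently accepted.
    if isinstance(value, ImageOutputMode):
        return value
    return ImageOutputMode(value)
```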

if negative_prompt is not None:
    call_kwargs["negative_prompt"] = negative_prompt
output = self._pipeline(**call_kwargs)
image = output.images[0]
Collaborator

Is this guaranteed to work, i.e. is there never more than one image?
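
Diffusers text-to-image pipelines return an output whose images attribute is a list with one entry per generated image. Pipelines such as StableDiffusionPipeline accept num_images_per_prompt (default 1), so a value set through the sampling params would make output.images[0] silently drop the extra images. The sketch below pins that argument and checks the length. build_call_kwargs and take_single_image are hypothetical helpers, and pinning assumes the configured pipeline's __call__ accepts num_images_per_prompt.

```python
from typing import Any, Dict, Optional, Sequence


def build_call_kwargs(prompt: str, sampling_params: Dict[str, Any],
                      negative_prompt: Optional[str] = None) -> Dict[str, Any]:
    """Assemble pipeline kwargs, pinning one image per prompt (hypothetical helper)."""
    call_kwargs = dict(sampling_params)
    call_kwargs["prompt"] = prompt
    # Assumes the pipeline's __call__ accepts num_images_per_prompt;
    # this overrides any value coming from the sampling params.
    call_kwargs["num_images_per_prompt"] = 1
    if negative_prompt is not None:
        call_kwargs["negative_prompt"] = negative_prompt
    return call_kwargs


def take_single_image(images: Sequence[Any]) -> Any:
    """Return the only image, failing loudly instead of silently dropping extras."""
    if len(images) != 1:
        raise ValueError(f"Expected exactly one image per prompt, got {len(images)}")
    return images[0]
```

At the call site this would read image = take_single_image(self._pipeline(**call_kwargs).images).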


def shutdown(self) -> None:
    """Release pipeline references."""
    self._pipeline = None
Collaborator

Is this really enough to shut down?
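
Setting self._pipeline = None only drops one reference. The weights are freed only once no other references remain, and PyTorch's caching allocator keeps freed GPU blocks reserved until torch.cuda.empty_cache() is called. A minimal sketch, assuming a hypothetical ImageGenShutdownSketch class and treating torch as optional:

```python
import gc
from typing import Any, Optional


class ImageGenShutdownSketch:
    """Hypothetical processor holding a Diffusers pipeline."""

    def __init__(self, pipeline: Any) -> None:
        self._pipeline: Optional[Any] = pipeline

    def shutdown(self) -> None:
        # Drop our reference so the pipeline's modules can be garbage-collected.
        self._pipeline = None
        # Collect reference cycles that may still keep tensors alive.
        gc.collect()
        try:
            import torch
        except ImportError:
            return
        if torch.cuda.is_available():
            # Release unused blocks held by PyTorch's CUDA caching allocator.
            torch.cuda.empty_cache()
```

This is still best-effort: any other object holding the pipeline (or its tensors) will keep the memory alive.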

Comment thread configs/config_mock_image_gen.yaml Outdated
Co-authored-by: fabnemEPFL <117652591+fabnemEPFL@users.noreply.github.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 12 comments.

image_path_var_names = {
    v.name
    for v in output_vars
    if getattr(v, "output_mode", None) == "path"
Comment on lines +223 to +224
mapper.shutdown()
logger.info("Processors shut down.")
Comment on lines +103 to +107

def image_output_mode_hook(value: Any) -> ImageOutputMode:
    if isinstance(value, ImageOutputMode):
        return value
    return ImageOutputMode(value)
processing_params:
inputs:
- name: text
key: caption
processing_params:
inputs:
- name: text
key: caption
Comment on lines +66 to +67
def _normalise_col(batch: Dict[str, Any], col: str) -> Dict[str, Any]:
    return {col: [v if v else None for v in batch[col]]}
Comment on lines +168 to +172
proc = subprocess.Popen(
    cmd,
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    env=os.environ.copy(),

placement_device, generator_device, use_device_map = self._resolve_auto_device(args)

if use_device_map:
    device_map = getattr(args, "device_map", None) or "balanced"
Comment on lines +4 to +5
using Diffusers pipelines. It can emit either saved image paths or in-memory
PIL images.
Comment on lines +38 to +39
``"auto"`` distributes across all available GPUs when more than
one is present (via ``device_map='auto'``), or falls back to CPU.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Use image generation models

3 participants