Feature/generation model#41
Conversation
There was a problem hiding this comment.
Pull request overview
Adds a new image_gen processor to MMIRAGE to enable text-to-image generation via Diffusers, plus the supporting config/docs and pipeline wiring needed to run it in shard processing.
Changes:
- Introduces
image_genprocessor implementation + config/output-var types and registers it for lazy loading. - Adds optional dependency group (
[image_gen]) and sample config/data for running an image generation pipeline. - Updates shard processing + mapper to support sharding context (
shard_id) and to cast generated image-path columns to HFImage.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/mock_data_image_gen/data.jsonl | Adds mock prompt data for image generation examples/tests. |
| src/mmirage/shard_process.py | Forwards shard_id into the mapper and casts image-path outputs to HF Image. |
| src/mmirage/core/process/processors/image_gen/image_gen_processor.py | Implements Diffusers-backed image generation processor with path/PIL output modes. |
| src/mmirage/core/process/processors/image_gen/config.py | Adds DiffusersImageGenConfig and ImageGenOutputVar with template validation. |
| src/mmirage/core/process/processors/image_gen/init.py | Creates the new processor module package. |
| src/mmirage/core/process/mapper.py | Extends mapper to accept/forward shard_id into processors. |
| src/mmirage/core/process/base.py | Registers image_gen for lazy processor import. |
| src/mmirage/config/utils.py | Ensures image_gen config types are registered at config-load time. |
| pyproject.toml | Adds optional dependency group for Diffusers-based image generation. |
| configs/config_mock_image_gen.yaml | Provides a runnable example config for the new processor. |
| README.md | Documents image generation support, config example, and optional install. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…sor.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…T/mmirage into feature/generation-model
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| logger.info(f"✅ Successfully loaded processor of type {config.type}") | ||
|
|
||
| self.processors[config.type] = processor_cls(config) | ||
| self.processors[config.type] = processor_cls(config, shard_id=shard_id) |
There was a problem hiding this comment.
the shard_id is currently ignored by LLMProcessor, maybe make it use it as well? it seems to be used only for computing the render filename
| for col in cols: | ||
| if col in ds.column_names: | ||
| ds = ds.map( | ||
| _normalise_col, batched=True, fn_kwargs={"col": col}, desc=f"Normalising {col}", | ||
| load_from_cache_file=False, | ||
| ) | ||
| ds = ds.cast_column(col, HFImage()) |
There was a problem hiding this comment.
could be a helper function that is also called for each split if ds is a DatasetDict -> avoids code duplication
| default_sampling_params: Dict[str, Any] = field(default_factory=dict) | ||
| parallel_inference: bool = True | ||
| parallel_chunk_size: Optional[int] = 4 | ||
| output_dir: str = ".mmirage/generated_images" |
There was a problem hiding this comment.
makes a new folder .mmirage at the root of the local repository?
|
|
||
| def __post_init__(self) -> None: | ||
| """Validate optional parallelism settings.""" | ||
| if self.parallel_chunk_size is not None and self.parallel_chunk_size <= 0: |
There was a problem hiding this comment.
it sounds better to raise an error here, it should not be silently interpreted as None when a value is nonpositive
|
|
||
| def get_output_dir(self) -> str: | ||
| """Get normalized absolute output directory path.""" | ||
| return os.path.abspath(os.path.expanduser(self.output_dir)) |
There was a problem hiding this comment.
why not in the cache folder?
There was a problem hiding this comment.
like DEFAULT_STATE_DIR = "~/.cache/MMIRAGE/state_dir" in src/mmore/config/loading.py
| os.unlink(tmp_path) | ||
| except OSError: | ||
| pass | ||
| raise |
There was a problem hiding this comment.
maybe have a more specific error
| updated: List[VariableEnvironment] = [] | ||
| for local_index, (env, image) in enumerate(zip(chunk, images)): | ||
| sample_index = start_index + local_index | ||
| if output_var.output_mode == "pil": |
There was a problem hiding this comment.
having an enum for the output mode would make sense...
| if negative_prompt is not None: | ||
| call_kwargs["negative_prompt"] = negative_prompt | ||
| output = self._pipeline(**call_kwargs) | ||
| image = output.images[0] |
There was a problem hiding this comment.
is it guaranteed to work / that there is no more than 1 image?
|
|
||
| def shutdown(self) -> None: | ||
| """Release pipeline references.""" | ||
| self._pipeline = None |
There was a problem hiding this comment.
is it really enough to shutdown?
Co-authored-by: fabnemEPFL <117652591+fabnemEPFL@users.noreply.github.com>
| image_path_var_names = { | ||
| v.name | ||
| for v in output_vars | ||
| if getattr(v, "output_mode", None) == "path" |
| mapper.shutdown() | ||
| logger.info("Processors shut down.") |
|
|
||
| def image_output_mode_hook(value: Any) -> ImageOutputMode: | ||
| if isinstance(value, ImageOutputMode): | ||
| return value | ||
| return ImageOutputMode(value) |
| processing_params: | ||
| inputs: | ||
| - name: text | ||
| key: caption |
| processing_params: | ||
| inputs: | ||
| - name: text | ||
| key: caption |
| def _normalise_col(batch: Dict[str, Any], col: str) -> Dict[str, Any]: | ||
| return {col: [v if v else None for v in batch[col]]} |
| proc = subprocess.Popen( | ||
| cmd, | ||
| stdout=subprocess.PIPE, | ||
| stderr=subprocess.STDOUT, | ||
| env=os.environ.copy(), |
| placement_device, generator_device, use_device_map = self._resolve_auto_device(args) | ||
|
|
||
| if use_device_map: | ||
| device_map = getattr(args, "device_map", None) or "balanced" |
| using Diffusers pipelines. It can emit either saved image paths or in-memory | ||
| PIL images. |
| ``"auto"`` distributes across all available GPUs when more than | ||
| one is present (via ``device_map='auto'``), or falls back to CPU. |
This pull request adds support for text-to-image generation using Diffusers models in the MMIRAGE library. It introduces a new
image_genprocessor type, complete with configuration, output variable definition, and documentation. The changes also include a sample configuration file, dependency management, and updates to the processor registry to enable seamless integration of image generation workflows.Image generation support:
image_genprocessor, including its configuration (DiffusersImageGenConfig), output variable definition (ImageGenOutputVar), and registration in the processor registry. This enables text-to-image generation using Diffusers pipelines, with support for various runtime and output options. [1] [2] [3]Configuration and documentation:
configs/config_mock_image_gen.yaml) demonstrating how to use the newimage_genprocessor for text-to-image generation, including parallel inference and output customization.README.mdto document support for image generation models, provide configuration examples, and explain the new processor type and its parameters. [1] [2] [3]Dependency management:
image_gendependency group topyproject.tomlfor installing required libraries (diffusers,accelerate,safetensors).Core pipeline updates:
Mapperclass to accept and forward theshard_idparameter to processors, ensuring correct sharding behavior for image generation tasks. [1] [2]