Add Photon model and pipeline support #12456

DavidBert · 2025-10-09T13:21:05Z

This commit adds support for the Photon image generation model:

PhotonTransformer2DModel: Core transformer architecture
PhotonPipeline: Text-to-image generation pipeline
Attention processor updates for Photon-specific attention mechanism
Conversion script for loading Photon checkpoints
Documentation and tests

DavidBert · 2025-10-09T13:21:46Z

scripts/convert_photon_to_diffusers.py

+    print("✓ Created scheduler config")
+
+
+def download_and_save_vae(vae_type: str, output_path: str):


I'm not sure about this one: I'm saving the VAE weights even though they are already available on the Hub (Flux VAE and DC-AE).
Is there a way to avoid storing them and instead load the original ones directly?

For now, it's okay to keep this as is. This way, everything is under the same model repo.

DavidBert · 2025-10-09T13:22:22Z

scripts/convert_photon_to_diffusers.py

+    print(f"✓ Saved VAE to {vae_path}")
+
+
+def download_and_save_text_encoder(output_path: str):


Same here for the Text Encoder.

sayakpaul · 2025-10-09T13:40:52Z

scripts/convert_photon_to_diffusers.py

+    print("✓ Created scheduler config")
+
+
+def download_and_save_vae(vae_type: str, output_path: str):


For now, it's okay to keep this as is. This way, everything is under the same model repo.

sayakpaul · 2025-10-09T13:43:15Z

src/diffusers/models/transformers/transformer_photon.py

+from einops import rearrange
+from einops.layers.torch import Rearrange


We need to get rid of the einops dependency and use native PyTorch ops here.

I changed it to native PyTorch. Out of curiosity, why do you recommend avoiding einops?

We try to avoid additional dependencies especially when things can be done in native PyTorch.
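
For reference, a minimal sketch of what such a replacement can look like. The einops pattern and the helper name `img2seq` are assumptions chosen for illustration, not necessarily the ones used in this PR; the idea is that a `rearrange` call maps onto a `reshape`, a `permute`, and a second `reshape`.

```python
import torch


def img2seq(img: torch.Tensor, patch_size: int) -> torch.Tensor:
    """Native PyTorch equivalent of the (assumed) einops call
    rearrange(img, "b c (h p1) (w p2) -> b (h w) (c p1 p2)", p1=patch_size, p2=patch_size).
    """
    b, c, full_h, full_w = img.shape
    h, w = full_h // patch_size, full_w // patch_size
    # Split each spatial axis into (number of patches, patch size).
    img = img.reshape(b, c, h, patch_size, w, patch_size)
    # Reorder to (b, h, w, c, p1, p2) so patch positions come first.
    img = img.permute(0, 2, 4, 1, 3, 5)
    # Merge patch positions into a sequence axis and patch contents into features.
    return img.reshape(b, h * w, c * patch_size * patch_size)


# Hypothetical shapes, for illustration only.
x = torch.randn(2, 4, 8, 8)
seq = img2seq(x, patch_size=2)
```

The explicit `permute` order doubles as documentation of the layout, which is most of what the einops pattern string provided.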

sayakpaul · 2025-10-09T13:43:30Z

src/diffusers/models/transformers/transformer_photon.py

+    return xq_out.reshape(*xq.shape).type_as(xq)
+
+
+class EmbedND(nn.Module):


Does this share similarity with Flux?

Yes, it comes from the original BFL implementation.
I tried to adapt and reuse the logic from transformer_flux.py, but I couldn't make it work without heavy changes and additional complexity.
I added a comment to explicitly say that it comes from there. Is that OK for you, or do you want me to keep trying to use the code from transformer_flux.py?

Oh okay. Then it's fine to keep it here. I would maybe rename it to PhotoEmbedND and leave a note that it's inspired by Flux. WDYT?

Looks like this wasn't addressed.

sayakpaul

Thanks for the clean PR! I left some initial feedback for you. LMK if that makes sense.

Also, it would be great to see some samples of Photon!

sayakpaul

Thanks! Left a couple more comments. Let's also add the pipeline-level tests.

sayakpaul · 2025-10-13T10:59:17Z

docs/source/en/api/pipelines/photon.md

+<div class="flex flex-wrap space-x-1">
+  <img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
+</div>


I don't think we're supporting it yet? If so, we can remove it for now.

sayakpaul · 2025-10-13T10:59:38Z

docs/source/en/api/pipelines/photon.md

+  <img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
+</div>
+
+Photon is a text-to-image diffusion model using simplified MMDIT architecture with flow matching for efficient high-quality image generation. The model uses T5Gemma as the text encoder and supports either Flux VAE (AutoencoderKL) or DC-AE (AutoencoderDC) for latent compression.


Cc: @stevhliu for a review on the docs.

sayakpaul · 2025-10-13T11:00:59Z

src/diffusers/models/transformers/transformer_photon.py

+    return xq_out.reshape(*xq.shape).type_as(xq)
+
+
+class PhotonAttnProcessor2_0:


Could we write it in a fashion similar to

diffusers/src/diffusers/models/transformers/transformer_flux.py

Line 75 in 8abc7ae

class FluxAttnProcessor:

?

sayakpaul · 2025-10-13T11:01:28Z

src/diffusers/models/transformers/transformer_photon.py

+    return xq_out.reshape(*xq.shape).type_as(xq)
+
+
+class EmbedND(nn.Module):


Looks like this wasn't addressed.

sayakpaul · 2025-10-13T11:02:23Z

src/diffusers/models/transformers/transformer_photon.py

+    gate: Tensor
+
+
+class Modulation(nn.Module):


For intermediate blocks like this, we avoid using a dataclass to return outputs.
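
A rough sketch of the tuple-returning variant, assuming the block produces a shift, a scale, and a gate from a conditioning vector. The layer layout below is hypothetical and only illustrates the return style, not the actual Photon module.

```python
from typing import Tuple

import torch
from torch import nn


class Modulation(nn.Module):
    """Hypothetical modulation block that returns a plain tuple instead of a dataclass."""

    def __init__(self, dim: int):
        super().__init__()
        # One projection that produces shift, scale and gate in a single pass.
        self.lin = nn.Linear(dim, 3 * dim)

    def forward(self, vec: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
        out = self.lin(nn.functional.silu(vec))[:, None, :]
        shift, scale, gate = out.chunk(3, dim=-1)
        return shift, scale, gate


mod = Modulation(dim=8)
shift, scale, gate = mod(torch.randn(2, 8))
```

Callers unpack the three tensors directly, so no extra output type has to be defined for an internal block.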

sayakpaul · 2025-10-13T11:08:43Z

src/diffusers/pipelines/photon/pipeline_photon.py

+    ):
+        """Prepare initial latents for the diffusion process."""
+        if latents is None:
+            spatial_compression = self.vae_spatial_compression_ratio


For image models (where there are no separate spatial and temporal compression factors), we usually just refer to it as vae_scale_factor:

https://github.com/huggingface/diffusers/blob/8abc7aeb715c0149ee0a9982b2d608ce97f55215/src/diffusers/pipelines/flux/pipeline_flux.py#L209C14-L209C34

sayakpaul · 2025-10-13T11:10:00Z

src/diffusers/pipelines/photon/pipeline_photon.py

+    def __call__(
+        self,
+        prompt: Union[str, List[str]] = None,
+        height: Optional[int] = None,


We support passing prompt embeddings too in case users want to supply them precomputed:

diffusers/src/diffusers/pipelines/flux/pipeline_flux.py

Line 669 in 8abc7ae

prompt_embeds: Optional[torch.FloatTensor] = None,
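
As a sketch of what accepting precomputed embeddings implies for input handling: the helper below is hypothetical (its name and error messages are not from diffusers or this PR) and only shows the usual "exactly one of `prompt` or `prompt_embeds`" rule and how the batch size follows from whichever input is given.

```python
from typing import List, Optional, Union

import torch


def resolve_prompt_inputs(
    prompt: Optional[Union[str, List[str]]] = None,
    prompt_embeds: Optional[torch.Tensor] = None,
) -> int:
    """Validate the two mutually exclusive inputs and return the batch size."""
    if prompt is not None and prompt_embeds is not None:
        raise ValueError("Pass either `prompt` or `prompt_embeds`, not both.")
    if prompt is None and prompt_embeds is None:
        raise ValueError("Provide either `prompt` or `prompt_embeds`.")
    if prompt_embeds is not None:
        # Precomputed embeddings: the leading dimension is the batch size.
        return prompt_embeds.shape[0]
    if isinstance(prompt, str):
        return 1
    return len(prompt)
```

With this in place, the text encoder only needs to run when `prompt_embeds` is None.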

sayakpaul · 2025-10-13T11:10:46Z

src/diffusers/pipelines/photon/pipeline_photon.py

+        default_sample_size = getattr(self.config, "default_sample_size", DEFAULT_RESOLUTION)
+        height = height or default_sample_size
+        width = width or default_sample_size


Prefer this pattern:

diffusers/src/diffusers/pipelines/flux/pipeline_flux.py

Line 783 in 8abc7ae

height = height or self.default_sample_size * self.vae_scale_factor

I did it this way because the model works with two different VAEs that have different scale factors.
Is it OK not to make it depend on self.vae_scale_factor? Otherwise it is hard to define a default value.
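
To make the trade-off concrete, here is a small sketch of the pixel-space default described above. `DEFAULT_RESOLUTION = 512`, the helper name, and the scale factors 8 and 32 are assumptions for illustration; the thread does not state the actual values.

```python
from typing import Optional, Tuple

DEFAULT_RESOLUTION = 512  # hypothetical pixel-space default, not taken from the PR


def resolve_size(
    height: Optional[int],
    width: Optional[int],
    vae_scale_factor: int,
    default_resolution: int = DEFAULT_RESOLUTION,
) -> Tuple[int, int, int, int]:
    """Return (height, width, latent_height, latent_width).

    The default is expressed in pixels, so it stays the same whichever VAE is
    loaded; only the latent size depends on the VAE's scale factor.
    """
    height = height or default_resolution
    width = width or default_resolution
    if height % vae_scale_factor or width % vae_scale_factor:
        raise ValueError(
            f"height and width must be divisible by {vae_scale_factor}, got {height}x{width}."
        )
    return height, width, height // vae_scale_factor, width // vae_scale_factor


# Same pixel default, different latent sizes for two assumed scale factors.
size_f8 = resolve_size(None, None, vae_scale_factor=8)
size_f32 = resolve_size(None, 768, vae_scale_factor=32)
```

The Flux-style pattern instead stores the default in latent units and multiplies by `vae_scale_factor`, which yields a different pixel default for each VAE.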

sayakpaul · 2025-10-13T11:11:32Z

src/diffusers/pipelines/photon/pipeline_photon.py

+                )[0]
+
+                # Apply CFG
+                if self.do_classifier_free_guidance:


I didn't see negative_prompt in the __call__() of the pipeline. Is that expected?

sayakpaul · 2025-10-13T11:12:21Z

src/diffusers/pipelines/photon/pipeline_photon.py

+                    ca_embed = torch.cat([uncond_text_embeddings, text_embeddings], dim=0)
+                    ca_mask = None
+                    if cross_attn_mask is not None and uncond_cross_attn_mask is not None:
+                        ca_mask = torch.cat([uncond_cross_attn_mask, cross_attn_mask], dim=0)


These can be moved out of the loop, right?
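
A minimal sketch of the hoisting, assuming the embeddings and masks do not change between steps. The tensors, the `fake_transformer` stand-in, and the fixed update rule are all hypothetical; only the placement of the `torch.cat` calls is the point.

```python
import torch

# Hypothetical stand-ins for the encoded prompts and their attention masks.
text_embeddings = torch.randn(1, 6, 16)
uncond_text_embeddings = torch.zeros(1, 6, 16)
cross_attn_mask = torch.ones(1, 6, dtype=torch.bool)
uncond_cross_attn_mask = torch.ones(1, 6, dtype=torch.bool)
do_classifier_free_guidance = True
guidance_scale = 4.0

# Loop-invariant: build the concatenated conditioning once, before denoising.
if do_classifier_free_guidance:
    ca_embed = torch.cat([uncond_text_embeddings, text_embeddings], dim=0)
    ca_mask = None
    if cross_attn_mask is not None and uncond_cross_attn_mask is not None:
        ca_mask = torch.cat([uncond_cross_attn_mask, cross_attn_mask], dim=0)
else:
    ca_embed, ca_mask = text_embeddings, cross_attn_mask


def fake_transformer(latents, embeds, mask):
    # Stand-in for the denoiser; it ignores the mask and is not the real model.
    return latents * 0.5 + embeds.mean(dim=(1, 2)).view(-1, 1, 1, 1)


latents = torch.randn(1, 4, 8, 8)
for _ in range(3):
    # Only the latents change per step, so only they are duplicated in the loop.
    latents_in = torch.cat([latents, latents], dim=0) if do_classifier_free_guidance else latents
    noise = fake_transformer(latents_in, ca_embed, ca_mask)
    if do_classifier_free_guidance:
        noise_uncond, noise_text = noise.chunk(2)
        noise = noise_uncond + guidance_scale * (noise_text - noise_uncond)
    latents = latents - 0.1 * noise  # placeholder for the scheduler step
```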
