Conversation

@Kmoneal Kmoneal commented Apr 22, 2025

This is very similar to Diffusion, but instead of a seed it takes an image of the types specified by the model. For Stable Diffusion, the accepted types can be found here.

I'm happy to use this to kick off conversations on this topic as well.
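
For reference, a minimal sketch of preparing such an input image with PIL (the file name is a placeholder; Diffusers image-to-image pipelines generally accept PIL.Image.Image, numpy.ndarray, or torch.Tensor inputs):

from PIL import Image

# Load a placeholder file and convert it to RGB for use as the init image.
init_image = Image.open("init.png").convert("RGB")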

@ericluo04

Hi @Kmoneal - thank you very much for creating this PR!

Unfortunately, I've been having some trouble getting this implementation to work. Would you kindly be able to share a minimal reproducible example of how to do the following (e.g., with stabilityai/stable-diffusion-3.5-large):

  1. Store the activations of the residual stream (e.g., output of the transformer block at index 24), for any choice/range of timestep.
  2. Intervene on the activations of the above (e.g., by tripling the activation of a particular dimension), for any choice/range of timestep.

Any help would be much appreciated! Thanks again. :)


ericluo04 commented May 7, 2025

Figured it out! It turns out you can't specify the prompt with the keyword argument prompt = "..."; you have to pass it directly as the first positional argument. See below for extracting the residual stream of the 25th transformer block (index 24) in stabilityai/stable-diffusion-3.5-large for the first step. Note that init_image is of type PIL.Image.Image.

import nnsight

# pipe is the nnsight-wrapped stabilityai/stable-diffusion-3.5-large pipeline
# from this PR; init_image is a PIL.Image.Image.

# transformer block layers
layers = pipe.transformer.transformer_blocks

with pipe.generate("", negative_prompt="", guidance_scale=7.5,
                   image=init_image, width=832, height=1248,
                   strength=.5, num_inference_steps=4,
                   seed=None):
    # initialize lists to store activations
    res_stream = nnsight.list().save()
    res_stream_text = nnsight.list().save()
    res_stream_image = nnsight.list().save()

    # loop over the first step only; use layers.all() to run over all steps
    with layers.iter[0:1]:
        # output of block index 24: the residual stream (text and image streams)
        res_stream.append(layers[24].output)
        # input to block index 25's norm1_context (text stream), to check it matches the above
        res_stream_text.append(layers[25].norm1_context.input)
        # input to block index 25's norm1 (image stream), to check it matches the above
        res_stream_image.append(layers[25].norm1.input)
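
For item 2 of the earlier question (intervening on the activations), here is a minimal sketch under the same setup. It assumes that assigning into a block's .output proxy edits the forward pass, as it does for nnsight language models, and that the block output is a (text stream, image stream) tuple, as in Diffusers' SD3 joint transformer blocks; the dimension index and scaling factor are arbitrary illustrations, not values from this thread.

# Sketch: triple one dimension of the image-stream residual at block index 24
# on the first step, then save the edited slice for inspection.
with pipe.generate("", negative_prompt="", guidance_scale=7.5,
                   image=init_image, width=832, height=1248,
                   strength=.5, num_inference_steps=4,
                   seed=None):
    edited_stream = nnsight.list().save()

    with layers.iter[0:1]:
        # output[1] is assumed to be the image stream; dimension 100 is an arbitrary example
        scaled = layers[24].output[1][:, :, 100] * 3
        layers[24].output[1][:, :, 100] = scaled
        edited_stream.append(scaled)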

@nguyentr17

Hi, is it not possible to use the current DiffusionModel class for an image-to-image pipeline like Flux Kontext? I tried it, and it works for instructpix2pix, but for Flux Kontext it gives me the following error:
ValueError: Cannot return output of Envoy that is not interleaving nor has a fake output set.
