Add callback parameters for Stable Diffusion pipelines #521
Conversation
The documentation is not available anymore as the PR was closed or merged.
When I wanted to see how things were working under the hood, I did this exact thing. Except since I was snoopy, I sent the latents themselves and the timestep, not just the decoded image and step number. I'm on the fence about whether it would be a good idea to make the callback signature more complicated to support more use cases, or whether it's best to do the simpler thing that is what people are interested in 95% of the time. The other thing to note is that this dovetails with the discussion in #374. We could expose intermediate results by yielding them from a generator, or by sending them back through a callback, but putting both interfaces in the same pipeline would be a mess.
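For illustration, a minimal sketch of that kind of callback, assuming a pipeline that accepts the `callback` keyword argument this PR proposes and passes it the step index, timestep, and latents (the model id here is only an example checkpoint):

```python
from diffusers import StableDiffusionPipeline

# Collect the intermediate states for later inspection.
history = []

def snoop_callback(step, timestep, latents):
    # Keep a CPU copy so the history does not hold on to GPU memory.
    history.append((step, timestep, latents.detach().cpu()))

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4").to("cuda")
image = pipe("an astronaut riding a horse", callback=snoop_callback).images[0]
```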
If it's worth including an option to return the latents in the output (as proposed by #506), then it's worth including them in the intermediate results.
Thanks for making this PR @jamestiotio! This will be useful for us in https://github.com/franksalim/franksalim-imagetools
@keturn @deepdiffuser I have implemented your suggestions and fixes.
Alright, I have refactored the implementation to allow the callback function to take in the timesteps and latents as parameters as well.
Hmm, I am not sure whether a callback function or a generator would be preferable for this. Personally, I don't mind either one, but I will leave the decision of which one to implement, and, if it is the generator version, how to go about implementing that generator (should we simply yield a tuple containing all 4 parameters?), to the maintainers. Also, the linked issue contains a comment that raised the problem of returning intermediate results due to the need to run the safety checker. Because of that, I have refactored the implementation to also run the safety checker on every intermediate image before passing them to the callback function. While this will slow down the overall execution time whenever a callback function is defined, I believe that this is an acceptable compromise.
Overall, I'm a bit worried that this is not "bare-bones" / "core-functionality" enough to deserve to be in the main pipelines. If we add more and more of such features, the pipelines will quickly explode in terms of complexity. Would a community pipeline be fine for you, maybe? (We're working on making community pipelines easier to use with pip-installed diffusers.) Also pinging @patil-suraj @anton-l @pcuenca here.
On "core functionality": I think it's a feature common in other implementations, it's useful for any kind of interactive application, and "how do I save the intermediate images?" is one of the FAQs we get over on the Stable Diffusion Discord. Granted, that last bit is partly because some people start with incorrect assumptions about how the inference process works and expect the 20th step of a 100-step task to give them the same thing they see as the 20th step of a 20-step task, but that's not an argument against it. If there are low-cost ways to better enable explainability tools like diffusers-interpret, that sounds like great core functionality.
After building out a little demo, I don't know whether I'd argue a generator is a better external interface for it, but it is certainly easier to wrap a generator implementation and make it invoke callbacks than it would be to try to do it the other way around.
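To illustrate that direction of wrapping, here is a small sketch assuming a hypothetical generator-style method `generate_stepwise` that yields `(step, timestep, latents)` tuples; neither the method nor the wrapper is part of diffusers:

```python
def run_with_callback(pipe, prompt, callback, **kwargs):
    """Drive a hypothetical generator-style pipeline and forward every
    intermediate state to a plain callback."""
    latents = None
    for step, timestep, latents in pipe.generate_stepwise(prompt, **kwargs):
        callback(step, timestep, latents)
    return latents  # the state yielded last is the final result
```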
Regarding the concern on "core functionality", I agree with @keturn.
Thank you for the demo @keturn, it definitely helped me visualise how a generator version of the pipeline can be used. Another concern that came to my mind after checking out your demo was backward compatibility. Implementing a generator version by default (i.e., without the user specifying some kind of flag to `__call__`) would break backward compatibility for existing consumers of the pipeline. Providing an explicit, user-specified flag to `__call__` would avoid that.
I'm torn about this one. On one hand, it is true that adding too many pieces will imply conditional code, additional arguments, and other forms of complexity. From that point of view, I'd rather have it as a community pipeline. On the other hand, this particular feature might be general enough to support many different use cases, as shown by the community. In particular, explainability and interpretability could benefit from it, as diffusers-interpret shows, or this puzzling latents visualization. A community pipeline would also work fine, but it has a couple of drawbacks too: discoverability, and keeping implementations in sync with the main code. Other considerations: […]
On the design itself, after reading the different versions of the code, I have a couple of initial comments: […]
I'm happy to do a more thorough review after some more discussion :)
@pcuenca I have implemented some of your suggestions above.
Some of my thoughts on this aspect: […]
My two cents on this issue is that callback functionality is essential for most interactive applications. I had to implement it myself for my desktop frontend, and I was very surprised to find out that it is not a built-in feature. I understand your desire to keep the interface simple, but I would say this functionality is too important not to include. The main problem is that you cannot simply subclass `StableDiffusionPipeline` and add the functionality yourself. One possible solution for this issue, without complicating the interface, is to have a method, say a per-step hook, that is empty in the base pipeline and can be overridden by subclasses to observe the intermediate state.
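A toy sketch of that proposal (names and classes here are hypothetical, not part of diffusers): the denoising loop calls a hook that is empty in the base class, and applications override it in a subclass.

```python
class BasePipeline:
    """Toy stand-in for a pipeline whose denoising loop calls a per-step hook."""

    def on_step(self, step, timestep, latents):
        # Empty in the base class; subclasses override it to observe progress.
        pass

    def __call__(self, num_inference_steps=5):
        latents = [0.0]  # stand-in for the real latent tensor
        for step, timestep in enumerate(reversed(range(num_inference_steps))):
            # ... the real denoising update would happen here ...
            self.on_step(step, timestep, latents)
        return latents


class LoggingPipeline(BasePipeline):
    def on_step(self, step, timestep, latents):
        print(f"step {step}, timestep {timestep}")


LoggingPipeline()()
```

The trade-off versus a `callback` argument is that the hook requires a subclass per use case, while a callback can be supplied per call.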
The PR is good to go for me then! Thanks a lot for your thoughtful comments here.
Agree that callbacks are general enough to be implemented in the core pipelines.
I like the changes as they are done now since they are kept simple. Regarding your two questions @pcuenca:
- Flax doesn't need to mirror the behavior here
- I think the way the safety checker is handled in this PR is nice.
We just need to add three tests to merge this PR. @jamestiotio, could you maybe add three tests showing intermediate decoding of the pipelines in test_pipelines.py?
I reviewed the main Stable Diffusion pipeline and suggested some changes, essentially to restore the decoding and the safety checker to inside the `__call__` function as they were before, since we are just sending the latents to the callback. I also commented a few nits about the timestep sent to the callback; I believe it should just be an `int`.
The same comments would apply to the other pipelines, so I didn't repeat them there. I did add an additional comment in the ONNX pipeline to send the latents as NumPy arrays for consistency, but @anton-l is much more knowledgeable about that.
I'm asking because, in my opinion, we should send the latents to the callback instead of the decoded image, to make it as general as possible. There are several examples of users leveraging the latents for their work. If that's the case, we won't decode the image and therefore won't run the safety checker for the user. Downstream applications that expose the intermediate latents should therefore do it themselves. Is that an acceptable solution @patrickvonplaten? I wrote my review assuming that it was :) Final question: what do you think about @ahrm's proposal to use a subclassable function (empty in the base pipeline)?
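Under that design, a downstream application that wants images for the intermediate states would decode the latents on its own; a minimal sketch, assuming the standard 0.18215 latent scaling used by Stable Diffusion's VAE:

```python
import torch

@torch.no_grad()
def latents_to_images(pipe, latents):
    # Undo the 0.18215 latent scaling, then decode with the pipeline's VAE.
    decoded = pipe.vae.decode(latents / 0.18215)
    images = decoded.sample if hasattr(decoded, "sample") else decoded
    # Map from [-1, 1] to [0, 1] so the result can be displayed or safety-checked.
    return (images / 2 + 0.5).clamp(0, 1)
```

The decoded images could then be passed through the pipeline's safety checker by the application itself, as suggested above.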
@jamestiotio would you be able to take care of that? Otherwise I can do it.
By the way, great work @jamestiotio, I liked it a lot that you paid attention to detail and included docstrings and type hints :) From my point of view this is very cool and ready to go when we iron out the final details!
In that case, as mentioned above by @keturn, would it be better to keep the […]?
As a user of the library, I don't have a strong preference between the two implementations (subclassable function and `callback` parameter).
I should be able to take care of this. Do give me some time to add them.
Thank you! 😄
Awesome, this looks good to me. OK to merge whenever @pcuenca is happy with it :-)
Hi @jamestiotio! Thanks a lot for writing the tests, they look great! However, they fail on my system. I'm seeing two problems: […]
Are you testing on GPU too? Otherwise, I can try to fix them myself :)
Hi @pcuenca, apologies for the previously failing tests. It seems that my assumption that each step on a CPU is equivalent to each step on a GPU and would produce the same latent values was wrong. 😅 I have fixed the tests accordingly in commit fe05ea2. Since my GPU does not seem to have enough VRAM to load the full-precision model, I have decided to modify the tests slightly: […]
All 4 tests pass on my side. Feel free to modify the values of the […]
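For reference, a sketch of what such an intermediate-state test might look like, assuming the `callback` parameter from this PR; the checkpoint revision, prompt, and expected slice values below are placeholders, not the ones used in the actual tests:

```python
import numpy as np
import torch
from diffusers import StableDiffusionPipeline


def test_stable_diffusion_intermediate_state():
    # Record latent slices at a couple of steps via the callback and
    # compare them against precomputed reference values.
    seen = {}

    def callback(step, timestep, latents):
        if step in (1, 49):
            seen[step] = latents[0, -3:, -3:, -1].detach().float().cpu().numpy()

    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4", revision="fp16", torch_dtype=torch.float16
    ).to("cuda")
    generator = torch.Generator(device="cuda").manual_seed(0)
    pipe(
        "A painting of a squirrel eating a burger",
        num_inference_steps=50,
        generator=generator,
        callback=callback,
    )

    expected_step_1 = np.zeros(9)  # placeholder, not the real reference values
    assert np.abs(seen[1].flatten() - expected_step_1).max() < 1e-2
```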
Hi @jamestiotio, don't worry, I'll merge this one and then update the tests so they use standard arguments. Thanks a lot!
* Add callback parameters for Stable Diffusion pipelines
* Lint code with `black --preview`
* Refactor callback implementation for Stable Diffusion pipelines
* Fix missing imports
* Fix documentation format
* Add kwargs parameter to standardize with other pipelines
* Modify Stable Diffusion pipeline callback parameters
* Remove useless imports
* Change types for timestep and onnx latents
* Fix docstring style
* Return decode_latents and run_safety_checker back into __call__
* Remove unused imports
* Add intermediate state tests for Stable Diffusion pipelines
* Fix intermediate state tests for Stable Diffusion pipelines

Signed-off-by: James R T <[email protected]>
Hi peeps, thanks for your amazing work. Is it possible to use this callback parameter to do masking with the img2img pipeline? What I want is to mask an area of an image that should remain unchanged while the img2img pipeline "normally" affects the rest of the image. If yes: how would I do that?
Hey @nicollegah, could you maybe open a new issue for this one?
Sure! I just did #2073
This PR adds the parameters `callback` and `callback_frequency` to the `__call__` methods of the Stable Diffusion pipelines. This PR closes #459.
As discussed in the accompanying issue, these callbacks can be helpful for consumers of the `diffusers` API who would like to inspect the current progress of the pipeline.
Instead of defining both `callback` and `img_callback`, as implemented in the original Stable Diffusion repository, this implementation follows a more general version of the callback, with the callback function having the following signature: […]
This way, downstream consumers can select whichever parameters they want to use to implement incremental diffusion. This also avoids unnecessary, duplicated parameters.
Consumers can also specify the `callback_frequency` parameter to indicate how often they would like the callback function to be called. If `callback` is defined but `callback_frequency` is not specified, the callback function will be called at every step.
Signed-off-by: James R T [email protected]
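For illustration, a rough usage sketch of the proposed parameters, assuming the callback receives the step index, timestep, and latents as discussed in the review above. The parameter names follow this description; the names in the merged API may differ, and the model id is only an example checkpoint.

```python
from diffusers import StableDiffusionPipeline

def callback(step, timestep, latents):
    # Called from inside the denoising loop with the current intermediate state.
    print(f"step={step} timestep={timestep} latents_shape={tuple(latents.shape)}")

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4").to("cuda")
image = pipe(
    "a photograph of an astronaut riding a horse",
    callback=callback,
    callback_frequency=5,  # invoke the callback every 5 steps; omit to get every step
).images[0]
```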