[SchedulerDesign] Alternative scheduler design #711
```
@@ -178,7 +179,8 @@ def step(
        When returning a tuple, the first element is the sample tensor.
        """
        sigma = self.sigmas[timestep]
        index = (self.config.num_train_timesteps - timestep) // (self.config.num_train_timesteps // self.num_inference_steps)
```
👍 I like this bit, having the scheduler responsible for figuring out what to do with the timestep instead of having the pipeline keep track of how schedulers interpret their arguments.
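A minimal sketch of that idea with assumed values (the names mirror the diff above, but nothing here is the actual diffusers API): the scheduler owns the mapping from a training-scale timestep to an index into its own inference-time tables, so the pipeline never has to know the convention.

```python
# Illustrative sketch: the scheduler converts a training-scale timestep into
# an index into its inference-time tables (e.g. sigmas). Values are assumed.
num_train_timesteps = 1000  # assumed length of the training schedule
num_inference_steps = 50    # assumed number of sampling steps

def timestep_to_index(timestep: int) -> int:
    # descending training timesteps map to ascending indices:
    # larger (earlier) timesteps give smaller indices
    step_ratio = num_train_timesteps // num_inference_steps
    return (num_train_timesteps - timestep) // step_ratio

print(timestep_to_index(1000), timestep_to_index(980))  # 0, 1
```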
```python
if self.scheduler.type == SchedulerType.CONTINUOUS:
    latents = latents * self.scheduler.init_sigma
```
A step in the right direction. I'd love to get rid of the `if` entirely, but as long as we have it, defining a `scheduler.type` enum is much preferable to `isinstance`!
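A hedged sketch of what that enum could look like (the member names follow this thread; the scheduler class is an illustrative stand-in, not an actual diffusers class):

```python
from enum import Enum

class SchedulerType(Enum):
    DISCRETE = "discrete"      # integer timesteps (e.g. DDIM, DDPM)
    CONTINUOUS = "continuous"  # continuous sigmas (e.g. K-LMS style)

class KLMSLikeScheduler:
    # declared once on the class, so pipelines branch on a stable property
    type = SchedulerType.CONTINUOUS
    init_sigma = 14.6  # assumed initial noise level

scheduler = KLMSLikeScheduler()
if scheduler.type == SchedulerType.CONTINUOUS:  # no isinstance needed
    print("scale initial latents by", scheduler.init_sigma)
```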
@anton-l I can also be convinced here to change it to something that might be cleaner (again, if we don't have to force a function upon DDIM or DDPM).
Overall, I feel quite strongly about the following though:
- Let's not make easy schedulers more complex because we'd like to support newer, more complex schedulers.
- Forcing every scheduler to implement a certain method that can grow arbitrarily in complexity is much worse than explicit `if` statements with a nice, educational comment.
```python
latent_model_input = latent_model_input / ((sigma**2 + 1) ** 0.5)

if self.scheduler.type == SchedulerType.CONTINUOUS and self.model.config.trained_scheduler_type == SchedulerType.DISCRETE:
    latent_model_input = self.scheduler.scale_(latent_model_input)
```
Is this `scale_` implemented?
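For reference, a hedged sketch of what a scheduler-side `scale_` could look like if it simply encapsulates the K-LMS input scaling shown above (the class and attribute names are assumptions, not actual diffusers API):

```python
import torch

class ContinuousSchedulerSketch:
    def __init__(self, sigma: float):
        self.sigma = sigma  # current noise level, assumed tracked by the scheduler

    def scale_(self, latent_model_input: torch.Tensor) -> torch.Tensor:
        # same expression as the pipeline code above: match the continuous
        # ODE formulation when the model was trained on a discrete schedule
        return latent_model_input / ((self.sigma**2 + 1) ** 0.5)

scheduler = ContinuousSchedulerSketch(sigma=14.6)
scaled = scheduler.scale_(torch.randn(1, 4, 8, 8))  # ~x / 14.63 at max noise
```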
@patrickvonplaten as you said in #637:

> I think we should focus much more on global differences
Which isn't solved fully here for a couple of reasons (that I see at the moment):
- Putting the schedulers into continuous and discrete categories is a bit restrictive IMO, as it implies that these categories have different APIs, while mathematically there's no such restriction (we can freely convert between the timestep representations, as Karras et al. have shown; see the sketch at the end of this comment). Having this distinction on the model side can be beneficial to learn what type of noise conditioning the model expects, but we can get away with having it on the pipeline side too.
- With this design we still have a very different representation of `timesteps` between different schedulers: sometimes they're int, sometimes they're float, sometimes they're descending, sometimes ascending. [Schedulers Refactoring] Phase 1: timesteps and scaling #637 addresses this by allowing the schedulers to have their internal notion of a `schedule` (or `sigmas`, or whatever they choose), while only having integer descending `timesteps` in the public API. This makes debugging the pipelines way easier (i.e. timesteps no longer jump all over the place, we only have indices/steps), while also ensuring that literally any scheduler (with further refactoring for our VP, VE and Karras) is usable with e.g. Stable Diffusion.
I agree that #637 turned out to be bulky, so hopefully we can meet somewhere in the middle after iterating a bit.
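A hedged sketch of the conversion point above (all names and values are illustrative, not diffusers API): with an internal sigma table, a scheduler can expose integer descending steps publicly and still recover the continuous representation by lookup, or go the other way by a nearest-match search.

```python
import numpy as np

num_inference_steps = 50
# assumed internal continuous schedule, descending from high to low noise
sigmas = np.linspace(14.6, 0.03, num_inference_steps)

def sigma_for_step(step: int) -> float:
    # public API direction: integer descending step -> internal continuous sigma
    return float(sigmas[step])

def step_for_sigma(sigma: float) -> int:
    # inverse direction: nearest internal sigma -> public integer step
    return int(np.abs(sigmas - sigma).argmin())

assert step_for_sigma(sigma_for_step(10)) == 10
```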
```
@@ -178,7 +179,8 @@ def step(
        When returning a tuple, the first element is the sample tensor.
        """
        sigma = self.sigmas[timestep]
        index = (self.config.num_train_timesteps - timestep) // (self.config.num_train_timesteps // self.num_inference_steps)
```
Since `timestep` is float (don't mind the wrong type annotation and doc for now), I think the surest way to get its index is `self.timesteps.where(timesteps)`, as timesteps are linearly interpolated here. But `where()` just looks like a hack, while we can refactor the timesteps and scheduler properly.
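One illustrative alternative to a `where()` hack, purely as an assumption-level sketch (not diffusers code): a nearest-match lookup, which also avoids the float-equality pitfalls of an exact comparison.

```python
import torch

timesteps = torch.linspace(999.0, 0.0, 50)  # assumed float, descending schedule

def index_of(timestep: float) -> int:
    # nearest match instead of exact equality, which is fragile for floats
    return int((timesteps - timestep).abs().argmin())

print(index_of(999.0), index_of(0.0))  # 0, 49
```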
@anton-l RE:

-> I don't understand this. All of our schedulers always belong to one of the two classes, no? The math doesn't have to be perfect there; it's just important that people understand what timestep representation is expected. We can also find a better name.

-> I've seen only two representations so far, continuous and int, and all of these should be called
```diff
-    latents = self.scheduler.step(noise_pred, i, latents, **extra_step_kwargs).prev_sample
-else:
-    latents = self.scheduler.step(noise_pred, t, latents, **extra_step_kwargs).prev_sample
+latents = self.scheduler.step(noise_pred, t, latents, **extra_step_kwargs).prev_sample
```
Could be considered a bug correction IMO
```python
# the model input needs to be scaled to match the continuous ODE formulation in K-LMS
latent_model_input = latent_model_input / ((sigma**2 + 1) ** 0.5)

if self.scheduler.type == SchedulerType.CONTINUOUS and self.model.config.trained_scheduler_type == SchedulerType.DISCRETE:
```
Here I'm very open to hearing better suggestions @anton-l (if you can think of something cleaner that doesn't require changes to DDIM or DDPM).
There's a lot of back-and-forth about that timesteps list, what its internal representation is, and what the interfaces to it look like. But is it necessary for a Scheduler to expose that list at all? There might be some uses, but I'm not sure the current pipelines need them at all. As far as I can tell, the only thing they need to know is the t of the current step so they can pass it along.

Exposing the list also embodies the assumption that there is a fixed schedule known up front, which doesn't need to be the case. For example, https://github.com/LuChengTHU/dpm-solver implements a sampler that uses adaptive time steps: it just keeps going until it's close enough to 0. That's the sort of thing I wasn't going to bring up in this early pass of the redesign, but I do so now because it seems like it might be more freeing than complicating.
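A hedged sketch of that adaptive idea (everything below is an illustrative stand-in, not the dpm-solver API): a sampling loop with no precomputed timesteps list, which simply runs until time is close enough to 0.

```python
import torch

def model(latents: torch.Tensor, t: float) -> torch.Tensor:
    return torch.zeros_like(latents)  # stand-in for the denoising network

def adaptive_step(noise_pred, t, latents):
    # stand-in update; a real adaptive solver would pick dt from an error estimate
    dt = min(t, 0.1)
    return latents - dt * noise_pred, t - dt

latents, t = torch.randn(1, 4, 8, 8), 1.0  # assumed: t = 1.0 is pure noise
while t > 1e-3:  # no fixed schedule: keep going until close enough to 0
    latents, t = adaptive_step(model(latents, t), t, latents)
```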
Thanks for bringing it up! Regarding the current design, I think we can assume that timesteps can always be written as a list of values. Timesteps that cannot be written as a list of values are exceptional for now IMO (these things can't be traced etc., so I'm not sure how useful they are).
Closing this PR in favor of: #719
What does "traced" mean in this context? The only disadvantage I could think of to giving up a fixed number of iterations for something adaptive is if there were some JIT compiler doing loop unrolling or parallelization or something. But CPython isn't that smart, and it's not parallelizable anyway because each iteration depends on the output of the previous one.
Yeah, sorry, that's more or less what I meant by "traced": JAX won't like this, and I'm also not sure ONNX would be happy about it.
Oh, do we have to play by these JAX rules? Good to know. …though I'm skeptical about the value of JIT-ing that whole inference loop, even if you did want to. But we can leave that until later. No need to worry over how to implement the adaptive form of DPM-Solver before we're even able to implement the more predictable DPM-Solver-fast sampler.
We also don't have to play by the JAX rules at all ;-) We want to support JAX as soon as it runs on free Google Colabs: googlecolab/colabtools#3009 (comment), so that we give users more power (8 TPUs are pretty powerful). That being said, it doesn't mean that all PyTorch functionality has to take into account how its mirror would work in JAX. It's totally fine if the two frameworks diverge. More generally, though, it helps optimization libraries like ONNX and TensorRT a lot if all memory can be pre-allocated.
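To illustrate the tracing point (a minimal sketch, assuming JAX is installed; none of this is from the PR itself): under `jax.jit`, a fixed trip count traces fine, while a Python `while` on a traced value fails, which is what makes an adaptive number of steps awkward for JAX- or ONNX-style export.

```python
import jax
import jax.numpy as jnp

@jax.jit
def fixed_steps(x):
    # static trip count: traceable, and memory can be pre-allocated
    return jax.lax.fori_loop(0, 50, lambda i, x: x * 0.9, x)

@jax.jit
def adaptive_steps(x):
    # data-dependent trip count: this Python `while` on a traced value
    # raises a tracer/concretization error inside jit
    while jnp.abs(x).max() > 1e-3:
        x = x * 0.9
    return x

print(fixed_steps(jnp.ones(3)))  # works
# adaptive_steps(jnp.ones(3))    # fails under jit; jax.lax.while_loop would
#                                # trace, but its trip count stays dynamic
```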
Alternative to #637 cc @anton-l