[Schedulers Refactoring] Phase 1: timesteps and scaling #637
Conversation
class BaseScheduler(abc.ABC):

    def scale_initial_noise(self, noise: torch.FloatTensor):
        """
        Scales the initial noise to the correct range for the scheduler.
        """
        return noise

    def scale_model_input(self, sample: torch.FloatTensor, step: int):
        """
        Scales the model input (`sample`) to the correct range for the scheduler.
        """
        return sample

    @abc.abstractmethod
    def get_noise_condition(self, step: int):
        """
        Returns the input noise condition for the model (e.g. `timestep` or `sigma`).
        """
        raise NotImplementedError("Scheduler must implement the `get_noise_condition` function.")
This class combines the new required functions and ideally should be merged with SchedulerMixin (left standalone for easier reviewing for now).
Yes, I agree to have it merged; otherwise schedulers need to inherit from both.
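For illustration, a minimal sketch of what the merged base class could look like (hypothetical: the merge is deferred in this PR, and the save/load plumbing that SchedulerMixin already provides is omitted):

import abc

import torch


class SchedulerMixin(abc.ABC):
    # Only the methods proposed in this PR are shown; the existing
    # SchedulerMixin functionality would live alongside them.

    def scale_initial_noise(self, noise: torch.FloatTensor):
        """Scales the initial noise to the correct range for the scheduler."""
        return noise

    def scale_model_input(self, sample: torch.FloatTensor, step: int):
        """Scales the model input (`sample`) to the correct range for the scheduler."""
        return sample

    @abc.abstractmethod
    def get_noise_condition(self, step: int):
        """Returns the input noise condition for the model (e.g. `timestep` or `sigma`)."""
        raise NotImplementedError("Scheduler must implement the `get_noise_condition` function.")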
Not touching the VE schedulers yet; they will follow when we have convertible t<->sigmas.
        sigmas = ((1 - self.alphas_cumprod) / self.alphas_cumprod) ** 0.5
        sigmas = sigmas[::-1].copy()
        self.sigmas = np.concatenate([sigmas, [0.0]]).astype(np.float32)
Fixes #454
Initially I thought the changes would result in more verbose code in the pipelines, but then I realized they just replace special cases with function calls, and it's much clearer this way.
I just left a few comments about details I may not be understanding properly.
@@ -130,6 +130,7 @@ def __call__(
            generator=generator,
        )
        latents = latents.to(self.device)
        latents = self.scheduler.scale_initial_noise(latents)
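For context, a hedged sketch of what `scale_initial_noise` might do for a sigma-based scheduler such as K-LMS, following the PR description (initial latents should come from N(0, max_sigma^2) rather than N(0, 1)); illustrative only, not the actual diff:

import numpy as np
import torch


class LMSLikeScheduler:
    def __init__(self, sigmas: np.ndarray):
        # Descending sigmas, as computed from alphas_cumprod elsewhere in the PR.
        self.sigmas = sigmas

    def scale_initial_noise(self, noise: torch.FloatTensor) -> torch.FloatTensor:
        # Stretch unit-variance noise to the scheduler's maximum noise level.
        return noise * float(self.sigmas.max())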
@@ -37,6 +38,27 @@ class SchedulerOutput(BaseOutput):
    prev_sample: torch.FloatTensor


class BaseScheduler(abc.ABC):
Agree to merge it with SchedulerMixin if we can.
Can't remove SchedulerMixin completely yet, as it's used by the Flax schedulers, so I guess we'll address it after the Flax refactoring.
        return sample

    @abc.abstractmethod
    def get_noise_condition(self, step: int):
I didn't understand why this was different to `step()`, but then I read @anton-l's comment in the description. Sounds good!
The function is not intuitive for me, to be honest. We pass a `step` variable to `get_noise_condition`, then pass its output to the `step(...)` function, and for most of our schedulers all of this just represents a `timestep`.
""" | ||
Returns the input noise condition for the model (e.g. `timestep` or `sigma`). | ||
""" | ||
raise NotImplementedError("Scheduler must implement the `get_noise_condition` function.") |
However, should we default it to invoking `step()` rather than raising an exception? Then only the schedulers that need it would implement it. If you are reading the code for current schedulers, there's one less thing to worry about.
Resolved the merge conflicts, ready for review again :)
                # 1. predict noise model_output
                t = self.scheduler.get_noise_condition(step)
It's confusing to me that `get_noise_condition()` returns a time integer. Also, why is this needed for DDIM?
@@ -155,7 +155,8 @@ def __init__(

        # setable values
        self.num_inference_steps = None
        self.timesteps = np.arange(0, num_train_timesteps)[::-1]
        self.schedule = np.arange(0, num_train_timesteps)
What do we need the `schedule` for?
""" | ||
Returns the input noise condition for a model. | ||
""" | ||
return self.schedule[step] |
This function is not intuitive for me
@@ -240,6 +249,8 @@ def step(
        # - pred_sample_direction -> "direction pointing to x_t"
        # - pred_prev_sample -> "x_t-1"

        timestep = self.schedule[timestep]
Here I'm a bit lost - the code here is now more complex than it was before.
Why are we now passing a different timestep to the scheduler than before, which we then revert here?
@@ -176,6 +197,8 @@ def step(
                When returning a tuple, the first element is the sample tensor.

        """
        # FIXME: accounting for the descending sigmas
        timestep = int(len(self.timesteps) - timestep - 1)
I don't understand this -> we are reversing `timestep` here but giving it the same name. This is hard to understand/read.
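One way to address the readability concern (a sketch, not something the PR does) would be to keep `timestep` untouched and give the reversed index its own name:

def sigma_index_for(timestep: int, num_inference_steps: int) -> int:
    """Maps a descending `scheduler.timesteps` index onto the matching position
    in the descending `sigmas` array (num_inference_steps == len(self.timesteps)),
    instead of overwriting `timestep` in place."""
    return num_inference_steps - int(timestep) - 1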
@@ -46,6 +46,14 @@ class StableDiffusionPipeline(DiffusionPipeline):
            Model that extracts features from generated images to be used as inputs for the `safety_checker`.
    """

    vae: AutoencoderKL
this is a bit unrelated to the PR - let's try to put this in a different PR next time ;-)
@@ -230,14 +238,11 @@ def __call__(
            if latents.shape != latents_shape:
                raise ValueError(f"Unexpected latents shape, got {latents.shape}, expected {latents_shape}")
            latents = latents.to(self.device)
        latents = self.scheduler.scale_initial_noise(latents)
Fine with this function! Think it would also be fine to add an if-statement here (to only do it if the scheduler is continuous).
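A sketch of that guard (the `is_continuous` flag is hypothetical here; no such attribute exists in this PR):

# Only rescale the initial latents for continuous (sigma-based) schedulers;
# for discrete schedulers scale_initial_noise would be a no-op anyway.
if getattr(self.scheduler, "is_continuous", False):
    latents = self.scheduler.scale_initial_noise(latents)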
I don't think this works correctly at the moment because the sigmas are changed when doing self.scheduler.set_timesteps
                # the model input needs to be scaled to match the continuous ODE formulation in K-LMS
                latent_model_input = latent_model_input / ((sigma**2 + 1) ** 0.5)
                latent_model_input = self.scheduler.scale_model_input(latent_model_input, step)
                t = self.scheduler.get_noise_condition(step)
Here I'm lost -> this function is not intuitive to me. `t` != `noise_condition` for me. Not happy if we need to force schedulers to have this function.
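For reference, the scheduler-side counterpart would presumably look roughly like the following for K-LMS, simply moving the scaling the pipeline used to do inline (a sketch; `_sigma_for_step` is a placeholder for whatever step-to-sigma mapping the scheduler settles on):

import torch


class LMSLikeScheduler:
    def scale_model_input(self, sample: torch.FloatTensor, step: int) -> torch.FloatTensor:
        # How `step` maps onto the descending sigmas is exactly the indexing
        # question raised elsewhere in this review; abstracted away here.
        sigma = self._sigma_for_step(step)
        # Same scaling the pipeline previously applied inline for K-LMS.
        return sample / ((sigma**2 + 1) ** 0.5)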
""" | ||
Returns the input noise condition for a model. | ||
""" | ||
return self.schedule[step] |
No one-liner functions please
@@ -36,6 +37,38 @@ class SchedulerOutput(BaseOutput):
    prev_sample: torch.FloatTensor


class BaseScheduler(abc.ABC):
Suggested change:
- class BaseScheduler(abc.ABC):
+ class BaseScheduler:
Not super happy about this. If some schedulers don't need a `scale_initial_noise` function we shouldn't force them to call it to get back a no-op; same with `scale_model_input`.
To be honest, I'm not super convinced by this design:

- I don't think we should force schedulers to call functions if they don't need to. Instead we should raise an error if a function is not called for a scheduler. This is also better for backwards compatibility.
- Currently, I don't think the `BaseScheduler` provides much value and I would not be in favor of adding it. The functions `scale_initial_noise` and `scale_model_input` are not required by DDIM or PNDM, only by K-LMS. Just because K-LMS needs them doesn't warrant adding them as no-ops to DDIM and PNDM IMO. I think it's not good design to say "the default we put in a class every scheduler inherits from is to not scale, and then schedulers that need to scale have to overwrite the method". This means that if I add a new continuous scheduler now and forget to add `scale_initial_noise`, I'll get silent errors. Instead I want to get a big error.
- I think we're changing too much with too little gain here. We're still starting from a design we have chosen and should try to reduce the mental energy it takes to go from the current design to the new design. Here we introduce a lot of new variables and function names: `self.schedule`, `scale_initial_noise`, `scale_model_input`, `get_noise_condition`; the meaning of `step` is redefined; sometimes we pass `step` to `timesteps`, sometimes to `schedule`. This is too much. Let's try to limit the new things to be learnt.

Ideas:

- I think we should focus much more on global differences between schedulers and derive simple logic from there. Why don't we just give each scheduler a class variable `"discrete"` or `"continuous"`? (I think every scheduler has to be one of the two.) We can enforce this by checking in `SchedulerMixin` that every scheduler has exactly one of the two. Then, depending on whether it's `"continuous"` or `"discrete"`, we scale inputs or not. We can throw nice error messages in the scheduler's `step(...)` function and in the model's forward that complain if variables are passed in the incorrect space.
- I don't think we really need to change DDIM's or PNDM's functionality, we just need to adapt K-LMS.
- I think it would make a lot of sense to also add `"trained_on_continuous"` and `"trained_on_discrete"` to the model's config so that we can also throw a nice error in the model's forward if the wrong `dtype` is passed.
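A rough sketch of the class-variable idea (names such as `noise_space` are made up here for illustration; nothing like this exists in the codebase yet):

class SchedulerMixin:
    # Every scheduler must declare exactly one of the two spaces; the mixin enforces it.
    noise_space = None  # "discrete" or "continuous"

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if cls.noise_space not in ("discrete", "continuous"):
            raise ValueError(
                f"{cls.__name__} must set `noise_space` to 'discrete' or 'continuous'."
            )


class DDIMScheduler(SchedulerMixin):
    noise_space = "discrete"


class LMSDiscreteScheduler(SchedulerMixin):
    noise_space = "continuous"


# In the pipeline, the flag (rather than per-scheduler no-op methods) would decide
# whether to scale inputs, and step(...) / the model forward could raise clear errors
# when a value in the wrong space is passed.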
There were some things here that had me scratching my head a bit too.
A possibly-related thread on sampler/scheduler interface design is crowsonkb/k-diffusion#23.
Alternative to this design: #711
Superseded by #719
* Fix the LMS pytorch regression * Copy over the changes from huggingface#637 * Copy over the changes from huggingface#637 * Fix betas test
As the refactor in #616 started getting too big to review in one go, I decided to approach this in a couple of smaller PRs that touch as few files as possible.

Note: it's easier to start reviewing from `pipeline_stable_diffusion.py`, as it shows the reasoning behind the API changes better.

Basic ideas that this PR implements:

- `scheduler.timesteps` are now always just integer (descending) indices in the range `[0, num_inference_steps)` in inference mode, e.g. `[49, 48, ..., 1, 0]`. This makes sure that we always iterate over the same range for every scheduler.
- `scheduler.schedule` replaces the original `timesteps` and contains either the discrete noise conditions (`t` or `sigma`) for the model (like in DDIM: `[999, 978, ..., 20, 0]`) or the resampled ones (like in LMS: `[999.0, 977.7, ..., 0.0]`).
- `scheduler.scale_initial_noise()` scales the initial `torch.randn`, as sometimes (e.g. in Karras, LMS or VE schedulers) the initial noise is not `N(0, 1)` but rather `N(0, max_sigma^2)`. This function has to be applied after sampling the noise in the pipeline.
- `scheduler.scale_model_input(sample, step)` has to be applied to each UNet input sample, as sometimes the inputs need to be scaled (e.g. for Karras or LMS).
- `scheduler.get_noise_condition(step)` gets the noise condition (`t` or `sigma`) for the UNet. Sometimes the `t` needs to be scaled (e.g. in Karras+Euler), so this is implemented as a future-proof function that can access the scheduler's parameters.
- `scheduler.step()` must always accept a value from `scheduler.timesteps` (rather than `scheduler.schedule`) as input, so that we can use it as an index into `schedule`, `sigmas` or whatever else the scheduler needs for the step.

TODO: merge the PyTorch schedulers and rebase these changes on top.

Coming up in the next PRs (Phase 2+):

- `t` and `sigma` interchangeability, to use any continuous scheduler with a discrete model and vice versa.
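Putting the pieces together, the denoising loop in a pipeline would look roughly like this under the proposed API (a sketch distilled from the points above, not code from the diff; `unet`, `scheduler`, `text_embeddings`, `latents_shape` and the other variables are assumed to be set up as in `pipeline_stable_diffusion.py`):

scheduler.set_timesteps(num_inference_steps)

# Initial noise: schedulers whose initial noise is not N(0, 1) rescale it here.
latents = torch.randn(latents_shape, generator=generator, device=device)
latents = scheduler.scale_initial_noise(latents)

# scheduler.timesteps is always the same integer index range, e.g. [49, 48, ..., 1, 0].
for step in scheduler.timesteps:
    # Per-step input scaling: a no-op for DDIM/PNDM, sigma-based scaling for K-LMS.
    model_input = scheduler.scale_model_input(latents, step)
    # Noise condition for the UNet: a discrete t or a sigma, taken from scheduler.schedule.
    t = scheduler.get_noise_condition(step)
    noise_pred = unet(model_input, t, encoder_hidden_states=text_embeddings).sample
    # step(...) always receives the plain index from scheduler.timesteps.
    latents = scheduler.step(noise_pred, step, latents).prev_sample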