Conversation

@conansherry

1. Support Python < 3.9
   Change type annotations from the built-in list to typing.List.
2. Fix NaN when PyTorch < 2.5
   See huggingface/diffusers@01bd796.
3. Fix a dtype mismatch during DeepSpeed training (a sketch follows this list).
   Original code:
   context_aware_representations = self.c_embedder(context_aware_representations)
   Fixed code:
   context_aware_representations = self.c_embedder(context_aware_representations.to(dtype=x.dtype))
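As a side note on item 3, here is a minimal sketch of the dtype issue, using hypothetical shapes and layer sizes rather than the actual model code: under DeepSpeed mixed-precision training the embedder's weights can be cast to bf16 while the pooled context features are still fp32, so the forward pass fails unless the input is first cast to x.dtype.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins: c_embedder plays the role of the model's context embedder.
x = torch.randn(2, 16, dtype=torch.bfloat16)          # latent features already in bf16
context_aware_representations = torch.randn(2, 768)   # pooled text features still in fp32
c_embedder = nn.Linear(768, 16).to(torch.bfloat16)    # weights cast by the training engine

# Without the cast this raises "mat1 and mat2 must have the same dtype".
out = c_embedder(context_aware_representations.to(dtype=x.dtype))
print(out.dtype)  # torch.bfloat16
```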

@kohya-ss
Owner

Thank you for this, I will merge it. However, please note that it will be difficult to maintain compatibility in the future (and compatibility may be lost accidentally), because we do not have a Python 3.9 environment. If that happens, please send another PR to fix it.

Regarding item 2, there is a comment on the diffusers commit: huggingface/diffusers@01bd796#commitcomment-151162702. Will it be okay with PyTorch 2.5.1?

@conansherry
Author

> Regarding item 2, there is a comment on the diffusers commit: huggingface/diffusers@01bd796#commitcomment-151162702. Will it be okay with PyTorch 2.5.1?

Yes, PyTorch 2.5.1 is okay.

Comment on lines -759 to +763

-    attn_mask = torch.zeros((bs, 1, max_seqlen_q, max_seqlen_q), dtype=torch.bool, device=text_mask.device)
+    attn_mask = torch.zeros((bs, 1, max_seqlen_q), dtype=torch.bool, device=text_mask.device)

     # set attention mask with total_len
     for i in range(bs):
-        attn_mask[i, :, : total_len[i], : total_len[i]] = True
+        attn_mask[i, :, : total_len[i]] = True
Owner


This fix seems to result in the following error in PyTorch 2.5.1:

x = F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask, dropout_p=drop_rate, is_causal=causal)
RuntimeError: The expanded size of the tensor (24) must match the existing size (3) at non-singleton dimension 1.  Target sizes: [3, 24, 2296, 2296].  Tensor sizes: [3, 1, 2296]

Could you please revert this fix? Without it, the code works with --split_attn on versions prior to PyTorch 2.5.1.
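For context, a small repro sketch of the shape problem, using hypothetical sizes rather than the trainer's actual code: F.scaled_dot_product_attention broadcasts attn_mask against the attention weights of shape (bs, heads, L_q, L_k), so a 3-D (bs, 1, L) mask is aligned as (1, bs, 1, L) and its batch dimension collides with the head dimension, which matches the expand error above. Adding an explicit query-length dimension makes the mask broadcastable again.

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes standing in for bs=3, heads=24, seq_len=2296 from the error above.
bs, heads, L, E = 3, 24, 32, 8
q = k = v = torch.randn(bs, heads, L, E)

# 3-D mask of shape (bs, 1, L), as built by the changed code.
mask_3d = torch.zeros(bs, 1, L, dtype=torch.bool)
mask_3d[..., : L // 2] = True

try:
    F.scaled_dot_product_attention(q, k, v, attn_mask=mask_3d)
except RuntimeError as e:
    print("3-D mask fails:", e)

# Unsqueezing to (bs, 1, 1, L) broadcasts over both the head and query dimensions.
mask_4d = mask_3d.unsqueeze(1)
out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask_4d)
print(out.shape)  # torch.Size([3, 24, 32, 8])
```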

@bmaltais
Contributor

bmaltais commented Jan 21, 2025

With the introduction of uv as the Python version and module manager, is there still a need to support Python 3.9? uv can download and install the Python version specified in pyproject.toml: for instance, if musubi-tuner specifies Python 3.10 and that version isn't installed on your computer, uv will quickly and easily download and use it. So working on supporting older Python releases might no longer be necessary.
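As an illustration (a hypothetical pyproject.toml excerpt, not necessarily the repo's actual file), a pin like this is all uv needs in order to fetch a matching interpreter when none is installed locally:

```toml
# Hypothetical excerpt: with uv's managed Python downloads (enabled by default),
# `uv sync` or `uv run` fetches a CPython matching this constraint if no suitable
# interpreter is found on the machine.
[project]
name = "musubi-tuner"
requires-python = ">=3.10"
```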
