Conversation

@conansherry

1. Support Python < 3.9
   Change type annotations from the built-in list to typing.List.
2. Fix NaN when PyTorch < 2.5
   See huggingface/diffusers@01bd796.
3. Fix a dtype mismatch during DeepSpeed training (a sketch follows this list).
   Original code:
   context_aware_representations = self.c_embedder(context_aware_representations)
   Fixed code:
   context_aware_representations = self.c_embedder(context_aware_representations.to(dtype=x.dtype))
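As a side note on item 3, here is a minimal sketch of the dtype issue, using hypothetical shapes and layer sizes rather than the actual model code: under DeepSpeed mixed-precision training the embedder's weights can be cast to bf16 while the pooled context features are still fp32, so the forward pass fails unless the input is first cast to x.dtype.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins: c_embedder plays the role of the model's context embedder.
x = torch.randn(2, 16, dtype=torch.bfloat16)          # latent features already in bf16
context_aware_representations = torch.randn(2, 768)   # pooled text features still in fp32
c_embedder = nn.Linear(768, 16).to(torch.bfloat16)    # weights cast by the training engine

# Without the cast this raises "mat1 and mat2 must have the same dtype".
out = c_embedder(context_aware_representations.to(dtype=x.dtype))
print(out.dtype)  # torch.bfloat16
```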

@kohya-ss
Owner

Thank you for this, I will merge it. However, please note that it will be difficult to maintain compatibility in the future (and compatibility may be lost accidentally), because we do not have a Python 3.9 environment. If that happens, please send another PR to fix it.

Regarding item 2, there is a comment on the diffusers commit: huggingface/diffusers@01bd796#commitcomment-151162702. Will it be okay with PyTorch 2.5.1?

@conansherry
Author

> Regarding item 2, there is a comment on the diffusers commit: huggingface/diffusers@01bd796#commitcomment-151162702. Will it be okay with PyTorch 2.5.1?

Yes, PyTorch 2.5.1 is okay.

Comment on lines -759 to +763

-    attn_mask = torch.zeros((bs, 1, max_seqlen_q, max_seqlen_q), dtype=torch.bool, device=text_mask.device)
+    attn_mask = torch.zeros((bs, 1, max_seqlen_q), dtype=torch.bool, device=text_mask.device)

     # set attention mask with total_len
     for i in range(bs):
-        attn_mask[i, :, : total_len[i], : total_len[i]] = True
+        attn_mask[i, :, : total_len[i]] = True
Owner


This fix seems to result in the following error in PyTorch 2.5.1:

x = F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask, dropout_p=drop_rate, is_causal=causal)
RuntimeError: The expanded size of the tensor (24) must match the existing size (3) at non-singleton dimension 1.  Target sizes: [3, 24, 2296, 2296].  Tensor sizes: [3, 1, 2296]

Could you please revert this fix? Without it, the code works with --split_attn on versions prior to PyTorch 2.5.1.
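For context, a small repro sketch of the shape problem, using hypothetical sizes rather than the trainer's actual code: F.scaled_dot_product_attention broadcasts attn_mask against the attention weights of shape (bs, heads, L_q, L_k), so a 3-D (bs, 1, L) mask is aligned as (1, bs, 1, L) and its batch dimension collides with the head dimension, which matches the expand error above. Adding an explicit query-length dimension makes the mask broadcastable again.

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes standing in for bs=3, heads=24, seq_len=2296 from the error above.
bs, heads, L, E = 3, 24, 32, 8
q = k = v = torch.randn(bs, heads, L, E)

# 3-D mask of shape (bs, 1, L), as built by the changed code.
mask_3d = torch.zeros(bs, 1, L, dtype=torch.bool)
mask_3d[..., : L // 2] = True

try:
    F.scaled_dot_product_attention(q, k, v, attn_mask=mask_3d)
except RuntimeError as e:
    print("3-D mask fails:", e)

# Unsqueezing to (bs, 1, 1, L) broadcasts over both the head and query dimensions.
mask_4d = mask_3d.unsqueeze(1)
out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask_4d)
print(out.shape)  # torch.Size([3, 24, 32, 8])
```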

@bmaltais
Contributor

bmaltais commented Jan 21, 2025

With the introduction of uv as the Python version and module manager, is there still a need to support Python 3.9? uv can download and install the Python version specified in pyproject.toml: for instance, if musubi-tuner specifies Python 3.10 and that version isn't installed on your computer, uv will quickly and easily download and use it. So working on supporting older Python releases might no longer be necessary.
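As an illustration (a hypothetical pyproject.toml excerpt, not necessarily the repo's actual file), a pin like this is all uv needs in order to fetch a matching interpreter when none is installed locally:

```toml
# Hypothetical excerpt: with uv's managed Python downloads (enabled by default),
# `uv sync` or `uv run` fetches a CPython matching this constraint if no suitable
# interpreter is found on the machine.
[project]
name = "musubi-tuner"
requires-python = ">=3.10"
```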
