
Fix conversion of ltx_video models in bf16 format #1614

@sbalandi

Description


It would be good to have the possibility to run LTX-Video in BF16 format with OpenVINO and optimum-intel. I tried to convert the LTX-Video model to bf16 in several ways, but it looks like neither of them produced fully correct results.

The first way: I loaded Lightricks/LTX-Video, saved it in torch.bfloat16 format, and then converted the model with the OVLTXPipeline API:

import torch
from diffusers import LTXPipeline
from optimum.intel import OVLTXPipeline

# save the pipeline with bfloat16 weights
pipeline = LTXPipeline.from_pretrained("Lightricks/LTX-Video", torch_dtype=torch.bfloat16)
pipeline.save_pretrained("./models/LTX-Video_hf_bf16/")

# convert the bf16 checkpoint to OpenVINO
ov_model = OVLTXPipeline.from_pretrained("./models/LTX-Video_hf_bf16/", device="CPU")
ov_model.save_pretrained("./models/LTX-Video_ov_bf16/")

The transformer model of LTXPipeline (diffusion_pytorch_model.safetensors) is ~3.7 GB in bfloat16 and ~7.5 GB in FP32. But the .bin file of the converted transformer is ~7.5 GB, i.e. the size of the FP32 model, and there is no mention of the BF16 format in the IR .xml file. It looks like optimum-intel incorrectly identifies the model dtype (probably here: openvino/utils.py).
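The size-based reasoning above can be sketched as a quick back-of-the-envelope check. This is not code from the issue; the ~1.9e9 parameter count is an assumption inferred from the reported file sizes (7.5 GB / 4 bytes per param):

```python
# Hypothetical sanity check: infer the stored dtype of a weight file
# from its size and the model's parameter count.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}

def expected_size_gb(num_params: int, dtype: str) -> float:
    """Expected weight-file size in GB for a given storage dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

# Assumed transformer parameter count, derived from the reported ~7.5 GB FP32 file.
num_params = 1_900_000_000

print(f"bf16: ~{expected_size_gb(num_params, 'bf16'):.1f} GB")  # ~3.8 GB
print(f"fp32: ~{expected_size_gb(num_params, 'fp32'):.1f} GB")  # ~7.6 GB
```

A ~7.5 GB .bin file for this transformer therefore matches FP32 storage, not bf16, which supports the suspicion that the bf16 dtype is lost during export.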

The second way: I tried to convert an FP16 version of the LTX-Video model. There are, for example, Lightricks/LTX-Video-0.9.7-dev, Lightricks/LTX-Video-0.9.8-13B-distilled, and Lightricks/LTX-Video-0.9.5.

optimum-cli export openvino --model Lightricks/LTX-Video-0.9.7-dev  --task text-to-video  ./models/Lightricks/LTX-Video-0.9.7-dev 
This command fails during conversion with the following error:

Traceback (most recent call last):
  File "./env/lib/python3.10/site-packages/openvino/frontend/pytorch/ts_decoder.py", line 72, in __init__
    pt_module = self._get_scripted_model(
  File "./env/lib/python3.10/site-packages/openvino/frontend/pytorch/ts_decoder.py", line 178, in _get_scripted_model
    scripted = torch.jit.trace(
  File "./env/lib/python3.10/site-packages/torch/jit/_trace.py", line 1002, in trace
    traced_func = _trace_impl(
  File "./env/lib/python3.10/site-packages/torch/jit/_trace.py", line 696, in _trace_impl
    return trace_module(
  File "./env/lib/python3.10/site-packages/torch/jit/_trace.py", line 1282, in trace_module
    module._c._create_method_from_trace(
  File "./env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "./env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
  File "./env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1763, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "./env/lib/python3.10/site-packages/optimum/exporters/openvino/convert.py", line 398, in ts_patched_forward
    outputs = patched_forward(**kwargs)
  File "./env/lib/python3.10/site-packages/optimum/exporters/onnx/model_patcher.py", line 596, in patched_forward
    outputs = self.orig_forward(*args, **kwargs)
  File "./env/lib/python3.10/site-packages/optimum/exporters/openvino/convert.py", line 1035, in <lambda>
    vae_encoder.forward = lambda sample: {"latent_parameters": vae_encoder.encode(x=sample)["latent_dist"].parameters}
  File "./env/lib/python3.10/site-packages/diffusers/utils/accelerate_utils.py", line 46, in wrapper
    return method(self, *args, **kwargs)
  File "./env/lib/python3.10/site-packages/diffusers/models/autoencoders/autoencoder_kl_ltx.py", line 1276, in encode
    h = self._encode(x)
  File "./env/lib/python3.10/site-packages/diffusers/models/autoencoders/autoencoder_kl_ltx.py", line 1252, in _encode
    enc = self.encoder(x)
  File "./env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "./env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
  File "./env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1763, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "./env/lib/python3.10/site-packages/diffusers/models/autoencoders/autoencoder_kl_ltx.py", line 866, in forward
    hidden_states = down_block(hidden_states)
  File "./env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "./env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
  File "./env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1763, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "./env/lib/python3.10/site-packages/diffusers/models/autoencoders/autoencoder_kl_ltx.py", line 513, in forward
    hidden_states = downsampler(hidden_states)
  File "./env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "./env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
  File "./env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1763, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "./env/lib/python3.10/site-packages/diffusers/models/autoencoders/autoencoder_kl_ltx.py", line 230, in forward
    .unflatten(2, (-1, self.stride[0]))
  File "./env/lib/python3.10/site-packages/torch/_tensor.py", line 1433, in unflatten
    return super().unflatten(dim, sizes)
RuntimeError: unflatten: Provided sizes [-1, 2] don't multiply up to the size of dim 2 (3) in the input tensor

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "./env/bin/optimum-cli", line 8, in <module>
    sys.exit(main())
  File "./env/lib/python3.10/site-packages/optimum/commands/optimum_cli.py", line 219, in main
    service.run()
  File "./env/lib/python3.10/site-packages/optimum/commands/export/openvino.py", line 469, in run
    main_export(
  File "./env/lib/python3.10/site-packages/optimum/exporters/openvino/__main__.py", line 524, in main_export
    submodel_paths = export_from_model(
  File "./env/lib/python3.10/site-packages/optimum/exporters/openvino/convert.py", line 740, in export_from_model
    export_models(
  File "./env/lib/python3.10/site-packages/optimum/exporters/openvino/convert.py", line 509, in export_models
    export(
  File "./env/lib/python3.10/site-packages/optimum/exporters/openvino/convert.py", line 211, in export
    return export_pytorch(
  File "./env/lib/python3.10/site-packages/optimum/exporters/openvino/convert.py", line 416, in export_pytorch
    ts_decoder = TorchScriptPythonDecoder(model, example_input=dummy_inputs, **ts_decoder_kwargs)
  File "./env/lib/python3.10/site-packages/openvino/frontend/pytorch/ts_decoder.py", line 84, in __init__
    raise RuntimeError(
RuntimeError: Couldn't get TorchScript module by tracing.
Exception:
unflatten: Provided sizes [-1, 2] don't multiply up to the size of dim 2 (3) in the input tensor
Please check correctness of provided 'example_input'. Sometimes models can be converted in scripted mode, please try running conversion without 'example_input'.
 You can also provide TorchScript module that you obtained yourself, please refer to PyTorch documentation: https://pytorch.org/tutorials/beginner/Intro_to_TorchScript_tutorial.html.
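The root cause appears to be in the VAE encoder's temporal downsampler: unflatten(2, (-1, stride)) can only split a dimension whose size is divisible by the stride, and the traceback shows dim 2 (the frame dimension of the dummy input) has size 3 while the stride is 2. A minimal sketch of that divisibility constraint (an illustration, not the optimum-intel or diffusers code):

```python
def unflatten_sizes(dim_size: int, stride: int) -> tuple:
    """Shape that unflatten(dim, (-1, stride)) would produce, or an error.

    Mirrors PyTorch's rule: splitting a dimension of size d into (-1, s)
    requires d % s == 0.
    """
    if dim_size % stride != 0:
        raise ValueError(
            f"unflatten: Provided sizes [-1, {stride}] don't multiply up to "
            f"the size of dim ({dim_size}) in the input tensor"
        )
    return (dim_size // stride, stride)

print(unflatten_sizes(4, 2))  # (2, 2): a frame count the stride divides evenly
try:
    unflatten_sizes(3, 2)     # the failing case from the traceback
except ValueError as e:
    print(e)
```

This suggests the dummy input generated for the VAE encoder uses a number of frames that is incompatible with the model's temporal downsampling factor, rather than a tracing problem per se.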

cc @sbalandi, @rkazants
