When we export multimodal LLMs, we almost always have to call `processor.apply_chat_template` from HF transformers. Under the hood, that call preprocesses the multimodal inputs.
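For context, a minimal sketch of the call in question; the model name and message structure are illustrative, not from this issue:

```python
from transformers import AutoProcessor

# Illustrative checkpoint; any multimodal model with a processor works similarly.
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")

messages = [
    {"role": "user", "content": [
        {"type": "image", "image": "https://example.com/cat.png"},
        {"type": "text", "text": "Describe this image."},
    ]}
]

# With tokenize=True and return_dict=True, this tokenizes the text AND runs the
# image/audio preprocessing, so preprocessing happens inside a Python call that
# an exported graph cannot capture.
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
)
```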
This is quite annoying during export, since we can't export `processor.apply_chat_template` directly. One example is this logic: https://github.com/huggingface/transformers/blob/main/src/transformers/models/whisper/processing_whisper.py#L69, which uses a lot of numpy that `torch.export` cannot trace.
Ideally we should ask transformers to write standard, torch-based processors for all inputs (tracked by huggingface/transformers#40986). In the short term, I think optimum-executorch should host torch implementations of some of the common processors, like the one in Whisper, as sketched below.
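A minimal sketch of what such a hosted processor could look like, assuming a Whisper-style log-mel pipeline. The class name and constructor parameters are hypothetical, not an existing optimum-executorch API, and the mel filter bank is assumed to be precomputed (e.g. copied from the existing `WhisperFeatureExtractor.mel_filters`):

```python
import torch


class TorchWhisperFeatureExtractor(torch.nn.Module):
    """Hypothetical torch-only log-mel extractor mirroring Whisper's numpy pipeline."""

    def __init__(self, mel_filters: torch.Tensor, n_fft: int = 400, hop_length: int = 160):
        super().__init__()
        self.n_fft = n_fft
        self.hop_length = hop_length
        # Assumed shape (n_mels, n_fft // 2 + 1); registered as buffers so they
        # are captured in the exported graph.
        self.register_buffer("mel_filters", mel_filters)
        self.register_buffer("window", torch.hann_window(n_fft))

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, num_samples) float tensor, 16 kHz audio.
        stft = torch.stft(
            waveform,
            n_fft=self.n_fft,
            hop_length=self.hop_length,
            window=self.window,
            return_complex=True,
        )
        # Drop the last frame and take the power spectrum, as Whisper does.
        magnitudes = stft[..., :-1].abs() ** 2
        mel_spec = self.mel_filters @ magnitudes
        log_spec = torch.clamp(mel_spec, min=1e-10).log10()
        log_spec = torch.maximum(log_spec, log_spec.max() - 8.0)
        return (log_spec + 4.0) / 4.0
```

A module like this could then be exported ahead of the Whisper encoder, e.g. `torch.export.export(extractor, (waveform,))`, so the preprocessing ships as a graph instead of Python/numpy code.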