When we export multimodal LLMs, we almost always have to call `processor.apply_chat_template` from HF transformers. Under the hood, that call preprocesses the multimodal inputs.
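For context, a minimal sketch of the call in question; the model name and message structure are illustrative, not from this issue:

```python
from transformers import AutoProcessor

# Illustrative checkpoint; any multimodal model with a processor works similarly.
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")

messages = [
    {"role": "user", "content": [
        {"type": "image", "image": "https://example.com/cat.png"},
        {"type": "text", "text": "Describe this image."},
    ]}
]

# With tokenize=True and return_dict=True, this tokenizes the text AND runs the
# image/audio preprocessing, so preprocessing happens inside a Python call that
# an exported graph cannot capture.
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
)
```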
This is quite annoying during export, since we can't export `processor.apply_chat_template` directly. One example is this logic: https://github.com/huggingface/transformers/blob/main/src/transformers/models/whisper/processing_whisper.py#L69, which uses a lot of numpy that `torch.export` cannot trace.
Ideally we should ask transformers to write standard, torch-based processors for all inputs (tracked by huggingface/transformers#40986). In the short term, I think optimum-executorch should host torch implementations of some of the common processors, like the one in Whisper, as sketched below.
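A minimal sketch of what such a hosted processor could look like, assuming a Whisper-style log-mel pipeline. The class name and constructor parameters are hypothetical, not an existing optimum-executorch API, and the mel filter bank is assumed to be precomputed (e.g. copied from the existing `WhisperFeatureExtractor.mel_filters`):

```python
import torch


class TorchWhisperFeatureExtractor(torch.nn.Module):
    """Hypothetical torch-only log-mel extractor mirroring Whisper's numpy pipeline."""

    def __init__(self, mel_filters: torch.Tensor, n_fft: int = 400, hop_length: int = 160):
        super().__init__()
        self.n_fft = n_fft
        self.hop_length = hop_length
        # Assumed shape (n_mels, n_fft // 2 + 1); registered as buffers so they
        # are captured in the exported graph.
        self.register_buffer("mel_filters", mel_filters)
        self.register_buffer("window", torch.hann_window(n_fft))

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, num_samples) float tensor, 16 kHz audio.
        stft = torch.stft(
            waveform,
            n_fft=self.n_fft,
            hop_length=self.hop_length,
            window=self.window,
            return_complex=True,
        )
        # Drop the last frame and take the power spectrum, as Whisper does.
        magnitudes = stft[..., :-1].abs() ** 2
        mel_spec = self.mel_filters @ magnitudes
        log_spec = torch.clamp(mel_spec, min=1e-10).log10()
        log_spec = torch.maximum(log_spec, log_spec.max() - 8.0)
        return (log_spec + 4.0) / 4.0
```

A module like this could then be exported ahead of the Whisper encoder, e.g. `torch.export.export(extractor, (waveform,))`, so the preprocessing ships as a graph instead of Python/numpy code.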