🚀 The feature, motivation and pitch
After an ExecuTorch model is exported to a `.pte` file, tokenization information must be passed to the runner as an argument (`-l <#>`). This can be avoided by writing the information into the `.pte` file itself, since the tokenizer is known at export time (sentencepiece => 2, tiktoken => 3). Tokenization information can be stored during export as a `constant_method`.
For example, from https://github.com/pytorch/torchchat?tab=readme-ov-file#deploy-and-run-on-android:

```
cmake-out/et_run llama3.1.pte -z `python3 torchchat.py where llama3.1`/tokenizer.model -l 3 -i "Once upon a time"
```
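A minimal sketch of the export side, assuming the tokenizer type is baked in via the `constant_methods` parameter of `executorch.exir.to_edge`; the method name `tokenizer_type` and the toy model are illustrative, not an existing convention:

```python
# Sketch: store the tokenizer type in the .pte at export time.
# The method name "tokenizer_type" and the 2/3 encoding are hypothetical,
# following the sentencepiece => 2, tiktoken => 3 mapping described above.
import torch
from executorch.exir import to_edge


class Toy(torch.nn.Module):
    def forward(self, x):
        return x + 1


exported = torch.export.export(Toy(), (torch.ones(1),))

# constant_methods bakes argument-free methods into the program, so the
# runner can execute them later to recover the stored value.
edge = to_edge(exported, constant_methods={"tokenizer_type": 3})  # 3 => tiktoken

with open("model.pte", "wb") as f:
    f.write(edge.to_executorch().buffer)
```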
Task:
- Update ExecuTorch exporting to save tokenization information in the `.pte` artifact
- Update the ExecuTorch runner to read the newly saved metadata (a read-back sketch follows this list)
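The actual runner change would land in the C++ `et_run` code, but as a rough sketch of the read-back path, here is the same metadata retrieved through the ExecuTorch Python runtime API (assuming `executorch.runtime.Runtime` and the hypothetical `tokenizer_type` method from the sketch above):

```python
# Sketch: read the stored tokenizer type back out of the .pte, replacing
# the -l flag. Uses the executorch Python runtime as a stand-in; the real
# runner would do the equivalent in C++.
from executorch.runtime import Runtime

runtime = Runtime.get()
program = runtime.load_program("model.pte")

# A constant method executes like any other method, just with no inputs.
(tokenizer_type,) = program.load_method("tokenizer_type").execute([])
print(tokenizer_type)  # 3 => tiktoken, per the encoding above
```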
For a similar optimization made for AOTI, see #1159.
See #1439 for discussion and more context.
Alternatives
Continue to pass tokenizer arguments to the runner
Additional context
No response
RFC (Optional)
No response