🚀 The feature, motivation and pitch
After an ExecuTorch model is exported to a `.pte` file, tokenization information must be passed to the runner as an argument (`-l <#>`). This can be avoided by writing the information into the `.pte` file itself, since the tokenizer is known at export time (sentencepiece => 2, tiktoken => 3). Tokenization information can be stored during export as a `constant_method`.
For example, from https://github.com/pytorch/torchchat?tab=readme-ov-file#deploy-and-run-on-android:

```
cmake-out/et_run llama3.1.pte -z `python3 torchchat.py where llama3.1`/tokenizer.model -l 3 -i "Once upon a time"
```
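A minimal sketch of the export side, assuming the tokenizer type is baked in via the `constant_methods` parameter of `executorch.exir.to_edge`; the method name `tokenizer_type` and the toy model are illustrative, not an existing convention:

```python
# Sketch: store the tokenizer type in the .pte at export time.
# The method name "tokenizer_type" and the 2/3 encoding are hypothetical,
# following the sentencepiece => 2, tiktoken => 3 mapping described above.
import torch
from executorch.exir import to_edge


class Toy(torch.nn.Module):
    def forward(self, x):
        return x + 1


exported = torch.export.export(Toy(), (torch.ones(1),))

# constant_methods bakes argument-free methods into the program, so the
# runner can execute them later to recover the stored value.
edge = to_edge(exported, constant_methods={"tokenizer_type": 3})  # 3 => tiktoken

with open("model.pte", "wb") as f:
    f.write(edge.to_executorch().buffer)
```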
Task:
- Update ExecuTorch exporting to save tokenization information in the `.pte` artifact
- Update the ExecuTorch runner to read the newly saved metadata (a read-back sketch follows this list)
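The actual runner change would land in the C++ `et_run` code, but as a rough sketch of the read-back path, here is the same metadata retrieved through the ExecuTorch Python runtime API (assuming `executorch.runtime.Runtime` and the hypothetical `tokenizer_type` method from the sketch above):

```python
# Sketch: read the stored tokenizer type back out of the .pte, replacing
# the -l flag. Uses the executorch Python runtime as a stand-in; the real
# runner would do the equivalent in C++.
from executorch.runtime import Runtime

runtime = Runtime.get()
program = runtime.load_program("model.pte")

# A constant method executes like any other method, just with no inputs.
(tokenizer_type,) = program.load_method("tokenizer_type").execute([])
print(tokenizer_type)  # 3 => tiktoken, per the encoding above
```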
For a similar optimization made for AOTI, see #1159.
See #1439 for discussion and more context.
Alternatives
Continue to pass tokenizer arguments to the runner
Additional context
No response
RFC (Optional)
No response