Misaligned AOTI input; potential perf gains by fixing? #1424
🐛 Describe the bug
As picked up in #1367 and worked around via pytorch/pytorch#143236, the input to the torchchat AOTI runner appears not to be 16-byte aligned.
While the pytorch/pytorch PR relaxes this constraint, the misalignment itself may be costing performance, as is common with misaligned data.
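For reference, here is a minimal sketch of what "16-byte aligned" means. It assumes the address under test is a raw integer pointer, such as what a PyTorch tensor's `data_ptr()` returns. The helpers `is_aligned` and `align_up` are hypothetical illustrations, not part of torchchat or AOTI. One common way a tensor ends up misaligned is slicing: `t[1:]` on a float32 tensor starts 4 bytes past the original buffer.

```python
ALIGNMENT = 16  # bytes; the boundary the AOTI input is reported to miss


def _check_alignment(alignment: int) -> None:
    # Bit-mask tricks below only work for positive powers of two.
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")


def is_aligned(address: int, alignment: int = ALIGNMENT) -> bool:
    """Return True if `address` is a multiple of `alignment`."""
    _check_alignment(alignment)
    return address & (alignment - 1) == 0


def align_up(address: int, alignment: int = ALIGNMENT) -> int:
    """Round `address` up to the next multiple of `alignment`."""
    _check_alignment(alignment)
    return (address + alignment - 1) & ~(alignment - 1)


if __name__ == "__main__":
    base = 0x1000      # hypothetical aligned buffer start
    sliced = base + 4  # one float32 element in, as with t[1:]
    print(is_aligned(base), is_aligned(sliced))  # True False
    print(hex(align_up(sliced)))                 # 0x1010
```

With a real tensor, the equivalent check is `t.data_ptr() % 16 == 0`. Copying a misaligned view (e.g. via `.clone()`) typically yields a fresh, aligned allocation, though that depends on the allocator.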
Hat tip to @malfet for suggesting this line of investigation.