tracker: move `prepare_inputs_for_generation` into the generation mixin 🧹
#32685

Labels: Generation, WIP
🧹 This is a tracker regarding the move of `prepare_inputs_for_generation` into the generation mixin 🧹

**Why?**
`prepare_inputs_for_generation` is not part of the core modeling code, but rather a utility for `generate`. Moving it into the generation mixin means models no longer need changes when `generate` changes. Fewer modeling changes -> improved model stability.

**Tracker**
Kinda ordered list of tasks:
- [ ] `llama`, `generate`, and `cache_utils` [except sink cache, broken atm] slow tests should be passing to ensure we don’t break anything (Llama: make slow tests green 🟢 #33138)
- [ ] `PreTrainedModel` doesn't inherit from `GenerationMixin`, so that `can_generate()` becomes independent of `prepare_inputs_for_generation` being overwritten or not (Generation: deprecate `PreTrainedModel` inheriting from `GenerationMixin` #33203)
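The intent of that change can be sketched with toy classes (this is an illustration of the pattern, not the actual transformers implementation, and the class names below are made up): once `PreTrainedModel` no longer inherits the mixin, `can_generate()` can simply check for explicit mixin inheritance instead of inspecting whether `prepare_inputs_for_generation` was overridden.

```python
class GenerationMixin:
    """Stand-in for transformers' GenerationMixin (toy sketch)."""

    def generate(self):
        return "generated text"


class PreTrainedModel:
    """Toy base class that, per this tracker, no longer inherits the mixin."""

    @classmethod
    def can_generate(cls) -> bool:
        # Old heuristic: did the model override prepare_inputs_for_generation?
        # New rule: generation support is an explicit opt-in via the mixin.
        return issubclass(cls, GenerationMixin)


class ToyCausalLM(PreTrainedModel, GenerationMixin):
    pass  # opts in to generation


class ToyEncoder(PreTrainedModel):
    pass  # encoder-only, no generation support


print(ToyCausalLM.can_generate())  # True: inherits the mixin
print(ToyEncoder.can_generate())   # False: does not
```

With this rule, adding or removing a `prepare_inputs_for_generation` override on a model no longer silently flips its generation capability.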
- [ ] Move llama's `prepare_inputs_for_generation` to the generation mixin. This implies moving one function that prepares the 4D mask too (the one that is called there) (Generate: move llama `prepare_inputs_for_generation` to `GenerationMixin` #33677)
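The shape of the move can be sketched as follows (a toy sketch under simplifying assumptions: plain lists instead of tensors, no attention-mask handling, and a made-up `ToyLlama` class; the real transformers function also prepares the 4D mask): one shared implementation on the mixin slices `input_ids` with `cache_position`, and the per-model override disappears.

```python
class GenerationMixin:
    """Toy sketch: one shared prepare_inputs_for_generation on the mixin."""

    def prepare_inputs_for_generation(self, input_ids, past_key_values=None,
                                      cache_position=None):
        # With a warm cache, only the not-yet-processed tokens are fed
        # forward; cache_position selects them. (The real function also
        # builds the 4D attention mask, omitted here.)
        if past_key_values is not None and cache_position is not None:
            input_ids = [input_ids[i] for i in cache_position]
        return {"input_ids": input_ids, "past_key_values": past_key_values,
                "cache_position": cache_position}


class ToyLlama(GenerationMixin):
    pass  # no per-model override needed any more


model = ToyLlama()
inputs = model.prepare_inputs_for_generation(
    [101, 7, 8, 9], past_key_values="dummy_cache", cache_position=[3]
)
print(inputs["input_ids"])  # [9]: only the newest token with a warm cache
```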
- [ ] Add tests for `prepare_inputs_for_generation` — currently we don’t test it directly, and we should
- [ ] Fix `synced_gpus` handling in `generate`: when `synced_gpus` is enabled and `cache_position` is out of bounds, take the latest available `input_ids` for dummy computations (see Fix synced GPUs #33252; should fix Multi GPU generate with llama shape error #32885, Shape mismatch when generating with multiple processes #32603, and Bugfix for generation with an early-stopping process #32641)
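The idea behind that fix can be sketched in isolation (toy code, not the transformers implementation; the helper name is made up): a rank that finished its sequence early must keep running forward passes so collective ops stay in sync with its peers, so an out-of-bounds cache position is clamped to reuse the latest available token.

```python
def dummy_token_index(cache_position: int, seq_len: int) -> int:
    """Clamp an out-of-bounds cache position to the last valid index.

    Toy sketch of the synced_gpus situation: a finished rank keeps
    looping to stay in sync, reusing the latest available input_ids
    entry for its dummy forward passes instead of indexing past the end.
    """
    return min(cache_position, seq_len - 1)


input_ids = [5, 6, 7]  # this rank stopped after 3 tokens
print(dummy_token_index(2, len(input_ids)))  # 2: still in bounds
print(dummy_token_index(5, len(input_ids)))  # 2: clamped for the dummy pass
```

Without the clamp, the finished rank would raise a shape/index error while its peers are still generating, which matches the symptoms in the linked issues.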
- [ ] Remove `prepare_inputs_for_generation` from as many models as possible. There may be merge conflicts here, due to the 4D mask function. Try to iron out as many trivial cases as possible
- [ ] Update `prepare_inputs_for_generation` to forward `**kwargs` from its input to its output. With minimal changes, this should enable most VLMs to use the shared function (they forward `pixel_values` from the input to the output)
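The kwargs-forwarding idea can be sketched like this (a standalone toy function, not the actual transformers code): unknown keyword arguments pass straight through from input to output, so a VLM-specific input like `pixel_values` survives without a model-specific override.

```python
def prepare_inputs_for_generation(input_ids, **kwargs):
    # Toy sketch: any extra inputs the caller provides (e.g. a VLM's
    # pixel_values) are forwarded from input to output unchanged.
    model_inputs = {"input_ids": input_ids}
    model_inputs.update(kwargs)
    return model_inputs


out = prepare_inputs_for_generation([1, 2, 3], pixel_values="IMG_TENSOR")
print(out["pixel_values"])  # IMG_TENSOR: forwarded without a VLM override
```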
- [ ] After the above, most VLM `prepare_inputs_for_generation` overrides should have been removed 🤗 We would need to check the others individually, there may be further simplification patterns available!