Is your feature request related to a problem? Please describe.
`smart_lastrec` works by matching a regex during prefill, separately for each batch position. The protected stretch must be a prefix of the sequence.
One problem is left-padding when sequences differ greatly in length. For some batch positions, the prefix then consists mostly of pad tokens, which is wasteful.
A better approach would be to define a "protected stretch" that starts at the first non-pad token and extends until the regex matches (or until a maximum size is reached). Importantly, this stretch starts at different token positions across the batch. Slots to the left of the protected stretch are evicted.
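To make the proposal concrete, here is a minimal sketch of how the per-row protected stretch could be computed for one left-padded prefill chunk. It assumes tokens are already decoded to strings, that padding uses a hypothetical `<pad>` token, and that the regex is matched against the concatenated text of the stretch. `protected_stretch` and `plan_batch` are hypothetical names for illustration, not part of the existing `smart_lastrec` code.

```python
import re
from typing import List, Optional, Tuple

PAD = "<pad>"  # hypothetical pad token, for illustration only


def protected_stretch(
    tokens: List[str], pattern: re.Pattern, max_len: int, pad: str = PAD
) -> Optional[Tuple[int, int]]:
    """Return the half-open range [start, end) of the protected stretch for
    one left-padded row, or None if the row contains only padding."""
    # The stretch starts at the first non-pad token, not at position 0.
    start = next((i for i, t in enumerate(tokens) if t != pad), None)
    if start is None:
        return None  # all padding: nothing to protect in this chunk
    end = start
    # Grow the stretch one token at a time until the regex matches,
    # the maximum size is reached, or the chunk runs out of tokens.
    while end < len(tokens) and end - start < max_len:
        end += 1
        if pattern.search("".join(tokens[start:end])):
            break
    return start, end


def plan_batch(
    batch: List[List[str]], pattern: re.Pattern, max_len: int
) -> List[Tuple[Optional[Tuple[int, int]], List[int]]]:
    """For each row, return (protected range, positions to evict).
    Positions left of the protected stretch are evicted; an all-pad row
    gets (None, []), meaning the decision is still open."""
    plans = []
    for row in batch:
        span = protected_stretch(row, pattern, max_len)
        if span is None:
            plans.append((None, []))
        else:
            start, end = span
            plans.append(((start, end), list(range(start))))
    return plans
```

Note that the protected range differs per row, so any downstream cache update would need per-row start offsets rather than a single shared prefix length.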
In extreme cases, an entire prefill chunk can be padding for some sequences in a batch. In that case, we could delay fixing the protected stretch until a later chunk. How difficult would this be?
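One way to delay the decision is to keep a small amount of per-sequence state that persists across prefill chunks and only fixes the stretch once it has actually been observed. The sketch below assumes decoded string tokens, a hypothetical `<pad>` token, and regex matching on the accumulated stretch text; `StretchTracker` is a hypothetical name, not an existing class.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

PAD = "<pad>"  # hypothetical pad token, for illustration only


@dataclass
class StretchTracker:
    """Per-sequence state that fixes the protected stretch lazily across
    prefill chunks (hypothetical helper)."""

    pattern: re.Pattern
    max_len: int
    pos: int = 0                # absolute position of the next token
    start: Optional[int] = None  # first non-pad position, once seen
    text: str = ""              # decoded text of the stretch so far
    end: Optional[int] = None   # exclusive end, set once the stretch is fixed

    @property
    def fixed(self) -> bool:
        return self.end is not None

    def feed(self, chunk: List[str]) -> None:
        """Consume one prefill chunk for this sequence."""
        for tok in chunk:
            if not self.fixed:
                # A chunk that is entirely padding leaves start as None,
                # so the decision is simply postponed to the next chunk.
                if self.start is None and tok != PAD:
                    self.start = self.pos
                if self.start is not None:
                    self.text += tok
                    length = self.pos + 1 - self.start
                    if self.pattern.search(self.text) or length >= self.max_len:
                        self.end = self.pos + 1
            self.pos += 1
```

Until `fixed` is true for a row, that row's slots would have to be kept (or at least not evicted based on the stretch); once it is fixed, positions `0 .. start - 1` can be evicted as in the non-delayed case.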