env: add knob to control SWA eviction interval #22645
happierpig wants to merge 1 commit into sgl-project:main from
Code Review
This pull request introduces a configurable multiplier for the Sliding Window Attention (SWA) eviction interval through a new environment variable, allowing for better control over the trade-off between memory waste and eviction overhead. A review comment points out a potential logic error where the eviction condition might never be met if the calculated interval is 1, and provides a suggestion to handle this edge case while maintaining the requirement to skip the first decode batch index.
- # 2. Evict swa every window_size tokens to reduce the overhead.
- if req.decode_batch_idx % sliding_window_size == 1:
+ # 2. Evict swa every eviction_interval tokens to reduce the overhead.
+ if req.decode_batch_idx % eviction_interval == 1:
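The diff replaces the fixed `sliding_window_size` stride with `eviction_interval`, but the PR text does not show how that interval is derived. A minimal sketch, assuming the interval is `sliding_window_size` scaled by the env-var multiplier, rounded down to a whole number of pages and clamped to at least one page. The env var name `SGLANG_SWA_EVICTION_INTERVAL_MULTIPLIER` and the helper `compute_eviction_interval` are hypothetical, not the PR's actual identifiers.

```python
import os
from typing import Optional

# Hypothetical env var name; the PR's actual knob may be named differently.
ENV_VAR = "SGLANG_SWA_EVICTION_INTERVAL_MULTIPLIER"


def compute_eviction_interval(sliding_window_size: int, page_size: int,
                              multiplier: Optional[float] = None) -> int:
    """Hypothetical sketch: scale the SWA window by a multiplier, align to pages.

    If multiplier is None it is read from ENV_VAR (default 1.0, which keeps
    the old stride of sliding_window_size when that is a multiple of page_size).
    """
    if multiplier is None:
        multiplier = float(os.environ.get(ENV_VAR, "1.0"))
    if multiplier <= 0:
        raise ValueError("multiplier must be positive")
    raw = int(sliding_window_size * multiplier)
    # Round down to whole pages, but never go below a single page.
    return max(page_size, (raw // page_size) * page_size)
```

Under these assumptions, a 4096-token window with `page_size=64` and multiplier 0.25 gives an interval of 1024, while `page_size=1` with multiplier 0.0001 collapses the interval to 1, which is exactly the edge case the review flags.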
The condition req.decode_batch_idx % eviction_interval == 1 will never be true if eviction_interval is 1. This can happen if page_size is 1 and the multiplier is set to a very small value. In this case, SWA eviction would never trigger during decoding, which contradicts the user's intent of increasing eviction frequency.
Additionally, the original logic implicitly skipped the decode_batch_idx == 0 case (as mentioned in the comment above), which should be preserved.
Suggested change:
- if req.decode_batch_idx % eviction_interval == 1:
+ if req.decode_batch_idx > 0 and req.decode_batch_idx % eviction_interval == (1 % eviction_interval):
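The two predicates can be compared directly. In this sketch, `should_evict_original` mirrors the condition in the diff and `should_evict_suggested` mirrors the review's suggestion; both helper names are hypothetical, and `decode_batch_idx` is treated as a plain integer.

```python
def should_evict_original(decode_batch_idx: int, eviction_interval: int) -> bool:
    # Condition from the diff: fires at idx 1, 1 + k, 1 + 2k, ...
    # When k == 1, idx % 1 is always 0, so this never fires.
    return decode_batch_idx % eviction_interval == 1


def should_evict_suggested(decode_batch_idx: int, eviction_interval: int) -> bool:
    # Review suggestion: 1 % k is 0 when k == 1 and 1 otherwise, so k == 1
    # fires on every step after the first; idx 0 is always skipped.
    return (decode_batch_idx > 0
            and decode_batch_idx % eviction_interval == (1 % eviction_interval))


def eviction_steps(pred, eviction_interval: int, num_steps: int) -> list:
    """Return the decode_batch_idx values in [0, num_steps) where pred fires."""
    return [i for i in range(num_steps) if pred(i, eviction_interval)]
```

Over 10 decode steps with `eviction_interval=1`, the original predicate fires at no step while the suggested one fires at steps 1 through 9; with `eviction_interval=4`, both fire at steps 1, 5 and 9, so the suggestion only changes behaviour in the degenerate case.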
Motivation
SWA eviction only runs once every sliding_window_size decode steps per request. Between evictions, each request holds up to 2 * sliding_window_size SWA tokens, double what the sliding window actually needs.
This PR adds a knob to trade off SWA token waste against per-forward eviction overhead.
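The 2 * sliding_window_size figure follows from a simple model. Assume a request starts with a full window, each decode step appends one SWA token, and each eviction trims the request back to the last `sliding_window_size` tokens. The peak held is then `sliding_window_size + eviction_interval`, and the eviction count is roughly `num_decode_steps / eviction_interval`. This sketch ignores page alignment and evicts on `step % eviction_interval == 0` rather than the real `== 1` offset, which shifts the schedule by one step but not the peak or frequency; `simulate_swa` is a hypothetical helper.

```python
def simulate_swa(sliding_window_size: int, eviction_interval: int,
                 num_decode_steps: int) -> tuple:
    """Simplified model: return (peak SWA tokens held, number of evictions)."""
    held = sliding_window_size  # assume the window is already full
    peak = held
    evictions = 0
    for step in range(1, num_decode_steps + 1):
        held += 1  # one new token per decode step
        peak = max(peak, held)
        if step % eviction_interval == 0:
            # Eviction frees everything outside the sliding window.
            held = min(held, sliding_window_size)
            evictions += 1
    return peak, evictions
```

With a 4096-token window over 16384 decode steps, an interval of 4096 peaks at 8192 tokens with 4 evictions, while an interval of 1024 peaks at 5120 tokens with 16 evictions: less waste in exchange for more frequent eviction work.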
Modifications
Accuracy Tests
Speed Tests and Profiling