DO NOT MERGE - CI sandbox for stateless scheduler b temp run #27667
Open
fzyzcjy wants to merge 322 commits into tom/extend-logprob-start-len-free-fn from tom/stateless_scheduler_b_temp_run
Commits (322 total; each listed commit is authored by fzyzcjy):
72a4d36 Refactor: Inline retract_all, delete the function
94ea0fc Refactor: Minimize batch_result_processor diff vs pre-refactor
8b6f4d2 Refactor: Simplify pause_generation(retract) chunked release
c25d10d Refactor: Replace defensive only_decode_ready filters with asserts
61f94e3 Refactor: Drop obsolete PP cross-mb idempotency guard in _handle_fini…
6f0e4a9 Refactor host_hit_length reuse skip to explicit is_resume branch
efffdef Restore main-upstream add_chunked_req as temporary alias for dispatch…
3d3f8ec Split add_one_req into add_first_chunk_req / add_non_first_chunk_req …
653b585 Clean is_resume residue from add_first_chunk_req
fdc0efc Adapt add_non_first_chunk_req to dev-f convention
94ff96a Add return type annotation to add_first_chunk_req
5a75498 Apply black formatting to add_first_chunk_req signature
c2d7f2f Move add_non_first_chunk_req to match main-upstream's add_chunked_req…
ff77a84 Replace Req.fill_ids array with derived fill_len
b37ddd3 Minimize add_non_first_chunk_req diff vs main-upstream add_chunked_req
6c89024 Tweak stale comment wording: 'truncates' -> 'shrinks' for fill_len
12c662e Apply black reformat
0242fe0 Rename Req helpers for clarity
19a48a9 Apply black reformat
033174f Make DLLM fill_len single-phase
894bfdf Revert fill_ids derive-only refactor
9154b65 Refactor Req.fill_ids into (full_untruncated_fill_ids, fill_len)
b1afd8d Apply black reformat
8890a1f Update stale fill_ids references in test docstrings
13d6ec5 Preserve OLD reset_for_retract behavior: don't clear fill_ids state
9c5c20a Use get_fill_ids() for the 3 reads in init/prefetch/DLLM-phase
378c667 Restrict fill_len to truncated/committed semantics
16f8ec8 Revert 3 reads to full_untruncated_fill_ids under PR2 semantics
0dd2580 Allow PR test and lint workflows to trigger on non-main bases
6a05dd4 Merge tom/ci_unblock_chain_pr_test into tom/refactor_retract_all
bda5d5a Merge tom/refactor_retract_all into tom/release_req_free_func
33a861d Merge tom/release_req_free_func into tom/refactor_fill_ids
ac54584 Merge tom/refactor_fill_ids into tom/refactor_fill_ids_b
9b1832b Write fill_len in non-chunked add_one_req admission
35b6d20 Use len(full_untruncated_fill_ids) + assert equivalence
84cc52b Write fill_len in add_one_req_ignore_eos non-chunked admission
b8be787 Merge upstream/main into tom/refactor_retract_all
6bde970 Merge tom/refactor_retract_all into tom/release_req_free_func
86737d8 Merge tom/release_req_free_func into tom/refactor_fill_ids
6ca7496 Translate new main fill_ids usage to fill_len in _compute_chunked_req…
c9021a1 Merge tom/refactor_fill_ids into tom/refactor_fill_ids_b
39a44ef Add include_parallel_rank_in_filename option to dumper
6052c6e Remap pipeline-local layer indices to global in dump_model
decc7fb Merge tom/dumper_0531 into tom/refactor_retract_all
e6a0837 Merge tom/refactor_retract_all into tom/release_req_free_func
3bb5798 Merge tom/release_req_free_func into tom/refactor_fill_ids
8501119 Merge tom/refactor_fill_ids into tom/refactor_fill_ids_b
25f1175 Speed up dump comparator percentile computation using numpy
35a0721 Merge tom/opt_dump_comparator_percentile into tom/refactor_retract_al…
28bc9d9 Merge tom/refactor_retract_all into tom/release_req_free_func (chain)
4978356 Merge tom/release_req_free_func into tom/refactor_fill_ids (chain)
a1cfa46 Merge tom/refactor_fill_ids into tom/refactor_fill_ids_b (chain)
24f6db3 Merge branch 'main' of github.com:sgl-project/sglang
7d6f14f Merge tom/dumper_0531 into tom/opt_dump_comparator_percentile (chain)
6c6e18a Merge tom/opt_dump_comparator_percentile into tom/refactor_retract_al…
1451113 Merge tom/refactor_retract_all into tom/release_req_free_func (chain)
32cc317 Merge tom/release_req_free_func into tom/refactor_fill_ids (chain)
13027ac Merge tom/refactor_fill_ids into tom/refactor_fill_ids_b (chain)
6589689 Replace _chunked_req_scheduled_last_iter flag with content-based stas…
787af65 Merge upstream/main into feat/stateless_scheduler_b
86b2367 Remove obsolete imports after upstream merge
67e2426 Merge PR #26850 (parallel-rank dump filenames + pipeline-global layer…
b4558f1 Merge PR #26874 (speed up dump comparator percentile with numpy)
b41b275 Merge PR #26547 (avoid filter_batch with unrelated chunked_req_to_exc…
c20bfb0 Merge PR #26548 (extract release_req and retract_all as module-level …
8244293 Merge PR #26637 (refactor Req.fill_ids into full_untruncated_fill_ids…
b07412f Merge PR #26659 (restrict Req fill_len to truncated/committed semantics)
c71bdc1 Merge PR #26938 (content-based stash gate for chunked req)
0ed8b6c Merge upstream/main into tom/dumper_0531
a2e0622 Merge tom/dumper_0531 into tom/opt_dump_comparator_percentile (chain)
1c2bd6d Merge tom/opt_dump_comparator_percentile into tom/refactor_retract_al…
85090b8 Merge tom/refactor_retract_all into tom/release_req_free_func (chain)
d45abc7 Merge tom/release_req_free_func into tom/refactor_fill_ids (chain)
8cd38d5 Merge tom/refactor_fill_ids into tom/refactor_fill_ids_b (chain)
530beae Merge tom/refactor_fill_ids_b into tom/rm_chunked_req_scheduled_last_…
9189129 Merge tom/rm_chunked_req_scheduled_last_iter into feat/stateless_sche…
e2a6556 Fix scheduler pause generation unit test
a39ec1d Skip scheduled_extend_len bounds assert for DLLM reqs
2bbf250 Migrate PP skip-output-comm validation off removed inflight_middle_ch…
d0ead8a Cover disagg-decode running reqs in abort_request
8020925 Fix chunked_req AttributeError in prefill batch build
a4187d8 Drop finished prefill-only reqs in is_prefill_only branch
03706bf Revert "Cover disagg-decode running reqs in abort_request"
4328785 Revert "Migrate PP skip-output-comm validation off removed inflight_m…
78320c1 Reapply "Migrate PP skip-output-comm validation off removed inflight_…
412417b Track disagg-decode reqs in active_reqs
2005c85 Fix scheduled extend length for retracted decode requests
57bc204 Freeze scheduled extend target length
cacbe41 Assert empty prefix in add_one_req_ignore_eos chunked branch
33f3f81 Merge tom/symmetric_ignore_eos_fill_len into tom/rm_chunked_req_sched…
6e3e20a Merge tom/rm_chunked_req_scheduled_last_iter into feat/stateless_sche…
ed2910a Document Req.fill_len semantics
a8d4676 Merge remote-tracking branch 'upstream/main' into feat/stateless_sche…
c590e11 Sandbox: trigger CI for stateless_scheduler_b (do not merge)
66f2677 Fix stale SWATokenToKVPoolAllocator import after upstream merge
1720a75 Fix stale get_chunked_req call in SchedulerLoadInquirer
fdfe078 Adapt scripted runtime to stateless chunked_reqs() API
ecc51ef Drop hybrid-SWA tests for removed PrefillAdder.add_chunked_req
95feb21 Key cache_unfinished_req on kv_committed_len instead of fill_len
6e89bcb Inline get_committed_fill_ids into cache_unfinished_req call sites
d5e1e39 Assert fill_len == kv_committed_len in remaining cache_unfinished_req…
21b77d6 Use None instead of 0 as Req.fill_len invalid sentinel
ab6c992 Invalidate Req.fill_len to None on entering decode
b879f3f Drop verbose comment on fill_len decode invalidation
9f338d8 Rename Req.fill_len to Req.extend_fill_len
0e330b8 Avoid dual semantics of extend_input_len by computing the candidate o…
835574e Avoid scattered assignment of extend_input_len and extend_fill_len by…
1704fb0 Inline extend_range accessors and remove the read-only properties
9ed8aca Merge PR #27611 (inline extend_range accessors, drop extend_input_len…
77d0b0c Remove newly-added comments and docstrings from the PR diff
917ba7e Remove dead ScheduleBatch fields resurrected by bad merges
65e56bf Key cache_unfinished_req on kv_committed_len instead of fill_len
d0a8384 Inline get_committed_fill_ids into cache_unfinished_req call sites
f435171 Use None instead of 0 as Req.fill_len invalid sentinel
6e5ee1c Invalidate Req.fill_len to None on entering decode
14941f8 Assert fill_len == kv_committed_len in remaining cache_unfinished_req…
fb47155 Drop verbose comment on fill_len decode invalidation
a4e7bcb Rename Req.fill_len to Req.extend_fill_len
c17406d Avoid dual semantics of extend_input_len by computing the candidate o…
487eb9f Avoid scattered assignment of extend_input_len and extend_fill_len by…
3d6ddac Inline extend_range accessors and remove the read-only properties
2c08145 Replace Req.extend_logprob_start_len field with a pure free function
20b68bc Re-wire EAGLE chunked-prefill next-token chain onto stateless model
81989e2 Carry over the original key-variables comment and flag set_extend_ran…
880baa9 Pass the batch's extend_logprob_start_lens snapshot directly to batch…
74ca62a Drop the newly-added explanatory comments, keeping only carried-over …
2a2d0fd Minimize the padding hunk to the only forced change (field -> snapsho…
317b8e0 Set unused extend_logprob_start_lens to None in prebuilt path
f421f19 Drop comment on extend_logprob_start_lens None assignment
4c5118c Drop dead extend_input_logprob_token_ids boilerplate in prebuilt path
805b69b Drop the stale leading line of the carried key-variables comment
497beb8 Derive has_pending_chunk from extend_range
6da78b1 Derive output process mode and chunked next prompt token from extend_…
fd6013e Delete scheduled_extend_len and scheduled_extend_target_len fields
3f4efd6 Update prefill adder tests for deleted scheduled_extend_len fields
887f849 Apply black formatting to prefill adder test
40c8be0 Merge tom/extend-logprob-start-len-free-fn into tom/stateless_schedul…
f8ae413 Revert read_len alias in cache_unfinished_req to minimize diff
a830903 Revert streaming_session chunked slice back to extend_range.end
2f52bb4 Restore forward_pass_metrics module docstring removed in error
35e48d0 Address PR review feedback on stateless-scheduler cleanup
b6a0bea Revert decode _pre_alloc extend_range start to total_prefix_len
eccda4e Restore prefix_indices comments, retargeting renamed add_non_first_ch…
7e94cfb Minimize gratuitous diff in stateless-scheduler cleanup
f4fc1d1 Merge branch 'tom/stateless_scheduler_b_diffmin' into tom/stateless_s…
9d7e85e Note transient prefix_indices mismatch in extend_range TODO
378f5d2 Drop redundant kv_committed_len >= cache_protected_len assert in cach…
152b042 Restore dllm-stash before chunked-stash order to minimize diff
f0afb38 Migrate DEBUG_INVARIANTS to SGLANG_DEBUG_REQS_INVARIANTS via environ
41b7603 Return a list from chunked_reqs instead of a lazy iterable
bc3dbce Deactivate optimistic-bootstrap reqs on failure and requeue
3dccc8f Deactivate finished prebuilt reqs in process_batch_result_prebuilt
9981bae Merge correctness fixes: deactivate aborted/requeued bootstrap reqs a…
74b85fb Restore stash_chunked_request helper instead of inlining maybe_cache_…
c6fad8a Move stash_chunked_request back to its original position before _buil…
894e738 Drop underscore prefix from chunked_in_active local var
ccdd942 Drop DLLM_* output-process modes in favor of is_intermediate + is_dllm
5abb0ee Remove derived ScheduleBatch.chunked_req in favor of output_process_mode
88d429d Clamp cache_unfinished_req cached length to the prompt boundary
4a650ae Merge tom/cache-unfinished-req-use-committed-len (clamp cache_unfinis…
3a5dfee Merge tom/req-fill-len-none (clamp cache_unfinished_req length)
71c53cb Merge tom/rename-fill-len-extend-fill-len (clamp cache_unfinished_req…
20f66b3 Merge tom/extend-range-consolidate (clamp cache_unfinished_req length)
240b8c7 Merge tom/extend-candidate-on-demand (clamp cache_unfinished_req length)
1a4fc3a Fix test doubles for the extend_range / kv_committed_len migration
8baeca8 Merge tom/extend-range-inline (clamp cache_unfinished_req length + te…
6ec36b4 Replace output_process_mode enum with is_extend_intermediate bool
b2d74d8 Name full_untruncated_fill_ids locals after their getter
36785eb Rename _decide_is_extend_intermediate to _compute_is_extend_intermediate
0464dba Rename local is_intermediate loop var to is_extend_intermediate
a33d841 Remove dead ScheduleBatch.is_hybrid_swa field missed in upstream merge
df9187c Merge upstream/main (via tom/remove-full-untruncated-fill-ids)
4ef982d Merge upstream/main (via casc-27573)
a0659c7 Merge upstream/main (via casc-27571)
c3373c2 Merge upstream/main (via casc-27575)
6a50dde Merge upstream/main (via casc-27616)
b00a6ca Merge upstream/main (via casc-27610)
27ff868 Merge upstream/main (via casc-27611)
1c08bb5 Restore DLLM request abort dropped by active_reqs migration
21655c6 Merge tom/extend-logprob-start-len-free-fn (chain + upstream/main)
839b1eb Revert local rename to full_untruncated_fill_ids
7e71be0 Align stale padding comment with extend_range/extend_logprob_start_lens
6b568c9 Rename filter_batch only_decode_ready to skip_extend_intermediate
fb33122 Apply black formatting after filter_batch kwarg rename
8c3e1cc Reflow padding comment to match upstream #27625 wording
8f0f736 Merge tom/extend-logprob-start-len-free-fn (chain: upstream padding-c…
49bce4e Move is_extend_intermediate merge to end of merge_batch
2a1d6d9 Assert extend_range.end > 0 in has_pending_chunk instead of guarding
4163212 Rename chunked-req scheduler state to partially_extended vocabulary
e076037 Apply pre-commit formatting after partially_extended rename
691d450 Merge tom/extend-logprob-start-len-free-fn (rebuilt chain: drop None …
61d90da Rename cache flag chunked to is_partially_extended
04655d4 Make maybe_cache_unfinished_req flag parameter explicit
8715e13 Apply pre-commit auto-fixes
9f8c2c3 Track the request lifecycle phase explicitly via Req.phase
bc75ecf Enter the extend phase at set_extend_range and base is_partially_exte…
670e2bc Rename ReqPhase.QUEUED to OTHERS since it covers every untracked stat…
1562ea0 Cross-check Req.phase against committed KV in the busy invariant check
e8230fe Apply isort to invariant_checker import order
90b4b50 Remove resurrected v1_spec_info_filtered parameter from filter_batch
9b13478 Remove duplicated compute_extend_logprob_start_len definition
0aaf49c Restore hybrid-SWA chunked prefill tests on the resumed-extend API
e61aec5 Restore stash-gate regression tests on the stateless scheduler model
7f0e3b4 Drop stale spec_v1 reference from merge_batch comment
41c01df Move the partially-extended batch query after prepare_for_extend
29ded7a Drop the explanatory comment on the moved partially-extended query
1da4f3d Move the ReqPhase.EXTEND transition from set_extend_range to prepare_…
349790c Assert that an active req holding an extend range is in a tracked phase
6f83434 Fix stale chunked_req reference in dflash prefill delay check
f7f07a9 Include partially-extended reqs in is_fully_idle batch status
1b58e9a Always run retract path in pause_generation regardless of running_batch
6c2ac56 Reset req phase to OTHERS on bootstrap failure
11095d1 Add strict=True to is_extend_intermediate zip in eagle prefill tail t…
35202aa Snapshot extend intermediacy at admission to kill phantom partially-e…
478b3fc Stop resuming a chunked prefill when its bootstrap poll is deferred
224bc67 Exclude finished reqs from the running-subset-of-active debug assert
8d2b4c8 Fix hand-built reqs in test_prefill_adder and align with snapshot sem…
e318072 Rename add_first_extend_req to add_unstarted_extend_req
9f56bce Restore decode-radix re-match for retracted-resumed reqs
8a56bc0 Revert the is_extend_intermediate snapshot field for redesign
e801bcc Split ReqPhase.EXTEND into EXTEND_NON_LAST and EXTEND_LAST
00e2c3a Set ReqPhase on hand-built reqs in test_prefill_adder
56510dd Inline is_partially_extended and ReqPhase.is_extend into their call s…
b94651d Set req.phase at the PrefillAdder admission decision instead of prepa…
9e7d8ca Apply black formatting to partially_extended_reqs
1154594 Update stale comment: the PrefillAdder now derives the phase
7156040 Set EXTEND_LAST on the dynamic-chunking profiler req
74f4643 Replace dead is_partially_extended mock kwarg with phase
2fceeb8 Move the DECODE phase transition before the spec early-return
eaf483f Report zero extend lengths for decode batches in run_batch
3416dea Exclude the dLLM mask tail from cache_unfinished_req
23fb972 Set ReqPhase.DECODE in spec-v2 prepare_for_decode paths
8f8abd4 Revert "Set ReqPhase.DECODE in spec-v2 prepare_for_decode paths"
6e72294 Move the DECODE phase transition before the spec early-return
24d4c88 Drop dead extend_range/kv_committed_len None guards
7ebd8e6 Restore the kv_committed_len None guard in get_new_prebuilt_batch
0e47c4a Revert "Exclude the dLLM mask tail from cache_unfinished_req"
33dfee2 Merge branch 'tom/stateless_scheduler_b' into tom/stateless_scheduler…
8a04683 Dummy commit to retrigger CI (reverted next commit)
ad8f86c Revert "Dummy commit to retrigger CI (reverted next commit)"
b1a5d93 Migrate manual chunked-prefill tests off removed Scheduler.chunked_req
fc9f42a Apply black/isort formatting to chunked_req_of migration
4a34564 Migrate manual tests off removed Req.inflight_middle_chunks
9716bce Apply isort to inflight_middle_chunks_of migration
c7e2f1a Replace removed disable_piecewise_cuda_graph in manual chunked tests
58f1580 Migrate manual tests off removed Req.extend_input_len
8b24faf Apply black/isort formatting to extend_input_len_of migration
6a02e1c Migrate manual test off removed Req.fill_ids attribute
96ab4d8 Assert decode-req path in prefill-adder intermediate test
f68b908 Cover partially-extended retract branch in pause_generation test
Review comment: The call to new_prebuilt_batch.filter_batch() was removed. As a result, finished requests are no longer filtered out of the prebuilt batch before it is merged into self.running_batch. Those finished requests (whose KV cache has already been released) will then be merged and executed again in the next forward pass, causing redundant execution and potential CUDA or assertion crashes. Calling filter_batch() after the assertion prevents this.
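To make the failure mode in this comment concrete, here is a minimal Python sketch under stated assumptions. Req, Batch, merge_prebuilt, make_batches, and stale_reqs are hypothetical stand-ins, not sglang's actual classes or functions; filter_batch and merge_batch only borrow the method names mentioned in the comment and are reduced to plain list operations. The sketch shows that skipping the filter step lets a finished request, whose KV cache is already released, end up in the running batch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Req:
    # Hypothetical stand-in for a scheduler request (not sglang's Req).
    rid: str
    finished: bool = False
    kv_released: bool = False  # True once the request's KV cache has been freed


@dataclass
class Batch:
    # Hypothetical stand-in for a batch of requests (not sglang's ScheduleBatch).
    reqs: List[Req] = field(default_factory=list)

    def filter_batch(self) -> None:
        # Keep only requests that have not finished yet.
        self.reqs = [r for r in self.reqs if not r.finished]

    def merge_batch(self, other: "Batch") -> None:
        # Append the other batch's requests to this one.
        self.reqs.extend(other.reqs)


def merge_prebuilt(running: Batch, prebuilt: Batch, filter_first: bool = True) -> Batch:
    """Hypothetical helper: merge a prebuilt batch into the running batch."""
    if filter_first:
        # Drop finished requests before they can reach the next forward pass.
        prebuilt.filter_batch()
    if prebuilt.reqs:
        running.merge_batch(prebuilt)
    return running


def stale_reqs(batch: Batch) -> List[str]:
    # Requests that would be run again even though their KV cache is gone.
    return [r.rid for r in batch.reqs if r.kv_released]


def make_batches():
    # Fresh objects each call, since filter_batch mutates the prebuilt batch.
    running = Batch([Req("a")])
    prebuilt = Batch([Req("b"), Req("c", finished=True, kv_released=True)])
    return running, prebuilt


if __name__ == "__main__":
    merged = merge_prebuilt(*make_batches())
    print([r.rid for r in merged.reqs], stale_reqs(merged))  # ['a', 'b'] []

    unfiltered = merge_prebuilt(*make_batches(), filter_first=False)
    print([r.rid for r in unfiltered.reqs], stale_reqs(unfiltered))  # ['a', 'b', 'c'] ['c']
```

The point the toy model captures is ordering: the filter has to run before the merge, because once a finished request sits inside the running batch nothing in this sketch removes it again.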