lite#974
token_processed=0,
)

self.total_block_nums += req_blocks_num
self.total_block_nums is only incremented when a new request is added to requests_meta. If a request already exists (e.g., after a chunked prefill), this counter won't be updated. This could lead to inconsistent statistics. Consider either incrementing for all requests or clarifying the intended behavior.
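One way to keep the counter consistent could look like the sketch below. It is not the PR's code: `RequestMeta`, `BlockStats`, `record`, and the `blocks_counted` field are hypothetical names, and it assumes `req_blocks_num` is the cumulative block count for the request so far.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RequestMeta:
    # Hypothetical per-request record; the real fields in the PR may differ.
    token_processed: int = 0
    blocks_counted: int = 0


@dataclass
class BlockStats:
    requests_meta: Dict[str, RequestMeta] = field(default_factory=dict)
    total_block_nums: int = 0

    def record(self, request_id: str, req_blocks_num: int) -> None:
        """Count blocks for new and already-tracked requests alike."""
        meta = self.requests_meta.get(request_id)
        if meta is None:
            meta = RequestMeta()
            self.requests_meta[request_id] = meta
        # Assumption: req_blocks_num is cumulative. Add only the blocks not
        # yet counted, so a request revisited after a chunked prefill is
        # neither skipped nor double counted.
        new_blocks = max(req_blocks_num - meta.blocks_counted, 0)
        self.total_block_nums += new_blocks
        meta.blocks_counted += new_blocks
```

If the counter is instead meant to track only first-time requests, a short comment at the increment site stating that would resolve the ambiguity without any code change.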
return 0, False

def update_state_after_alloc(
    self, request: "Request", blocks: "KVCacheBlocks", num_external_tokens: int
💡 Suggestion: The stub methods (update_state_after_alloc, start_load_kv, wait_for_layer_load, save_kv_layer, wait_for_save) should have docstrings explaining that they are intentionally empty for the lite connector, or raise NotImplementedError if they are not meant to be called.
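A minimal sketch of both options follows. The class names are hypothetical, the `update_state_after_alloc` and `wait_for_save` signatures are taken from the diff, and `save_kv_layer` uses `*args, **kwargs` as a placeholder because its real signature is not shown here.

```python
class LiteConnectorSketch:
    """Hypothetical stand-in for the lite connector, with documented no-ops."""

    def update_state_after_alloc(
        self, request: "Request", blocks: "KVCacheBlocks", num_external_tokens: int
    ) -> None:
        """Intentionally a no-op: the lite connector keeps no post-alloc state."""

    def wait_for_save(self) -> None:
        """Intentionally a no-op: nothing is saved, so there is nothing to wait for."""


class StrictLiteConnectorSketch(LiteConnectorSketch):
    """Alternative: fail loudly for a method that should never be called."""

    def save_kv_layer(self, *args, **kwargs) -> None:
        # Placeholder signature; the real parameters are not shown in the diff.
        raise NotImplementedError("save_kv_layer is not supported by the lite connector")
```

The docstring form suits methods the framework calls unconditionally, while `NotImplementedError` suits methods whose invocation would indicate a configuration mistake.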
def wait_for_save(self):
    pass

def generate_hash(
The generate_hash method calls self.request_hasher, which can raise exceptions, but there is no error handling here. If the hasher fails, the exception propagates up and could crash request processing. Consider adding a try-except block or documenting the expected behavior.
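As a rough illustration of the try-except option, the sketch below wraps the hasher call and reports failure as `None`. The class name `HashingSketch`, the `token_ids` parameter, and the "return `None` on failure" policy are assumptions, since the actual `generate_hash` signature is truncated in the diff.

```python
import logging
from typing import Callable, Optional, Sequence

logger = logging.getLogger(__name__)


class HashingSketch:
    """Hypothetical holder for request_hasher; not the PR's actual class."""

    def __init__(self, request_hasher: Callable[[Sequence[int]], str]) -> None:
        self.request_hasher = request_hasher

    def generate_hash(self, token_ids: Sequence[int]) -> Optional[str]:
        """Return the hash, or None if the hasher raises."""
        try:
            return self.request_hasher(token_ids)
        except Exception:
            # Log with traceback and let the caller treat None as "no hash",
            # instead of letting the failure abort request processing.
            logger.exception("request_hasher failed")
            return None
```

Callers would then need to handle the `None` case explicitly. If propagating the exception is the intended behavior, a docstring saying so would address the comment equally well.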
Purpose
Modifications
Test