feat: Add PVC evictor BlockRemoved events #605

Open
albertoperdomo2 wants to merge 7 commits into llm-d:main from albertoperdomo2:feat/pvc-evictor-events

Conversation

@albertoperdomo2
Contributor

@albertoperdomo2 albertoperdomo2 commented May 25, 2026

Summary

This PR adds BlockRemoved event emission from the PVC evictor's deleter process, complementing the BlockStored events added in #571. When the evictor deletes KV cache files from shared storage, it now publishes BlockRemoved events so downstream consumers (e.g. the Go storage indexer) can remove stale checkpoints.

The implementation extends StorageEventPublisher with a publish_blocks_removed() method that supports per-call model_name overrides, since the evictor handles files from multiple models on the same PVC. The model_name constructor parameter is optional only to support this multi-model use case.
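
To make the per-call override concrete, here is a minimal sketch of how such a method could resolve the model name. It is not the code from this PR: the class name `SketchEventPublisher`, the `send` callback, and the event payload shape are all assumptions made for illustration.

```python
from typing import Callable, Optional, Sequence


class SketchEventPublisher:
    """Illustrative stand-in, NOT the real StorageEventPublisher."""

    def __init__(self, send: Callable[[dict], None], model_name: Optional[str] = None):
        self._send = send              # hypothetical transport callback
        self._model_name = model_name  # optional default; the evictor may leave it unset

    def publish_blocks_removed(
        self, block_hashes: Sequence[int], model_name: Optional[str] = None
    ) -> None:
        # A per-call model_name takes precedence over the constructor default.
        name = model_name if model_name is not None else self._model_name
        if name is None:
            raise ValueError("model_name must be set on the publisher or passed per call")
        if not block_hashes:
            return  # nothing to announce
        # Assumed payload shape; the real wire format is not shown in this description.
        self._send({
            "type": "BlockRemoved",
            "model_name": name,
            "block_hashes": list(block_hashes),
        })
```

With this shape, a single publisher instance created without a default can serve every model on the PVC, as long as each call names the model explicitly.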

Block hashes are extracted by reversing the FileMapper filename convention (16-char hex basename), and the original model name is recovered from config.json at the FileMapper base directory. Model name lookups are cached per base directory so only one filesystem read occurs per model per process lifetime. Events are grouped by model and published after each successful batch deletion.
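
The extraction, caching, and grouping steps described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: the helper names are hypothetical, the `model_name` key inside config.json is assumed, and the caller is assumed to already know each file's FileMapper base directory.

```python
import json
import re
from collections import defaultdict
from functools import lru_cache
from pathlib import Path
from typing import Dict, Iterable, List, Optional, Tuple

# Assumption: the basename without its extension is exactly 16 hex characters.
_HEX16 = re.compile(r"[0-9a-fA-F]{16}")


def block_hash_from_path(path: str) -> Optional[int]:
    """Return the block hash encoded in the filename, or None if it does not match."""
    stem = Path(path).stem
    return int(stem, 16) if _HEX16.fullmatch(stem) else None


@lru_cache(maxsize=None)
def model_name_for_base_dir(base_dir: str) -> Optional[str]:
    """Read <base_dir>/config.json once; later calls for the same dir hit the cache."""
    try:
        with open(Path(base_dir) / "config.json", encoding="utf-8") as f:
            data = json.load(f)
    except (OSError, ValueError):  # missing file or malformed JSON
        return None
    # "model_name" is a hypothetical key, chosen only for this sketch.
    return data.get("model_name") if isinstance(data, dict) else None


def group_removed_by_model(deleted: Iterable[Tuple[str, str]]) -> Dict[str, List[int]]:
    """Group block hashes of deleted files by model, given (base_dir, file_path) pairs."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for base_dir, file_path in deleted:
        block_hash = block_hash_from_path(file_path)
        model = model_name_for_base_dir(base_dir)
        if block_hash is not None and model is not None:
            grouped[model].append(block_hash)
    return dict(grouped)
```

Note that `lru_cache` also caches a failed lookup (`None`), so in this sketch a config.json that appears later would not be picked up within the same process.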

Testing

All existing tests pass. This PR adds the new deleter tests and extends storage event tests.

Validated with:

pytest kv_connectors/pvc_evictor/tests/test_deleter.py -q
pytest kv_connectors/llmd_fs_backend/tests/test_storage_events.py -q

Related Issues

Note

Test layout depends on whether #578 is merged before this one.

@github-actions github-actions Bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label May 25, 2026
@albertoperdomo2
Contributor Author

cc: @kfirtoledo

@kfirtoledo kfirtoledo requested a review from guygir May 26, 2026 05:22
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
@albertoperdomo2 albertoperdomo2 force-pushed the feat/pvc-evictor-events branch from af821f3 to d80e670 Compare May 28, 2026 08:29
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
Collaborator

@guygir guygir left a comment

Thanks @albertoperdomo2 , overall looks good - just a few small things that need addressing.

batch_start_time = time.time()
deleted, freed = delete_batch(batch, dry_run, logger)

if deleted > 0 and event_publisher is not None and cache_path is not None:
Collaborator

When dry_run=True, delete_batch() still returns len(file_paths) as deleted, so this block can publish BlockRemoved even though no files were removed. Could we skip event publishing when dry_run is true (and/or only publish after a real delete)?

Contributor Author

@albertoperdomo2 albertoperdomo2 May 29, 2026

I do not think we should publish BlockRemoved events when dry_run=True. That can cause the Go indexer to drop valid checkpoints (in the upcoming storage indexer), which is data corruption from the indexer's perspective. The event consumer shouldn't need to know about dry runs.
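
One way the guard discussed in this thread could look is sketched below. The function name and its boolean parameters are hypothetical and stand in for the `event_publisher is not None` and `cache_path is not None` checks in the quoted condition; the actual fix in the PR may differ.

```python
def should_publish_removed(
    deleted: int, dry_run: bool, has_publisher: bool, has_cache_path: bool
) -> bool:
    """Decide whether BlockRemoved events may be published for a batch."""
    # In a dry run nothing was removed from storage, even if the reported
    # count is non-zero, so never announce removals.
    if dry_run:
        return False
    return deleted > 0 and has_publisher and has_cache_path
```

Checking `dry_run` first keeps the decision independent of what `delete_batch()` reports as its count in dry-run mode.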

Comment thread kv_connectors/pvc_evictor/processes/deleter.py Outdated
Comment thread kv_connectors/pvc_evictor/processes/deleter.py
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
@albertoperdomo2 albertoperdomo2 requested a review from guygir May 29, 2026 14:40