
fix/issue 656 default block size factor #1

Closed
miroslavln wants to merge 22 commits into main from fix/issue-656-default-block-size-factor

Conversation

@miroslavln

Owner

kfirtoledo and others added 22 commits May 27, 2026 11:56
…lm-d#613)

The test stubs vllm.v1.kv_offload.base to load the real manager
module in isolation. Commit 8cf550e added get_offload_block_hash
to manager.py's imports, but the stub wasn't updated, so collection
of tests/test_storage_events.py failed with "cannot import name
'get_offload_block_hash' from 'vllm.v1.kv_offload.base' (unknown
location)". Add an identity stub so the import resolves and the
existing assertions (which pass plain ints as keys) still hold.
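A minimal sketch of the kind of stub described above, assuming the test registers a fake `vllm.v1.kv_offload.base` in `sys.modules` before importing the manager. Only the module path and the name `get_offload_block_hash` come from the commit message; the helper `install_base_stub` and the one-argument signature are hypothetical.

```python
import sys
import types


def install_base_stub() -> types.ModuleType:
    """Register a fake vllm.v1.kv_offload.base so the import resolves without vllm."""
    name = "vllm.v1.kv_offload.base"
    parts = name.split(".")
    # Create the module and each parent package if they are not loaded yet.
    for i in range(1, len(parts) + 1):
        dotted = ".".join(parts[:i])
        sys.modules.setdefault(dotted, types.ModuleType(dotted))
    stub = sys.modules[name]
    # Identity stub (assumed signature): tests pass plain ints as keys, so
    # returning the key unchanged keeps the existing assertions valid.
    stub.get_offload_block_hash = lambda key: key
    return stub
```

Calling `install_base_stub()` before the manager import lets `from vllm.v1.kv_offload.base import get_offload_block_hash` succeed in an environment where vllm is not installed.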

Signed-off-by: Kfir Toledo <kfir.toledo@ibm.com>
* pvc_evictor: walk current FileMapper layout in crawler

After llm-d#585 collapsed the FileMapper layout to
`<root>/<safe_model_name>_<sha256-12>_r<rank>/<hhh>/<hh>_g<group_idx>/*.bin`,
the crawler still walked the pre-llm-d#585 deep tree
(`block_size_*/tp_*_pp_size_*/rank_*/<dtype>/...`). Those paths no
longer exist on disk, so the crawler discovered zero files and the
evictor never freed any blocks.

This rewrites `stream_cache_files_with_mapper` to:

* find rank directories by their `_r<digits>` suffix anywhere under
  the cache root, instead of pattern-matching the deprecated deep tree;
* iterate the first-level hex bucket ({hhh}) and apply the existing
  hex_modulo_range sharding;
* yield .bin files from any second-level bucket underneath, leaving
  the `_g<group_idx>` encoding opaque so the walker doesn't need to
  understand kv-cache groups.

The new walker no longer instantiates `FileMapper` (it only inspects
on-disk names), so the `FILEMAPPER_AVAILABLE` import guard and the
matching early-exit in `crawler_process` are removed. This also
unblocks running the evictor container without vllm installed, since
the walker was the only consumer of the `from llmd_fs_backend.file_mapper
import FileMapper` import — which transitively required vllm.

`parse_filemapper_params` is dropped along with its sole caller.
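The walk described above can be sketched roughly as follows. This is not the actual `stream_cache_files_with_mapper`; the function name `stream_bin_files`, its parameters, and in particular the sharding rule (keep a bucket when its hex value modulo `modulo` falls in an inclusive range) are assumptions made for illustration.

```python
import os
import re
from pathlib import Path
from typing import Iterator, Tuple

RANK_DIR_RE = re.compile(r"_r\d+$")      # rank directories end in _r<digits>
HEX_RE = re.compile(r"[0-9a-fA-F]+")     # first-level bucket names are hex


def stream_bin_files(
    root: str,
    hex_modulo_range: Tuple[int, int] = (0, 15),
    modulo: int = 16,
    bucket_len: int = 3,
) -> Iterator[Path]:
    """Yield *.bin files under any <...>_r<rank>/<hhh>/<second-level>/ below root."""
    lo, hi = hex_modulo_range
    for dirpath, dirnames, _ in os.walk(root):
        rank_dirs = [d for d in dirnames if RANK_DIR_RE.search(d)]
        # Rank directories are handled here, so os.walk need not descend into them.
        dirnames[:] = [d for d in dirnames if d not in rank_dirs]
        for rank in rank_dirs:
            for bucket in sorted(Path(dirpath, rank).iterdir()):
                name = bucket.name
                if not bucket.is_dir() or len(name) != bucket_len:
                    continue
                if not HEX_RE.fullmatch(name):
                    continue  # not a hex bucket
                # Assumed sharding rule: inclusive range on the value modulo `modulo`.
                if not lo <= int(name, 16) % modulo <= hi:
                    continue
                for sub in sorted(bucket.iterdir()):
                    if sub.is_dir():
                        # Second-level names (e.g. <hh>_g<group_idx>) stay opaque.
                        yield from sorted(sub.glob("*.bin"))
```

Because the sketch only inspects directory names, it needs nothing beyond the standard library, which matches the motivation of running the evictor without vllm installed.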

Signed-off-by: Miro <mironikolov@google.com>

* feat: parameterize first-level hex bucket directory length in pvc crawler

Signed-off-by: Miro <mironikolov@google.com>

---------

Signed-off-by: Miro <mironikolov@google.com>
Signed-off-by: Guy Girmonsky <guygir@gmail.com>
* parse hma kv event metadata

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

Co-authored-by: Kapil Jain <16477749+kapiljain1989@users.noreply.github.com>

* reduce noise

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

---------

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Co-authored-by: Kapil Jain <16477749+kapiljain1989@users.noreply.github.com>
fix CUDA version mismatch and dev headers symlink
- Update default CUDA_TOOLKIT_PKG to cuda-toolkit-13-0 to
  match the CUDA 13.0 base image and prevent PyTorch compilation
  version mismatch.
- Explicitly parse and update the standard /usr/local/cuda symlink
  after GKE package installation to resolve missing dev headers
  (cusparse.h) during compilation

Signed-off-by: Saikat Roychowdhury <saikat.royc85@gmail.com>
* feat: Emit BlockRemoved events in PVC evictor

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

* test: Add deleter and BlockRemoved tests

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

* chore: Refactor after FileMapper changes

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

* chore: Minor changes

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

* chore: Fix lint issues

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

* fix: Use only valid paths for events

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

* fix: Emit events on shutdown

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

---------

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
* ci: Wire fs_backend Python tests into CI

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

* chore: Clean up branch

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

* fix: Use venv for llmd_fs_backend test Python deps

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

* chore: Support PVC evictor events

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

---------

Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
…pdates (llm-d#630)

Bumps the go-dependencies group with 10 updates in the / directory:

| Package | From | To |
| --- | --- | --- |
| [github.com/alicebob/miniredis/v2](https://github.com/alicebob/miniredis) | `2.35.0` | `2.38.0` |
| [github.com/dgraph-io/ristretto/v2](https://github.com/dgraph-io/ristretto) | `2.3.0` | `2.4.0` |
| [github.com/docker/docker](https://github.com/docker/docker) | `28.5.1+incompatible` | `28.5.2+incompatible` |
| [github.com/fxamacker/cbor/v2](https://github.com/fxamacker/cbor) | `2.7.0` | `2.9.2` |
| [github.com/prometheus/client_golang](https://github.com/prometheus/client_golang) | `1.22.0` | `1.23.2` |
| [github.com/redis/go-redis/v9](https://github.com/redis/go-redis) | `9.7.3` | `9.20.0` |
| [github.com/testcontainers/testcontainers-go](https://github.com/testcontainers/testcontainers-go) | `0.40.0` | `0.42.0` |
| [go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc](https://github.com/open-telemetry/opentelemetry-go-contrib) | `0.63.0` | `0.69.0` |
| [go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc](https://github.com/open-telemetry/opentelemetry-go) | `1.39.0` | `1.44.0` |
| [go.uber.org/zap](https://github.com/uber-go/zap) | `1.27.0` | `1.28.0` |



Updates `github.com/alicebob/miniredis/v2` from 2.35.0 to 2.38.0
- [Release notes](https://github.com/alicebob/miniredis/releases)
- [Changelog](https://github.com/alicebob/miniredis/blob/master/CHANGELOG.md)
- [Commits](alicebob/miniredis@v2.35.0...v2.38.0)

Updates `github.com/dgraph-io/ristretto/v2` from 2.3.0 to 2.4.0
- [Release notes](https://github.com/dgraph-io/ristretto/releases)
- [Changelog](https://github.com/dgraph-io/ristretto/blob/main/CHANGELOG.md)
- [Commits](dgraph-io/ristretto@v2.3.0...v2.4.0)

Updates `github.com/docker/docker` from 28.5.1+incompatible to 28.5.2+incompatible
- [Release notes](https://github.com/docker/docker/releases)
- [Commits](moby/moby@v28.5.1...v28.5.2)

Updates `github.com/fxamacker/cbor/v2` from 2.7.0 to 2.9.2
- [Release notes](https://github.com/fxamacker/cbor/releases)
- [Commits](fxamacker/cbor@v2.7.0...v2.9.2)

Updates `github.com/prometheus/client_golang` from 1.22.0 to 1.23.2
- [Release notes](https://github.com/prometheus/client_golang/releases)
- [Changelog](https://github.com/prometheus/client_golang/blob/main/CHANGELOG.md)
- [Commits](prometheus/client_golang@v1.22.0...v1.23.2)

Updates `github.com/prometheus/client_model` from 0.6.1 to 0.6.2
- [Release notes](https://github.com/prometheus/client_model/releases)
- [Commits](prometheus/client_model@v0.6.1...v0.6.2)

Updates `github.com/redis/go-redis/v9` from 9.7.3 to 9.20.0
- [Release notes](https://github.com/redis/go-redis/releases)
- [Changelog](https://github.com/redis/go-redis/blob/master/RELEASE-NOTES.md)
- [Commits](redis/go-redis@v9.7.3...v9.20.0)

Updates `github.com/testcontainers/testcontainers-go` from 0.40.0 to 0.42.0
- [Release notes](https://github.com/testcontainers/testcontainers-go/releases)
- [Commits](testcontainers/testcontainers-go@v0.40.0...v0.42.0)

Updates `go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc` from 0.63.0 to 0.69.0
- [Release notes](https://github.com/open-telemetry/opentelemetry-go-contrib/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go-contrib/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go-contrib@zpages/v0.63.0...zpages/v0.69.0)

Updates `go.opentelemetry.io/otel` from 1.43.0 to 1.44.0
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go@v1.43.0...v1.44.0)

Updates `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc` from 1.39.0 to 1.44.0
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go@v1.39.0...v1.44.0)

Updates `go.opentelemetry.io/otel/sdk` from 1.43.0 to 1.44.0
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go@v1.43.0...v1.44.0)

Updates `go.opentelemetry.io/otel/trace` from 1.43.0 to 1.44.0
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go@v1.43.0...v1.44.0)

Updates `go.uber.org/zap` from 1.27.0 to 1.28.0
- [Release notes](https://github.com/uber-go/zap/releases)
- [Changelog](https://github.com/uber-go/zap/blob/master/CHANGELOG.md)
- [Commits](uber-go/zap@v1.27.0...v1.28.0)

Updates `google.golang.org/grpc` from 1.77.0 to 1.81.1
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](grpc/grpc-go@v1.77.0...v1.81.1)

Updates `google.golang.org/protobuf` from 1.36.10 to 1.36.11

---
updated-dependencies:
- dependency-name: github.com/alicebob/miniredis/v2
  dependency-version: 2.38.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: github.com/dgraph-io/ristretto/v2
  dependency-version: 2.4.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: github.com/docker/docker
  dependency-version: 28.5.2+incompatible
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: go-dependencies
- dependency-name: github.com/fxamacker/cbor/v2
  dependency-version: 2.9.2
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: github.com/prometheus/client_golang
  dependency-version: 1.23.2
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: github.com/prometheus/client_model
  dependency-version: 0.6.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: go-dependencies
- dependency-name: github.com/redis/go-redis/v9
  dependency-version: 9.20.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: github.com/testcontainers/testcontainers-go
  dependency-version: 0.42.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc
  dependency-version: 0.69.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: go.opentelemetry.io/otel
  dependency-version: 1.44.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc
  dependency-version: 1.44.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: go.opentelemetry.io/otel/sdk
  dependency-version: 1.44.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: go.opentelemetry.io/otel/trace
  dependency-version: 1.44.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: go.uber.org/zap
  dependency-version: 1.28.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: google.golang.org/grpc
  dependency-version: 1.81.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
- dependency-name: google.golang.org/protobuf
  dependency-version: 1.36.11
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: go-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…lm-d#589)

The 16 MB floor on staging buffers caused every per-file write to pad
the buffer via cudaHostAlloc, then write the full padded size to disk.
For small-block models (e.g. Llama 3.1 8B where block size is ~7 MB)
this more than doubles the on-disk footprint (7 MB -> 16 MB per file)
and degrades TTFT from 2.8 s to 6.5 s at 40k-token prefix length.

Remove the constant so allocate_staging_buffer uses exactly the size
that calc_staging_bytes computes.
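The footprint arithmetic above can be checked with a few lines. The helper `staged_bytes` is hypothetical (the real `allocate_staging_buffer` and `calc_staging_bytes` live in the C++ sources), and treating "MB" as MiB is an assumption.

```python
MiB = 1024 * 1024
OLD_FLOOR = 16 * MiB  # the removed 16 MB minimum staging size


def staged_bytes(block_bytes: int, floor: int = 0) -> int:
    """Bytes allocated and written per file, given an optional minimum size."""
    return max(block_bytes, floor)


block = 7 * MiB  # approximate per-file size quoted for Llama 3.1 8B
before = staged_bytes(block, OLD_FLOOR)  # with the floor: padded to 16 MiB
after = staged_bytes(block)              # without the floor: exact size
print(f"before: {before / MiB:.0f} MiB, after: {after / MiB:.0f} MiB, "
      f"inflation: {before / block:.2f}x")
# before: 16 MiB, after: 7 MiB, inflation: 2.29x
```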

Fixes llm-d#454

Signed-off-by: Jonathan Wrede <wrede.jonathan00@gmail.com>
…-d#634)

clang-format (v21) flags the double blank line left after llm-d#589 removed
MIN_STAGING_BUFFER_SIZE. Fixes the repo-wide pre-commit lint gate that
currently fails on all PRs.

Signed-off-by: Kfir Toledo <kfir.toledo@ibm.com>
The vllm 0.22 KV-offload Python API (base, worker, OffloadPromMetrics) is
unchanged from 0.21, so no import/signature changes are needed.

The one functional change is in spec.py: adopt the OffloadingSpec base
class's `self.hash_block_size` (resolved via resolve_kv_cache_block_sizes)
instead of `vllm_config.cache_config.block_size` when computing
gpu_blocks_per_file. The two are equal for standard single-group models,
but cache_config.block_size can be larger on hybrid models, so the base
value is the correct hash granularity. Single-group only (no HMA).

Bump pins: vllm 0.21.0 -> 0.22.0, package version 0.21 -> 0.22. README,
Dockerfile.dev base image, and vllm-storage.yaml deployment example
updated accordingly.

Signed-off-by: Kfir Toledo <kfir.toledo@ibm.com>
…lm-d#607)

* csrc: batch KV block copies via cudaMemcpyBatchAsync

Submit all per-(block, layer) copies in one driver call instead of N
cudaMemcpyAsync calls. Enabled by default; toggle off with
USE_BATCH_MEMCPY_READ / USE_BATCH_MEMCPY_WRITE=0. Requires CUDA 12.8+.

Speeds up KV-cache offload writes/reads when per-layer DMA sizes are
small enough that driver dispatch dominates.

Signed-off-by: Kfir Toledo <kfir.toledo@ibm.com>

* csrc: fall back to per-call DMA on CUDA < 12.8

cudaMemcpyBatchAsync was introduced in CUDA 12.8 — guard the batch
path with #if CUDA_VERSION >= 12080 and route to the per-call
cudaMemcpyAsync loop below that. Default USE_BATCH_MEMCPY_* off on
older toolchains so the env knob still makes sense.

Also drop thread_local on the attrs/attrs_idx inputs (never mutated,
no per-thread duplication needed) and move the copy_blocks dispatcher
below the helpers it dispatches to.
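The selection logic described in these two commits can be modelled in a few lines. The real implementation is in csrc and is decided partly at compile time; the Python below is only an illustration, and the function name `use_batch_memcpy` and the rule that only the value `0` disables the knob are assumptions.

```python
import os
from typing import Mapping, Optional

# CUDA_VERSION is encoded as major * 1000 + minor * 10, so 12.8 is 12080.
BATCH_MEMCPY_MIN_CUDA = 12080


def use_batch_memcpy(knob: str, cuda_version: int,
                     env: Optional[Mapping[str, str]] = None) -> bool:
    """Decide whether the batched copy path is taken for one direction."""
    env = os.environ if env is None else env
    if cuda_version < BATCH_MEMCPY_MIN_CUDA:
        # cudaMemcpyBatchAsync is unavailable: always use the per-call loop.
        return False
    # Enabled by default on supported toolchains; "0" switches it off.
    return env.get(knob, "1") != "0"
```

For example, `use_batch_memcpy("USE_BATCH_MEMCPY_READ", 12080, {})` selects the batch path, while any toolchain older than 12.8 falls back to per-call copies regardless of the environment.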

Signed-off-by: Kfir Toledo <kfir.toledo@ibm.com>

---------

Signed-off-by: Kfir Toledo <kfir.toledo@ibm.com>
…lm-d#632)

llmd-fs-connector==0.22 (llm-d v0.8 / vLLM v0.22) is the final release of
the standalone llm-d FS connector. The filesystem offloading logic is now
upstreamed into vLLM as the FS tier of the multi-tier offloading connector
(TieringOffloadingSpec); all new features and support continue there.

Add an [!IMPORTANT] banner to the connector README and a short note in the
root README's Connectors & Utilities list, linking the vLLM KV offloading
guide (vllm-project/vllm#44415).

Signed-off-by: Kfir Toledo <kfir.toledo@ibm.com>
Signed-off-by: Dong Ma <winterma.dong@gmail.com>
Add a `latest` input (default `false`) to the shared docker-build-and-push
action and wire it up from `ci-release.yaml` and
`ci-release-uds-tokenizer.yaml`. The callers set it to `true` only when the
triggering event is a non-prerelease GitHub Release.

Previously the release workflows only pushed the immutable `vX.Y.Z` tag (and
`vllm-v*` for the UDS tokenizer), so the floating `:latest` tag on ghcr.io
was never refreshed after the initial manual push. That left
`ghcr.io/llm-d/llm-d-uds-tokenizer:latest` pointing at a 29-day-old build
while `v0.8.0` had been published 9 days earlier, causing version skew with
sibling components whose `latest` tag stayed current.

Dev / PR / pre-release / workflow_dispatch builds intentionally keep the
default (`false`) so the floating tag is never bumped by a non-release
artifact.
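The condition the callers apply can be expressed as a small predicate. The workflows themselves are YAML; this Python version is only a sketch, the name `should_push_latest` is hypothetical, and it assumes the standard GitHub `release` event payload with a boolean `release.prerelease` field.

```python
from typing import Any, Mapping


def should_push_latest(event_name: str, payload: Mapping[str, Any]) -> bool:
    """True only for a GitHub Release event whose release is not a prerelease."""
    if event_name != "release":
        # push, pull_request, workflow_dispatch, ... keep latest=false
        return False
    release = payload.get("release") or {}
    return release.get("prerelease") is False
```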

Signed-off-by: Kay Yan <kay.yan@daocloud.io>
…lm-d#619)

* feat(evictor): background empty directory cleanup

Empty cache directories accumulate as files are evicted. This adds a
background folder-cleaner process (P(N+3), gated by ENABLE_DIR_CLEANUP)
that removes them.

How it works:
- The crawler detects empty rank/{hhh}/{hh} directories during its sweep
  and the deleter offers each freshly-emptied parent directory after a
  batch delete. Both feed a shared folder_queue.
- The folder cleaner pulls paths off the queue and removes them with
  os.rmdir, which is inherently safe: it refuses to remove a non-empty
  directory (raising OSError), so nothing is deleted if a file has landed
  in the directory in the meantime.

Safety:
- queue_folder skips directories modified within DIR_CLEANUP_TTL_SECONDS
  (default 120s) so we don't race a writer that just created a bucket and
  is about to populate it. This is defense-in-depth on top of rmdir's
  empty-only semantics.
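The two safeguards above can be sketched as follows. The name `queue_folder` and the 120 s default come from the commit message; `try_remove_empty_dir`, the `now` parameter, and the use of `queue.Queue` (the evictor may well use a multiprocessing queue) are assumptions for illustration.

```python
import os
import time
from queue import Queue
from typing import Optional


def queue_folder(path: str, folder_queue: "Queue[str]",
                 ttl_seconds: float = 120.0,
                 now: Optional[float] = None) -> bool:
    """Offer a directory for cleanup unless it was modified within the TTL."""
    now = time.time() if now is None else now
    try:
        mtime = os.stat(path).st_mtime
    except FileNotFoundError:
        return False  # already gone
    if now - mtime < ttl_seconds:
        return False  # a writer may have just created this bucket
    folder_queue.put(path)
    return True


def try_remove_empty_dir(path: str) -> bool:
    """Remove path only if it is empty; leave it untouched otherwise."""
    try:
        os.rmdir(path)
        return True
    except OSError:
        # Raised when the directory is non-empty (a file landed meanwhile)
        # or no longer exists; in both cases no data is lost.
        return False
```

The TTL check only reduces wasted attempts and races with fresh writers; correctness rests on `os.rmdir` never removing a directory that still has entries.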

Config / Helm:
- New ENABLE_DIR_CLEANUP (default true) and DIR_CLEANUP_TTL_SECONDS
  (default 120) env vars, wired through config.py, the Helm values and
  Deployment template, and documented in CONFIGURATION.md.

Reporting:
- The folder cleaner reports folders_purged via a new folder_cleaner_stats
  channel surfaced in the aggregated log. The crawler's per-sweep counter
  is named empty_folders_queued to reflect that it counts directories
  handed to the cleaner, not directories it deleted itself. The deleter's
  progress/done result-queue protocol is left unchanged.

Signed-off-by: Miro <mironikolov@google.com>

* Fix pvc_evictor unit tests due to delete_file_batch signature update

Signed-off-by: Miro <mironikolov@google.com>

* test: merge empty directory cleanup unit tests from PR llm-d#625

Signed-off-by: Miro <mironikolov@google.com>

* fix(test): use moby container package instead of docker docker package to fix testcontainers-go build error

Signed-off-by: Miro <mironikolov@google.com>

* fix(ci): update golangci-lint configuration format

Signed-off-by: Miro <mironikolov@google.com>

* fix(kvblock): resolve ZRevRange deprecation and lll warnings

Signed-off-by: Miro <mironikolov@google.com>

* fix(test): resolve lll and unused gocritic lint warnings in uds_e2e_suite_test.go

Signed-off-by: Miro <mironikolov@google.com>

* Fix the linting errors

Signed-off-by: Miro <mironikolov@google.com>

---------

Signed-off-by: Miro <mironikolov@google.com>
* add hma group identity to kvblock entries

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* learn hma groups from kv events

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* expose hma group catalog

Expose the learned group catalog so scorer follow-up work can use the event-derived metadata.

Co-authored-by: Kapil Jain <kapiljain1989@gmail.com>
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* fix redis pod entry encoding

Handle JSON encoding errors and store PodEntry directly for runtime Redis index state.

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* test namings

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* use only redis field

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* encode decode func namings for redis

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* remove redundant TestPodEntryString after redis keys change

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* remove noise from git diff

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

---------

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Co-authored-by: Kapil Jain <kapiljain1989@gmail.com>
feat: Add HMA support to the fs connector

---------

Signed-off-by: Kfir Toledo <kfir.toledo@ibm.com>
…m-d#618)

* test(pvc_evictor): add crawler tests, CI, and docs after llm-d#611
Add pytest for stream_cache_files_with_mapper, pvc-evictor CI workflow,
dev Makefile/requirements, Dockerfile comments for llm-d#605 storage events,
and docs for the flat fs_backend layout.
Keeps llmd_fs_backend in the image for upcoming llm-d#605; crawler stays
path-only per llm-d#611. Follow-up to llm-d#601 / llm-d#611.

Signed-off-by: Guy Girmonsky <guygir@gmail.com>

* fix(pvc_evictor): ruff import order in test conftest

Signed-off-by: Guy Girmonsky <guygir@gmail.com>

---------

Signed-off-by: Guy Girmonsky <guygir@gmail.com>
Signed-off-by: Alex <alex.tech.lab@outlook.com>
When block_size is absent from kv_connector_extra_config, the fs
backend defaults offloaded_block_size to 256 tokens, but vLLM's
OffloadingSpec base class only derives block_size_factor when
block_size is explicitly present, leaving it at 1. The scheduler then
emits one offload key per GPU block while the worker consumes one key
per file (gpu_blocks_per_file blocks), so on hybrid models (e.g. Gemma
with sliding-window + full-attention KV cache groups) the second
group's key slice lands inside the first group's keys and every
transfer fails with:

    AssertionError: Expected group_idx=1 but key encodes 0

Set block_size_factor = gpu_blocks_per_file in
SharedStorageOffloadingSpec so the scheduler and worker always agree on
the per-file key granularity, regardless of whether block_size is
configured explicitly. On single-group models the old mismatch did not
assert but silently named files after the wrong block hashes, crippling
the offload hit rate.
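A toy model of the mismatch may help. It is not vLLM's data structures: it assumes keys are laid out group by group and represents each key as a `(group_idx, index)` tuple, and the function names are hypothetical.

```python
from typing import List, Tuple

Key = Tuple[int, int]  # (group_idx, index within the group); stand-in for a real key


def scheduler_keys(num_groups: int, gpu_blocks: int,
                   block_size_factor: int) -> List[Key]:
    """Emit keys group by group, one key per `block_size_factor` GPU blocks."""
    per_group = gpu_blocks // block_size_factor
    return [(g, i) for g in range(num_groups) for i in range(per_group)]


def worker_slice(keys: List[Key], group_idx: int, gpu_blocks: int,
                 gpu_blocks_per_file: int) -> List[Key]:
    """Take the keys the worker believes belong to `group_idx`: one per file."""
    per_group = gpu_blocks // gpu_blocks_per_file
    return keys[group_idx * per_group:(group_idx + 1) * per_group]


gpu_blocks, gpu_blocks_per_file = 32, 16

# Mismatch: factor left at 1, so group 1's slice points into group 0's keys.
bad = worker_slice(scheduler_keys(2, gpu_blocks, 1), 1,
                   gpu_blocks, gpu_blocks_per_file)
# Fix: factor equals gpu_blocks_per_file, so both sides count one key per file.
good = worker_slice(scheduler_keys(2, gpu_blocks, gpu_blocks_per_file), 1,
                    gpu_blocks, gpu_blocks_per_file)
print(bad)   # [(0, 2), (0, 3)]
print(good)  # [(1, 0), (1, 1)]
```

In the mismatched case the worker reads group 0 keys while handling group 1, which corresponds to the `Expected group_idx=1 but key encodes 0` assertion quoted above.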

Fixes llm-d#656

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Miro <mironikolov@google.com>
vLLM's OffloadingSpec base class derives block_size_factor from
extra_config["block_size"] and asserts that all KV cache groups share
one GPU block size to do so. Hybrid models like Gemma 4 have groups
with different block sizes, so explicitly configuring "block_size"
crashed at startup with:

    AssertionError: If 'block_size' is specified in
    kv_connector_extra_config, there must be at least one KV cache
    group, and all groups must have the same block size.

The fs backend does not need that uniformity: it sizes files in
hash_block_size (GCD of group block sizes) granularity and already
derives block_size_factor itself. Hide "block_size" from the base
class during super().__init__() (restoring it afterwards) so the
uniformity assert is never reached and explicit and default block_size
configurations behave identically.
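The hide-and-restore pattern can be sketched with toy classes. `BaseSpec` and `FsSpec` are hypothetical stand-ins, not vLLM's `OffloadingSpec` or `SharedStorageOffloadingSpec`, and the formula `gpu_blocks_per_file = block_size // hash_block_size` is an assumption made for illustration.

```python
import math
from functools import reduce
from typing import Any, Dict, List


class BaseSpec:
    """Toy base class: asserts uniform group block sizes when block_size is set."""

    def __init__(self, extra_config: Dict[str, Any], group_block_sizes: List[int]):
        if "block_size" in extra_config:
            assert len(set(group_block_sizes)) == 1, \
                "all groups must have the same block size"
        self.extra_config = extra_config
        # GCD of the group block sizes, as described in the commit message.
        self.hash_block_size = reduce(math.gcd, group_block_sizes)


class FsSpec(BaseSpec):
    def __init__(self, extra_config: Dict[str, Any], group_block_sizes: List[int]):
        missing = object()
        # Hide "block_size" so the base-class uniformity assert is never reached.
        hidden = extra_config.pop("block_size", missing)
        try:
            super().__init__(extra_config, group_block_sizes)
        finally:
            if hidden is not missing:
                extra_config["block_size"] = hidden  # restore for later readers
        offloaded = extra_config.get("block_size", 256)  # 256-token default
        # Assumed sizing rule: files hold a whole number of hash-sized blocks.
        self.gpu_blocks_per_file = offloaded // self.hash_block_size
        self.block_size_factor = self.gpu_blocks_per_file
```

With group block sizes `[16, 32]`, `FsSpec({"block_size": 256}, [16, 32])` and `FsSpec({}, [16, 32])` both end up with a factor of 16, whereas `BaseSpec({"block_size": 256}, [16, 32])` trips the assert. The `try`/`finally` ensures the caller's config is restored even if the base initializer raises.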

Fixes llm-d#657

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Miro <mironikolov@google.com>
@github-actions


Unsigned commits detected! Please sign your commits.

For instructions on how to set up GPG/SSH signing and verify your commits, please see GitHub Documentation.

10 participants