Skip to content

Conversation

@francoposa
Copy link
Contributor

@francoposa francoposa commented Dec 9, 2025

What this PR does

This PR wires the bucket index reader into the existing Label Values Offsets benchmarks.
Nothing else fancy.
Considering we are concerned about moving something from bucket to disk, we would likely want to follow up on this to add some instrumentation to record object storage calls or cache access patterns.

Which issue(s) this PR fixes or relates to

Fixes #

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]. If changelog entry is not needed, please add the changelog-not-needed label to the PR.
  • about-versioning.md updated with experimental features.

Note

Introduces a bucket-backed index-header reader and shared decoding interfaces, and extends benchmarks/tests to compare bucket vs disk readers.

  • Index Header Readers:
    • New BucketBinaryReader reading symbols/postings directly from object storage; builds/loads sparse headers; caches symbols.
    • Reader Pool now instantiates BucketBinaryReader (replacing disk StreamBinaryReader).
    • Lazy Reader: only pre-downloads sparse header; no parallel ensure of full index-header on init.
  • Decoding/Encoding:
    • Added encoding.BucketDecbufFactory with buffered streaming over objstore.GetRange.
    • Generalized Decbuf to a reader interface; updated symbols/postings to accept a DecbufFactory interface.
    • Adjusted writeSparseHeadersToFile to accept proto payload; call sites updated.
  • Index Structures:
    • Symbols and PostingOffsetTable(V1/V2) refactored to use factory interface; added constructors for sparse headers.
  • Benchmarks/Tests:
    • Benchmarks updated to run both disk and bucket readers (including prefix cases) and validate parity.
    • Tests adjusted for new interfaces and helpers; minor signature changes (e.g., returning bucket in labelValues fixtures).

Written by Cursor Bugbot for commit 492c9c1. This will update automatically on new commits. Configure here.

@francoposa francoposa requested a review from a team as a code owner December 9, 2025 23:14
@francoposa francoposa added the changelog-not-needed PRs that don't need a CHANGELOG.md entry label Dec 9, 2025
@francoposa francoposa force-pushed the francoposa/indexheader-bucket-reader-scratch branch from 091e0ac to 6f61f93 Compare December 11, 2025 18:12
@francoposa francoposa force-pushed the vldmr/indexheader-bucket-reader branch from 6515349 to 7b26e7d Compare December 11, 2025 18:13
@github-actions
Copy link
Contributor

💻 Deploy preview available (Set up benchmarks to compare disk and bucket readers for index header V2):

narqo and others added 2 commits December 11, 2025 10:16
Signed-off-by: Vladimir Varankin <[email protected]>

wip! encoding: stream reader

Signed-off-by: Vladimir Varankin <[email protected]>

wip! encoding: reset reader

Signed-off-by: Vladimir Varankin <[email protected]>

rebase

Signed-off-by: Vladimir Varankin <[email protected]>

wip! reduce net buffer

Signed-off-by: Vladimir Varankin <[email protected]>

comments

Signed-off-by: Vladimir Varankin <[email protected]>

cap skippable bytes

Signed-off-by: Vladimir Varankin <[email protected]>
@francoposa francoposa force-pushed the francoposa/indexheader-bucket-reader-scratch branch from 6f61f93 to 492c9c1 Compare December 11, 2025 18:16
@francoposa francoposa changed the title Set up benchmarks to compare disk and bucket readers for index header V2 [WIP] bucket index header reader Dec 11, 2025
@francoposa francoposa changed the base branch from vldmr/indexheader-bucket-reader to main December 11, 2025 18:21
@francoposa francoposa marked this pull request as draft December 11, 2025 18:21
return NewStreamBinaryReader(ctx, logger, bkt, dir, id, postingOffsetsInMemSampling, p.metrics.streamReader, cfg)
//return NewStreamBinaryReader(ctx, logger, bkt, dir, id, postingOffsetsInMemSampling, p.metrics.streamReader, cfg)
return NewBucketBinaryReader(ctx, logger, bkt, dir, id, postingOffsetsInMemSampling, cfg)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Production reader factory accidentally changed to bucket reader

The readerFactory in NewBinaryReader has been changed from NewStreamBinaryReader to NewBucketBinaryReader in production code. The PR title indicates this is meant to "Set up benchmarks to compare disk and bucket readers" but this change affects all production usage of the reader pool. The commented-out NewStreamBinaryReader line suggests this was intended as a temporary testing change. The PR reviewer comment "Wrong." likely refers to this issue.

Fix in Cursor Fix in Web

//})
//if err := g.Wait(); err != nil {
// return nil, err
//}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Index header download logic commented out in production

The ensureIndexHeaderOnDisk call and its parallel execution with tryDownloadSparseHeader have been commented out, removing the pre-download of index headers during LazyBinaryReader initialization. The import for errgroup is also removed. This breaks the expected initialization behavior where both the sparse header and index header would be prepared before lazy loading occurs. For a benchmark-only PR, these production code changes appear to be accidentally committed debugging code.

Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog-not-needed PRs that don't need a CHANGELOG.md entry

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants