[WIP] bucket index header reader #13797

francoposa · 2025-12-09T23:14:32Z

What this PR does

This PR wires the bucket index reader into the existing Label Values Offsets benchmarks.
Nothing else fancy.
Considering we are concerned about moving something from bucket to disk, we would likely want to follow up on this to add some instrumentation to record object storage calls or cache access patterns.

Which issue(s) this PR fixes or relates to

Fixes #

Checklist

Tests updated.
Documentation added.
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]. If changelog entry is not needed, please add the changelog-not-needed label to the PR.
about-versioning.md updated with experimental features.

Note

Introduces a bucket-backed index-header reader and shared decoding interfaces, and extends benchmarks/tests to compare bucket vs disk readers.

Index Header Readers:
- New BucketBinaryReader reading symbols/postings directly from object storage; builds/loads sparse headers; caches symbols.
- Reader Pool now instantiates BucketBinaryReader (replacing disk StreamBinaryReader).
- Lazy Reader: only pre-downloads sparse header; no parallel ensure of full index-header on init.
Decoding/Encoding:
- Added encoding.BucketDecbufFactory with buffered streaming over objstore.GetRange.
- Generalized Decbuf to a reader interface; updated symbols/postings to accept a DecbufFactory interface.
- Adjusted writeSparseHeadersToFile to accept proto payload; call sites updated.
Index Structures:
- Symbols and PostingOffsetTable(V1/V2) refactored to use factory interface; added constructors for sparse headers.
Benchmarks/Tests:
- Benchmarks updated to run both disk and bucket readers (including prefix cases) and validate parity.
- Tests adjusted for new interfaces and helpers; minor signature changes (e.g., returning bucket in labelValues fixtures).

^{Written by Cursor Bugbot for commit 492c9c1. This will update automatically on new commits. Configure here.}

pkg/storage/indexheader/reader_benchmarks_test.go

github-actions · 2025-12-11T18:14:56Z

💻 Deploy preview available (Set up benchmarks to compare disk and bucket readers for index header V2):

Signed-off-by: Vladimir Varankin <[email protected]> wip! encoding: stream reader Signed-off-by: Vladimir Varankin <[email protected]> wip! encoding: reset reader Signed-off-by: Vladimir Varankin <[email protected]> rebase Signed-off-by: Vladimir Varankin <[email protected]> wip! reduce net buffer Signed-off-by: Vladimir Varankin <[email protected]> comments Signed-off-by: Vladimir Varankin <[email protected]> cap skippable bytes Signed-off-by: Vladimir Varankin <[email protected]>

cursor · 2025-12-11T18:29:16Z

pkg/storage/indexheader/reader_pool.go

-		return NewStreamBinaryReader(ctx, logger, bkt, dir, id, postingOffsetsInMemSampling, p.metrics.streamReader, cfg)
+		//return NewStreamBinaryReader(ctx, logger, bkt, dir, id, postingOffsetsInMemSampling, p.metrics.streamReader, cfg)
+		return NewBucketBinaryReader(ctx, logger, bkt, dir, id, postingOffsetsInMemSampling, cfg)
 	}


Bug: Production reader factory accidentally changed to bucket reader

The readerFactory in NewBinaryReader has been changed from NewStreamBinaryReader to NewBucketBinaryReader in production code. The PR title indicates this is meant to "Set up benchmarks to compare disk and bucket readers" but this change affects all production usage of the reader pool. The commented-out NewStreamBinaryReader line suggests this was intended as a temporary testing change. The PR reviewer comment "Wrong." likely refers to this issue.

cursor · 2025-12-11T18:29:16Z

pkg/storage/indexheader/lazy_binary_reader.go

+	//})
+	//if err := g.Wait(); err != nil {
+	//	return nil, err
+	//}


Bug: Index header download logic commented out in production

The ensureIndexHeaderOnDisk call and its parallel execution with tryDownloadSparseHeader have been commented out, removing the pre-download of index headers during LazyBinaryReader initialization. The import for errgroup is also removed. This breaks the expected initialization behavior where both the sparse header and index header would be prepared before lazy loading occurs. For a benchmark-only PR, these production code changes appear to be accidentally committed debugging code.

francoposa requested a review from a team as a code owner December 9, 2025 23:14

cursor bot reviewed Dec 9, 2025

View reviewed changes

pkg/storage/indexheader/reader_benchmarks_test.go Show resolved Hide resolved

francoposa added the changelog-not-needed PRs that don't need a CHANGELOG.md entry label Dec 9, 2025

francoposa force-pushed the francoposa/indexheader-bucket-reader-scratch branch from 091e0ac to 6f61f93 Compare December 11, 2025 18:12

francoposa requested review from a team, bubu11e, stevesg, tacole02, vaxvms and wilfriedroset as code owners December 11, 2025 18:12

francoposa force-pushed the vldmr/indexheader-bucket-reader branch from 6515349 to 7b26e7d Compare December 11, 2025 18:13

narqo and others added 2 commits December 11, 2025 10:16

Set up benchmarks to compare disk and bucket readers for index header V2

492c9c1

francoposa force-pushed the francoposa/indexheader-bucket-reader-scratch branch from 6f61f93 to 492c9c1 Compare December 11, 2025 18:16

francoposa changed the title ~~Set up benchmarks to compare disk and bucket readers for index header V2~~ [WIP] bucket index header reader Dec 11, 2025

francoposa changed the base branch from vldmr/indexheader-bucket-reader to main December 11, 2025 18:21

francoposa removed request for a team, bubu11e, stevesg, tacole02, vaxvms and wilfriedroset December 11, 2025 18:21

francoposa marked this pull request as draft December 11, 2025 18:21

cursor bot reviewed Dec 11, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[WIP] bucket index header reader #13797

[WIP] bucket index header reader #13797

francoposa commented Dec 9, 2025 •

edited by cursor bot

Loading

Uh oh!

Uh oh!

github-actions bot commented Dec 11, 2025

Uh oh!

cursor bot Dec 11, 2025

Uh oh!

cursor bot Dec 11, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

[WIP] bucket index header reader #13797

Are you sure you want to change the base?

[WIP] bucket index header reader #13797

Conversation

francoposa commented Dec 9, 2025 • edited by cursor bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What this PR does

Which issue(s) this PR fixes or relates to

Checklist

Uh oh!

Uh oh!

github-actions bot commented Dec 11, 2025

Uh oh!

cursor bot Dec 11, 2025

Choose a reason for hiding this comment

Bug: Production reader factory accidentally changed to bucket reader

Uh oh!

cursor bot Dec 11, 2025

Choose a reason for hiding this comment

Bug: Index header download logic commented out in production

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

francoposa commented Dec 9, 2025 •

edited by cursor bot

Loading