Skip to content

enhance: persist segment summary metadata on SegmentInfo#50410

Open
tedxu wants to merge 1 commit into
milvus-io:masterfrom
tedxu:feat/segment-summary-metadata
Open

enhance: persist segment summary metadata on SegmentInfo#50410
tedxu wants to merge 1 commit into
milvus-io:masterfrom
tedxu:feat/segment-summary-metadata

Conversation

@tedxu

@tedxu tedxu commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

See #50406

DataCoord today derives aggregate segment metrics by iterating
SegmentInfo's FieldBinlog arrays on every scheduling decision.
For V3 segments the per-field binlog KV entries in etcd are pure
write amplification because the manifest is already the
authoritative path source.

This PR adds a Statistics message embedded on SegmentInfo so
DataCoord reads aggregates (sizes, counts, timestamp range,
null counts, quantiles) directly from a persisted field with no
branching on storage version at the read sites.

Wire formats

SaveBinlogPathsRequest.stats is populated by every flush. The
receiver consumes only stats_binlog_size, and only for V3
segments where stats live in the manifest. V2 derives the
cumulative footprint from the statslog FieldBinlog array.

CompactionSegment.stats is populated by every compactor at
write completion and copied verbatim onto the new SegmentInfo.
Insert and delta aggregates come from the FieldBinlog arrays
the compactor ships; stats blob footprint comes from a counter
the BinlogRecordWriter accumulates inside writeStats /
appendV3Stats and exposes via GetStatsBlobSize.

V3 etcd KV write skip

AlterSegments skips per-FieldBinlog KV writes for V3 segments;
the manifest is authoritative for paths and Stats carries the
aggregates DataCoord needs. Pre-existing V3 segments with binlog
KVs still load correctly via applyBinlogInfo.

Migration

Lazy. NewSegmentInfo back-fills Stats from arrays on first load
when the persisted field is nil. The back-fill lives in memory
until the next mutating operator persists the SegmentInfo proto.

Other notable changes

ShouldDoSingleCompaction reports expired_fraction one bucket
below the precise lower bound to prevent byte-size over-trigger
when binlog sizes vary.

EnsureStats is non-mutating to avoid a data race with
concurrent RLock readers; eager init happens at NewSegmentInfo
and under the meta.segMu write lock in mutating operators.

V3 L0 compaction outputs now carry TimestampFrom/To on the
resulting deltalog, matching the V1 path.

Test plan

  • Unit tests for storage.StatisticsCollector and
    BuildStatsFromFieldBinlogs
  • Unit tests for UpdateSegmentStats V2/V3 differentiation
  • Unit tests for ShouldDoSingleCompaction under-estimate
  • Unit tests for CompactionSegment.Stats V3 blob size flow
  • Unit tests for import L0 / non-L0 timestamp plumbing
  • CI integration / e2e

@sre-ci-robot sre-ci-robot requested review from czs007 and sunby June 9, 2026 08:01
@sre-ci-robot sre-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines. label Jun 9, 2026
@sre-ci-robot

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tedxu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mergify mergify Bot added dco-passed DCO check passed. kind/enhancement Issues or changes related to enhancement labels Jun 9, 2026
@sre-ci-robot

Copy link
Copy Markdown
Contributor

[ci-v2-notice]
Notice: New ci-v2 system is enabled for this PR.

To rerun ci-v2 checks, comment with:

  • /ci-rerun-code-check // for ci-v2/code-check
  • /ci-rerun-code-check-macos // for Code Checker MacOS (GitHub Actions)
  • /ci-rerun-build // for ci-v2/build
  • /ci-rerun-build-all // for ci-v2/build-all (multi-arch builds)
  • /ci-rerun-buildenv // for ci-v2/build-env (build milvus-env builder images)
  • /ci-rerun-ut-integration // for ci-v2/ut-integration, will rerun ci-v2/build
  • /ci-rerun-ut-go // for ci-v2/ut-go, will rerun ci-v2/build
  • /ci-rerun-ut-cpp // for ci-v2/ut-cpp
  • /ci-rerun-ut // for all ci-v2/ut-integration, ci-v2/ut-go, ci-v2/ut-cpp, will rerun ci-v2/build
  • /ci-rerun-e2e-default // for ci-v2/e2e-default
  • /ci-rerun-e2e-amd // for ci-v2/e2e-amd (e2e pool dispatcher)
  • /ci-rerun-build-ut-cov // for ci-v2/build-ut-cov (build + unit tests in one pipeline)
  • /ci-rerun-gosdk // for ci-v2/go-sdk (Go SDK E2E tests, ARM)

If you have any questions or requests, please contact @zhikunyao.

@sre-ci-robot

Copy link
Copy Markdown
Contributor

❌ CI Loop Results 6bce5d6

Stage Result Duration Tests
✅ Build SUCCESS 13.9min -
❌ Code-Check FAILURE 2.1min -
✅ UT-CPP-Cov SUCCESS 51.6min 7837 passed

Total: 68min | Pipeline | Artifacts

@codecov

codecov Bot commented Jun 9, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 44.75%. Comparing base (385caab) to head (3982aea).
⚠️ Report is 11 commits behind head on master.

❗ There is a different number of reports uploaded between BASE (385caab) and HEAD (3982aea). Click for more details.

HEAD has 2 uploads less than BASE
Flag BASE (385caab) HEAD (3982aea)
3 1
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##           master   #50410       +/-   ##
===========================================
- Coverage   78.96%   44.75%   -34.22%     
===========================================
  Files        2239       12     -2227     
  Lines      396917     2098   -394819     
===========================================
- Hits       313441      939   -312502     
+ Misses      73900     1107    -72793     
+ Partials     9576       52     -9524     
Components Coverage Δ
Client ∅ <ø> (∅)
Core ∅ <ø> (∅)
Go ∅ <ø> (∅)
see 2227 files with indirect coverage changes
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@tedxu tedxu force-pushed the feat/segment-summary-metadata branch from 6bce5d6 to a2f100b Compare June 9, 2026 10:57
@sre-ci-robot sre-ci-robot added the low-code-coverage add test-label from zhikun, diff coverage > 80% label Jun 9, 2026
@sre-ci-robot

Copy link
Copy Markdown
Contributor

❌ CI Loop Results a2f100b

Stage Result Duration Tests
✅ Build SUCCESS 14.1min -
❌ Code-Check FAILURE 7.0min -
✅ UT-CPP-Cov SUCCESS 52.7min 7837 passed

Total: 70min | Pipeline | Artifacts

@sre-ci-robot sre-ci-robot removed the low-code-coverage add test-label from zhikun, diff coverage > 80% label Jun 9, 2026
Add a Statistics message embedded on SegmentInfo so DataCoord
reads aggregate metrics (sizes, counts, timestamp range, null
counts, quantiles) from a persisted field instead of iterating
FieldBinlog arrays on every scheduling decision.

Wire formats:

SaveBinlogPathsRequest.stats is populated by every flush. The
receiver consumes only stats_binlog_size, and only for V3
segments where stats live in the manifest; V2 derives the
cumulative footprint from the statslog FieldBinlog array.

CompactionSegment.stats is populated by every compactor at write
completion and copied verbatim onto the new SegmentInfo. Insert
and delta aggregates come from the FieldBinlog arrays the
compactor ships; stats blob footprint comes from a counter the
BinlogRecordWriter accumulates inside writeStats /
appendV3Stats and exposes via GetStatsBlobSize().

For V3 segments, AlterSegments skips per-FieldBinlog KV writes.
The manifest is the authoritative path source and Stats carries
the aggregates DataCoord needs. Pre-existing V3 segments with
binlog KVs still load correctly via applyBinlogInfo.

Migration is lazy: NewSegmentInfo back-fills Stats from arrays
on first load when the persisted field is nil. The back-fill
lives in memory until the next mutating operator persists the
SegmentInfo proto.

ShouldDoSingleCompaction reports expired_fraction one bucket
below the precise lower bound to prevent byte-size over-trigger
when binlog sizes vary.

EnsureStats is non-mutating to avoid a data race with concurrent
RLock readers; eager init happens at NewSegmentInfo and under
the meta.segMu write lock in mutating operators.

V3 L0 compaction outputs now carry TimestampFrom/To on the
resulting deltalog, matching the V1 path.

issue: milvus-io#50406
Signed-off-by: Ted Xu <ted.xu@zilliz.com>
@tedxu tedxu force-pushed the feat/segment-summary-metadata branch from a2f100b to 3982aea Compare June 10, 2026 02:23
@sre-ci-robot

Copy link
Copy Markdown
Contributor

❌ CI Loop Results 3982aea

Stage Result Duration Tests
✅ Build SUCCESS 9.5min -
❌ Code-Check FAILURE 4.0min -
✅ UT-CPP-Cov SUCCESS 35.5min 7837 passed

Total: 48min | Pipeline | Artifacts

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved dco-passed DCO check passed. kind/enhancement Issues or changes related to enhancement size/XXL Denotes a PR that changes 1000+ lines.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants