Skip to content

fix(sampling): align resize-factor behavior and replace reservoir panics with errors#120

Open
Fengzdadi wants to merge 2 commits intoapache:mainfrom
Fengzdadi:feat-varopt-items-sketch
Open

fix(sampling): align resize-factor behavior and replace reservoir panics with errors#120
Fengzdadi wants to merge 2 commits intoapache:mainfrom
Fengzdadi:feat-varopt-items-sketch

Conversation

@Fengzdadi
Copy link
Contributor

  1. Align ResizeFactor handling with Java semantics (lg-based growth) in ReservoirItemsSketch and VarOptItemsSketch.
  2. Align ReservoirItemsSketch.EstimateSubsetSum with Java by returning stream-level estimates and TotalSketchWeight = N.
  3. Replace reservoir overflow panic paths with returned error in sketch/union update flows.
  4. Update tests accordingly; go test ./... passes.

minLgArrItems = 4
)

func resizeFactorLg(rf ResizeFactor) (int, error) {
Copy link
Member

@proost proost Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can define ResizeFactor using iota. so, verbose case syntax doesn't need.


ceilingLgK, _ := internal.ExactLog2(common.CeilingPowerOf2(k))
initialLgSize := startingSubMultiple(
ceilingLgK, int(math.Log2(float64(options.resizeFactor))), minLgArrItems,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using ResizeFactor itself is ok , right?

if targetCap <= cap(s.data) {
return nil
}
newData := make([]T, len(s.data), targetCap)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

L162 ~ L164 is too expensive. that is why i use Grow


// forceIncrementItemsSeen adds delta to the items seen count.
func (s *ReservoirItemsSketch[T]) forceIncrementItemsSeen(delta int64) {
func (s *ReservoirItemsSketch[T]) forceIncrementItemsSeen(delta int64) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you add delta to s.n first, it already change state. so guarding not to above maxItemsSeen is correct.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. I was wrong. chaning state first is correct. sketch's state is gone wrong already.

assert.Equal(t, ResizeX8, sketch.rf)
}

func TestReservoirItemsSketchUpdateReturnsErrorAtMaxItemsSeen(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this case is included already defined test function as nest case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Comments