Skip to content

Conversation

@kmrmt
Copy link
Contributor

@kmrmt kmrmt commented Nov 19, 2025

Description

  • use local instance of errgroup.Group
  • use context.Context from testing.B
  • remove WaitGroup

Related Issue

Versions

  • Vald Version: v1.7.17
  • Go Version: v1.25.3
  • Rust Version: v1.90.0
  • Docker Version: v28.5.1
  • Kubernetes Version: v1.34.1
  • Helm Version: v3.19.0
  • NGT Version: v2.5.0
  • Faiss Version: v1.12.0

Checklist

Special notes for your reviewer

Summary by CodeRabbit

  • Refactor
    • Centralized context acquisition from benchmark runtime, simplified Run entrypoints, and migrated concurrency to errgroup-based coordination across benchmark components.
  • Bug Fixes
    • Improved streaming EOF/status handling, error propagation, and synchronization for more reliable benchmark send/receive flows.
  • Breaking Changes
    • Benchmark operation signatures and starter Run no longer accept an explicit context parameter; call sites updated accordingly.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 19, 2025

📝 Walkthrough

Walkthrough

Removed explicit context parameters from benchmark Operation and Starter Run signatures; call sites now obtain context via testing.B/TB (b.Context()/tb.Context()). Replaced manual sync.WaitGroup goroutine coordination with errgroup-based concurrency and adjusted streaming EOF/status/error handling.

Changes

Cohort / File(s) Summary
Benchmark operations (interface & implementations)
hack/benchmark/internal/operation/operation.go, hack/benchmark/internal/operation/insert.go, hack/benchmark/internal/operation/remove.go, hack/benchmark/internal/operation/search.go
Removed context.Context from Operation interface and method signatures; methods now call b.Context()/b.Context() internally; streaming paths refactored to use errgroup (eg.Go / eg.Wait) alongside existing sync.WaitGroup where needed; EOF/status-based handling and error propagation adjusted; imports updated.
Starter Run implementations
hack/benchmark/internal/starter/starter.go, hack/benchmark/internal/starter/gateway/vald/vald.go, hack/benchmark/internal/starter/agent/core/ngt/ngt.go
Dropped context.Context parameter from Run methods; Run now derives context from tb.Context() and coordinates internal goroutines using errgroup instead of manual sync.WaitGroup.
E2E benchmark call-sites
hack/benchmark/e2e/agent/core/ngt/ngt_bench_test.go
Updated benchmark tests to call Run(tb)() and operation methods without context arguments; removed explicit context creation/imports and adjusted call sites to match new signatures.
Minor / formatting
internal/core/algorithm/usearch/usearch.go
Minor whitespace/comment formatting change only (no functional change).

Sequence Diagram(s)

sequenceDiagram
  participant TB as testing.TB / testing.B
  participant Runner as Starter.Run(tb)
  participant Eg as errgroup
  participant Worker as goroutine (worker)

  TB->>Runner: Run(tb) -> stop()
  Runner->>Eg: eg, ctx := errgroup.New(tb.Context())
  Note right of Eg: spawn workers via eg.Go(...)
  Eg->>Worker: eg.Go(worker)
  Worker-->>Eg: return error / nil
  Runner->>Eg: eg.Wait() on stop()
  Eg-->>Runner: aggregated error / nil
Loading
sequenceDiagram
  participant B as testing.B
  participant Streamer as StreamInsert/StreamSearch
  participant Eg as errgroup
  participant Sender as sender goroutine (wg)
  participant Receiver as recv goroutine (eg.Go)

  B->>Streamer: StreamX(b, ...)
  Streamer->>Eg: eg, ctx := errgroup.New(b.Context())
  Streamer->>Sender: start send loop (wg)
  Streamer->>Receiver: eg.Go(recv loop)
  Receiver-->>Eg: io.EOF -> return nil
  Receiver-->>Eg: other error -> return err
  Streamer->>Eg: eg.Wait() & wg.Wait()
  Eg-->>Streamer: propagate errors
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Review focus:
    • Ensure context is consistently sourced from b.Context()/tb.Context() and correctly threaded to downstream calls.
    • Verify errgroup replaces WaitGroup without introducing goroutine leaks or race conditions and that waiting semantics (eg.Wait vs wg.Wait) are correct.
    • Validate streaming recv/send error handling (EOF -> nil, status-based AlreadyExists logic, error propagation).
    • Confirm updated public signatures compile across the repository and that tests/e2e call-sites were updated accordingly.
    • Pay special attention to hack/benchmark/internal/starter/agent/core/ngt/ngt.go and changes around lifecycle management.

Suggested labels

size/XL, area/internal, type/ci

Suggested reviewers

  • kpango
  • Matts966
  • datelier

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title ':bug: fix grpc benchmark' is directly related to the main changes in this PR, which refactor benchmark code to use errgroup, obtain context from testing.B, and remove WaitGroup.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch bugfix/benchmark/grpc

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@vdaas-ci
Copy link
Collaborator

[CHATOPS:HELP] ChatOps commands.

  • 🙆‍♀️ /approve - approve
  • 🍱 /format - format codes and add licenses
  • /gen-test - generate test codes
  • 🏷️ /label - add labels
  • 🔚 2️⃣ 🔚 /label actions/e2e-deploy - run E2E deploy & integration test

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
hack/benchmark/internal/starter/agent/core/ngt/ngt.go (1)

49-93: errgroup migration in Run looks correct; context layering could be simplified

The switch to a local errgroup.New(ctx) and joining via eg.Go / eg.Wait() preserves the previous WaitGroup semantics and looks correct for this starter. Both goroutines always return nil, so ignoring eg.Wait()’s error is harmless with the current code.

If you want to simplify the context chain slightly, you could consider creating the cancelable context first and then passing it into errgroup.New, so there’s only a single derived context instead of ctx (arg) -> errgroup ctx -> WithCancel ctx. Not required, but it might make the flow a bit easier to follow.

hack/benchmark/internal/operation/search.go (1)

133-191: Redundant sc != nil checks in StreamSearchByID loop conditions

In StreamSearchByID, sc is created once and only checked, never reassigned to nil, so:

  • The recv loop condition for sc != nil { ... } and
  • The sender loop guard && sc != nil

are effectively always true given the current code. Termination is already controlled via EOF (setting finished to true) and the benchmark’s context.

You could simplify this by dropping the sc != nil checks (e.g., for { ... } and for i := 0; i < b.N && !finished.Load(); i++), which would make the control flow a bit clearer without changing behavior.

hack/benchmark/internal/operation/insert.go (1)

66-135: Consider logging eg.Wait() errors in StreamInsert

In StreamInsert, the recv goroutine now returns a non-nil error on non-EOF failures:

if err != nil {
    // ...
    return err
}

This is good for triggering errgroup’s context cancellation and shutting down the stream early. However, the subsequent eg.Wait() call:

eg.Wait()

ignores that error, so any Recv error that doesn’t also show up on Send / CloseSend will be invisible at the benchmark level.

If you want better visibility into such failures, you could optionally log the result of eg.Wait():

-        eg.Wait()
+        if err := eg.Wait(); err != nil {
+            grpcError(b, err)
+        }

This keeps the early-cancel behavior while making hidden Recv-side errors easier to diagnose.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6840f88 and b70feb9.

📒 Files selected for processing (4)
  • hack/benchmark/internal/operation/insert.go (5 hunks)
  • hack/benchmark/internal/operation/remove.go (5 hunks)
  • hack/benchmark/internal/operation/search.go (5 hunks)
  • hack/benchmark/internal/starter/agent/core/ngt/ngt.go (4 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: kpango
Repo: vdaas/vald PR: 2491
File: pkg/gateway/lb/handler/grpc/handler.go:306-316
Timestamp: 2024-10-08T20:16:00.642Z
Learning: User: kpango
PR: vdaas/vald#2491
File: pkg/gateway/lb/handler/grpc/handler.go:306-316
Timestamp: 2024-05-07T04:15:11.886Z
Learning: In this project, vtprotobuf is used, which provides enhanced methods for protobuf objects: CloneVT (object cloning), EqualVT (object comparison), SizeVT (returns object size), MarshalVT (faster version of proto.Marshal), and UnmarshalVT (faster version of proto.Unmarshal). These methods offer better performance than usual protobuf usage.
Learnt from: kpango
Repo: vdaas/vald PR: 2491
File: pkg/gateway/lb/handler/grpc/handler.go:306-316
Timestamp: 2024-06-10T19:25:49.832Z
Learning: User: kpango
PR: vdaas/vald#2491
File: pkg/gateway/lb/handler/grpc/handler.go:306-316
Timestamp: 2024-05-07T04:15:11.886Z
Learning: In this project, vtprotobuf is used, which provides enhanced methods for protobuf objects: CloneVT (object cloning), EqualVT (object comparison), SizeVT (returns object size), MarshalVT (faster version of proto.Marshal), and UnmarshalVT (faster version of proto.Unmarshal). These methods offer better performance than usual protobuf usage.
Learnt from: kpango
Repo: vdaas/vald PR: 0
File: :0-0
Timestamp: 2025-05-01T08:44:07.726Z
Learning: PR #2927 addressed gRPC connection panic issues by removing the singleflight.Group[pool.Conn] field from the gRPCClient struct in internal/net/grpc/client.go and introducing a centralized handleError function for consistent error processing in both Connect and Disconnect methods.
Learnt from: kpango
Repo: vdaas/vald PR: 0
File: :0-0
Timestamp: 2025-05-01T08:44:07.726Z
Learning: PR #2927 addressed gRPC connection panic issues by removing the singleflight.Group[pool.Conn] field from the gRPCClient struct in internal/net/grpc/client.go and introducing a centralized handleError function for more consistent error processing in both Connect and Disconnect methods.
Learnt from: kpango
Repo: vdaas/vald PR: 0
File: :0-0
Timestamp: 2025-05-01T08:44:07.726Z
Learning: PR #2927 removed the singleflight.Group field from the gRPCClient struct in internal/net/grpc/client.go and introduced a centralized handleError function for error processing to address gRPC connection panic issues.
Learnt from: Matts966
Repo: vdaas/vald PR: 2928
File: pkg/gateway/lb/handler/grpc/linear_search.go:271-333
Timestamp: 2025-04-15T05:32:26.428Z
Learning: For the vald project, refactoring PRs should be kept separate from semantic changes that alter behavior. The current PR "Divide LB Gateway handler.go" is focused solely on splitting the file into smaller, more manageable components without changing functionality.
Learnt from: kpango
Repo: vdaas/vald PR: 2491
File: pkg/gateway/lb/handler/grpc/aggregation.go:500-505
Timestamp: 2024-06-10T19:25:49.832Z
Learning: User: kpango
PR: vdaas/vald#2491
File: pkg/gateway/lb/handler/grpc/aggregation.go:500-505
Timestamp: 2024-05-08T02:17:42.078Z
Learning: The `GetNum()` method in the `valdStdAggr` struct should include a nil check to prevent runtime panics when called on a nil instance. This is a defensive programming practice to enhance the robustness of the code.
📚 Learning: 2025-05-01T08:44:07.726Z
Learnt from: kpango
Repo: vdaas/vald PR: 0
File: :0-0
Timestamp: 2025-05-01T08:44:07.726Z
Learning: PR #2927 addressed gRPC connection panic issues by removing the singleflight.Group[pool.Conn] field from the gRPCClient struct in internal/net/grpc/client.go and introducing a centralized handleError function for more consistent error processing in both Connect and Disconnect methods.

Applied to files:

  • hack/benchmark/internal/operation/insert.go
  • hack/benchmark/internal/operation/remove.go
  • hack/benchmark/internal/starter/agent/core/ngt/ngt.go
  • hack/benchmark/internal/operation/search.go
📚 Learning: 2025-05-01T08:44:07.726Z
Learnt from: kpango
Repo: vdaas/vald PR: 0
File: :0-0
Timestamp: 2025-05-01T08:44:07.726Z
Learning: PR #2927 addressed gRPC connection panic issues by removing the singleflight.Group[pool.Conn] field from the gRPCClient struct in internal/net/grpc/client.go and introducing a centralized handleError function for consistent error processing in both Connect and Disconnect methods.

Applied to files:

  • hack/benchmark/internal/operation/insert.go
  • hack/benchmark/internal/operation/remove.go
  • hack/benchmark/internal/starter/agent/core/ngt/ngt.go
  • hack/benchmark/internal/operation/search.go
📚 Learning: 2025-05-01T08:44:07.726Z
Learnt from: kpango
Repo: vdaas/vald PR: 0
File: :0-0
Timestamp: 2025-05-01T08:44:07.726Z
Learning: PR #2927 removed the singleflight.Group field from the gRPCClient struct in internal/net/grpc/client.go and introduced a centralized handleError function for error processing to address gRPC connection panic issues.

Applied to files:

  • hack/benchmark/internal/operation/insert.go
  • hack/benchmark/internal/operation/remove.go
  • hack/benchmark/internal/starter/agent/core/ngt/ngt.go
  • hack/benchmark/internal/operation/search.go
📚 Learning: 2024-06-10T19:25:49.832Z
Learnt from: kpango
Repo: vdaas/vald PR: 2491
File: pkg/gateway/lb/handler/grpc/handler.go:306-316
Timestamp: 2024-06-10T19:25:49.832Z
Learning: User: kpango
PR: vdaas/vald#2491
File: pkg/gateway/lb/handler/grpc/handler.go:306-316
Timestamp: 2024-05-07T04:15:11.886Z
Learning: In this project, vtprotobuf is used, which provides enhanced methods for protobuf objects: CloneVT (object cloning), EqualVT (object comparison), SizeVT (returns object size), MarshalVT (faster version of proto.Marshal), and UnmarshalVT (faster version of proto.Unmarshal). These methods offer better performance than usual protobuf usage.

Applied to files:

  • hack/benchmark/internal/operation/remove.go
  • hack/benchmark/internal/starter/agent/core/ngt/ngt.go
  • hack/benchmark/internal/operation/search.go
🧬 Code graph analysis (4)
hack/benchmark/internal/operation/insert.go (1)
internal/sync/errgroup/group.go (1)
  • Wait (253-258)
hack/benchmark/internal/operation/remove.go (2)
internal/io/io.go (1)
  • EOF (40-40)
internal/sync/errgroup/group.go (1)
  • Wait (253-258)
hack/benchmark/internal/starter/agent/core/ngt/ngt.go (1)
internal/sync/errgroup/group.go (1)
  • Wait (253-258)
hack/benchmark/internal/operation/search.go (1)
internal/sync/errgroup/group.go (1)
  • Wait (253-258)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: build / build (amd64, ubuntu-latest)
  • GitHub Check: grpc-stream
  • GitHub Check: grpc-sequential
  • GitHub Check: build / build (amd64, ubuntu-latest)
  • GitHub Check: CodeQL Analyze (go)
  • GitHub Check: CodeQL Analyze (actions)
  • GitHub Check: CodeQL Analyze (rust)
  • GitHub Check: runner / go build
  • GitHub Check: check-format-diff
  • GitHub Check: runner / LanguageTool / apis & charts
  • GitHub Check: runner / textlint
  • GitHub Check: Cloudflare Pages
🔇 Additional comments (2)
hack/benchmark/internal/operation/search.go (1)

55-110: StreamSearch errgroup refactor is consistent and preserves behavior

Using ctx := b.Context() plus eg, ctx := errgroup.New(ctx) and moving the recv loop into eg.Go while waiting with eg.Wait() keeps the previous behavior (logging via grpcError and looping until EOF) and just swaps out the WaitGroup for errgroup. Since the goroutine always returns nil, ignoring eg.Wait() here is fine.

hack/benchmark/internal/operation/remove.go (1)

51-107: StreamRemove errgroup conversion looks good

The migration from a local WaitGroup to errgroup.New(ctx) in StreamRemove is straightforward:

  • ctx := b.Context() ensures the stream is tied to the benchmark lifecycle.
  • The recv loop correctly exits on io.EOF and keeps logging other errors via grpcError.
  • eg.Wait() is only joining a goroutine that always returns nil, so ignoring its result matches the previous WaitGroup semantics.

No issues spotted here.

coderabbitai[bot]
coderabbitai bot previously approved these changes Nov 19, 2025
@cloudflare-workers-and-pages
Copy link

cloudflare-workers-and-pages bot commented Nov 19, 2025

Deploying vald with  Cloudflare Pages  Cloudflare Pages

Latest commit: 35b1c27
Status: ✅  Deploy successful!
Preview URL: https://7d22c0bc.vald.pages.dev
Branch Preview URL: https://bugfix-benchmark-grpc.vald.pages.dev

View logs

@kmrmt kmrmt requested review from a team, Matts966 and kpango and removed request for a team November 19, 2025 05:26
Matts966
Matts966 previously approved these changes Nov 19, 2025
@kmrmt kmrmt dismissed stale reviews from Matts966 and coderabbitai[bot] via 768404a November 20, 2025 10:51
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
hack/benchmark/internal/starter/agent/core/ngt/ngt.go (1)

78-87: Mixing local eg with global errgroup.Go prevents eg.Wait() from waiting on the event loop

Here you start the ech listener with the package-level errgroup.Go, but the cleanup waits on the local eg:

eg, ctx := errgroup.New(ctx)
...
errgroup.Go(func() error { ... })
...
eg.Wait()

Because errgroup.Go attaches the goroutine to the default group (see internal/sync/errgroup/group.go), eg.Wait() will never wait for this listener, and any error it returns will be tracked on the global group instead of the local one. That both undermines the “use a local instance of errgroup.Group” goal and can make listener failures invisible to this starter’s shutdown path.

Recommend wiring the listener into the same local group:

-   errgroup.Go(func() error {
+   eg.Go(func() error {
        for {
            select {
            case <-ctx.Done():
                return nil
            case err := <-ech:
                tb.Error(err)
            }
        }
    })

This way the cleanup eg.Wait() waits for both the runner and the listener, and no global group state is involved.

Also applies to: 89-93

hack/benchmark/internal/operation/remove.go (1)

52-113: Stream receive loop can spin indefinitely on non‑EOF errors

In StreamRemove, the receive goroutine does:

for {
    res, err := sc.Recv()
    if err == io.EOF {
        return nil
    }
    if err != nil {
        grpcError(b, err)
        continue
    }
    ...
}

For gRPC streams, any non‑io.EOF error from Recv is terminal; subsequent Recv calls will typically keep returning the same error. With the current continue, this goroutine can spin forever on that error, and wg.Wait() will then block indefinitely, hanging the benchmark.

Since grpcError is also used in non-stream paths, it’s unlikely to be calling FailNow from this goroutine, so this isn’t self‑terminating.

Suggest exiting the goroutine on non‑EOF errors and propagating them via errgroup:

-           res, err := sc.Recv()
-           if err == io.EOF {
-               return nil
-           }
-
-           if err != nil {
-               grpcError(b, err)
-               continue
-           }
+           res, err := sc.Recv()
+           if err == io.EOF {
+               return nil
+           }
+           if err != nil {
+               grpcError(b, err)
+               return err
+           }

You may also want to ensure that whatever calls these benchmarks invokes errgroup.Wait() so the propagated error is surfaced.

hack/benchmark/internal/operation/search.go (2)

56-116: StreamSearch receive loop should not continue after terminal errors

StreamSearch’s receive goroutine currently mirrors the pattern in StreamRemove:

for {
    res, err := sc.Recv()
    if err == io.EOF {
        return nil
    }
    if err != nil {
        grpcError(b, err)
        continue
    }
    ...
}

Because gRPC Recv errors (other than io.EOF) are terminal, this will typically loop forever logging the same error and prevent wg.Wait() from returning if such an error occurs.

Recommend exiting the goroutine on non‑EOF errors:

-        for {
-            res, err := sc.Recv()
-            if err == io.EOF {
-                return nil
-            }
-            if err != nil {
-                grpcError(b, err)
-                continue
-            }
+        for {
+            res, err := sc.Recv()
+            if err == io.EOF {
+                return nil
+            }
+            if err != nil {
+                grpcError(b, err)
+                return err
+            }
             if res.GetResponse() == nil {
                 b.Error("returned response is nil")
             }
         }

Same concern applies to the other streaming function below (StreamSearchByID).


118-204: SearchByID looks fine; StreamSearchByID has the same Recv error‑loop risk

  • The non‑stream SearchByID change to derive ctx := b.Context() is straightforward and consistent with the rest of the API; logic otherwise unchanged.

  • In StreamSearchByID, the receive goroutine has:

for sc != nil {
    res, err := sc.Recv()
    if err == io.EOF {
        finished.Store(true)
        return nil
    }
    if err != nil {
        grpcError(b, err)
        continue
    }
    ...
}

As with StreamSearch, a non‑EOF error from Recv is terminal. With the current continue, the goroutine can spin indefinitely on the same error, never setting finished to true and never letting wg.Wait() complete.

Consider exiting on non‑EOF errors and propagating them via errgroup:

-        for sc != nil {
-            res, err := sc.Recv()
-            if err == io.EOF {
-                finished.Store(true)
-                return nil
-            }
-            if err != nil {
-                grpcError(b, err)
-                continue
-            }
+        for sc != nil {
+            res, err := sc.Recv()
+            if err == io.EOF {
+                finished.Store(true)
+                return nil
+            }
+            if err != nil {
+                grpcError(b, err)
+                return err
+            }
             if res.GetResponse() == nil {
                 b.Error("returned response is nil")
             }
         }

You might also consider whether sc != nil in the loop condition is still needed since sc is never mutated in this function.

🧹 Nitpick comments (2)
hack/benchmark/internal/operation/insert.go (1)

67-137: StreamInsert mixes errgroup.Go with a local sync.WaitGroup; consider simplifying

The Recv side is started with errgroup.Go but you never call errgroup.Wait() here, and you also wrap it in a sync.WaitGroup that is what actually gates return:

wg := sync.WaitGroup{}
wg.Add(1)
...
errgroup.Go(func() error {
    defer wg.Done()
    for {
        res, err := sc.Recv()
        if err == io.EOF {
            return nil
        }
        if err != nil {
            return err
        }
        ...
        atomic.AddInt64(&insertedNum, 1)
    }
})
...
wg.Wait()

As written, errgroup doesn’t add much over a plain go statement, and any non-EOF error from Recv is only surfaced through the global errgroup (if anything calls errgroup.Wait elsewhere).

Two simpler options:

  • If you want everything local to this benchmark: introduce a local group and drop sync.WaitGroup:
-   wg := sync.WaitGroup{}
-   wg.Add(1)
+   eg, ctx := errgroup.New(ctx)
    b.ResetTimer()

-   errgroup.Go(func() error {
-       defer wg.Done()
+   eg.Go(func() error {
        for {
            res, err := sc.Recv()
            if err == io.EOF {
                return nil
            }
            if err != nil {
-               return err
+               return err // causes eg.Wait() to fail
            }
            ...
        }
    })
    ...
-   wg.Wait()
+   if err := eg.Wait(); err != nil {
+       b.Fatal(err)
+   }
  • Or, if you don’t need errgroup here at all, replace errgroup.Go with a plain go and keep the sync.WaitGroup.

Either way, you avoid relying on global errgroup state and make error handling for the Recv loop explicit and local to this function.

hack/benchmark/internal/operation/operation.go (1)

51-67: CreateIndex behavior change (PoolSize and context) looks reasonable; confirm intended PoolSize

CreateIndex now:

  • Uses ctx := b.Context(), which correctly ties index creation to the benchmark lifecycle.
  • Sets PoolSize: 16 instead of the previous larger value, which will significantly change benchmark behavior (less concurrent indexing work).

If the goal of this PR is strictly to fix gRPC benchmark behavior, it may be worth double‑checking that the PoolSize reduction is an intentional tuning change and not accidental.

Otherwise, the error‑handling around codes.FailedPrecondition and the use of b.ResetTimer() and b.Run("CreateIndex", ...) remain consistent and look fine.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b70feb9 and 768404a.

📒 Files selected for processing (8)
  • hack/benchmark/e2e/agent/core/ngt/ngt_bench_test.go (2 hunks)
  • hack/benchmark/internal/operation/insert.go (5 hunks)
  • hack/benchmark/internal/operation/operation.go (2 hunks)
  • hack/benchmark/internal/operation/remove.go (4 hunks)
  • hack/benchmark/internal/operation/search.go (6 hunks)
  • hack/benchmark/internal/starter/agent/core/ngt/ngt.go (3 hunks)
  • hack/benchmark/internal/starter/gateway/vald/vald.go (1 hunks)
  • hack/benchmark/internal/starter/starter.go (1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: kpango
Repo: vdaas/vald PR: 2491
File: pkg/gateway/lb/handler/grpc/handler.go:306-316
Timestamp: 2024-06-10T19:25:49.832Z
Learning: User: kpango
PR: vdaas/vald#2491
File: pkg/gateway/lb/handler/grpc/handler.go:306-316
Timestamp: 2024-05-07T04:15:11.886Z
Learning: In this project, vtprotobuf is used, which provides enhanced methods for protobuf objects: CloneVT (object cloning), EqualVT (object comparison), SizeVT (returns object size), MarshalVT (faster version of proto.Marshal), and UnmarshalVT (faster version of proto.Unmarshal). These methods offer better performance than usual protobuf usage.
Learnt from: kpango
Repo: vdaas/vald PR: 2491
File: pkg/gateway/lb/handler/grpc/handler.go:306-316
Timestamp: 2024-10-08T20:16:00.642Z
Learning: User: kpango
PR: vdaas/vald#2491
File: pkg/gateway/lb/handler/grpc/handler.go:306-316
Timestamp: 2024-05-07T04:15:11.886Z
Learning: In this project, vtprotobuf is used, which provides enhanced methods for protobuf objects: CloneVT (object cloning), EqualVT (object comparison), SizeVT (returns object size), MarshalVT (faster version of proto.Marshal), and UnmarshalVT (faster version of proto.Unmarshal). These methods offer better performance than usual protobuf usage.
Learnt from: Matts966
Repo: vdaas/vald PR: 2928
File: pkg/gateway/lb/handler/grpc/linear_search.go:271-333
Timestamp: 2025-04-15T05:32:26.428Z
Learning: For the vald project, refactoring PRs should be kept separate from semantic changes that alter behavior. The current PR "Divide LB Gateway handler.go" is focused solely on splitting the file into smaller, more manageable components without changing functionality.
Learnt from: kpango
Repo: vdaas/vald PR: 0
File: :0-0
Timestamp: 2025-05-01T08:44:07.726Z
Learning: PR #2927 addressed gRPC connection panic issues by removing the singleflight.Group[pool.Conn] field from the gRPCClient struct in internal/net/grpc/client.go and introducing a centralized handleError function for consistent error processing in both Connect and Disconnect methods.
Learnt from: kpango
Repo: vdaas/vald PR: 0
File: :0-0
Timestamp: 2025-05-01T08:44:07.726Z
Learning: PR #2927 addressed gRPC connection panic issues by removing the singleflight.Group[pool.Conn] field from the gRPCClient struct in internal/net/grpc/client.go and introducing a centralized handleError function for more consistent error processing in both Connect and Disconnect methods.
📚 Learning: 2024-06-10T19:25:49.832Z
Learnt from: kpango
Repo: vdaas/vald PR: 2491
File: pkg/gateway/lb/handler/grpc/handler.go:306-316
Timestamp: 2024-06-10T19:25:49.832Z
Learning: User: kpango
PR: vdaas/vald#2491
File: pkg/gateway/lb/handler/grpc/handler.go:306-316
Timestamp: 2024-05-07T04:15:11.886Z
Learning: In this project, vtprotobuf is used, which provides enhanced methods for protobuf objects: CloneVT (object cloning), EqualVT (object comparison), SizeVT (returns object size), MarshalVT (faster version of proto.Marshal), and UnmarshalVT (faster version of proto.Unmarshal). These methods offer better performance than usual protobuf usage.

Applied to files:

  • hack/benchmark/internal/starter/gateway/vald/vald.go
  • hack/benchmark/internal/starter/agent/core/ngt/ngt.go
  • hack/benchmark/internal/operation/operation.go
📚 Learning: 2025-04-15T05:32:26.428Z
Learnt from: Matts966
Repo: vdaas/vald PR: 2928
File: pkg/gateway/lb/handler/grpc/linear_search.go:271-333
Timestamp: 2025-04-15T05:32:26.428Z
Learning: For the vald project, refactoring PRs should be kept separate from semantic changes that alter behavior. The current PR "Divide LB Gateway handler.go" is focused solely on splitting the file into smaller, more manageable components without changing functionality.

Applied to files:

  • hack/benchmark/internal/starter/gateway/vald/vald.go
📚 Learning: 2025-06-04T06:36:35.397Z
Learnt from: Matts966
Repo: vdaas/vald PR: 3000
File: tests/v2/e2e/crud/kubernetes_test.go:138-147
Timestamp: 2025-06-04T06:36:35.397Z
Learning: In E2E tests for the Vald project, Matts966 prefers using dual error logging (both fmt.Fprintf to stderr and t.Errorf) when handling Kubernetes operations to provide immediate debugging feedback during test execution alongside structured test failure reporting.

Applied to files:

  • hack/benchmark/internal/starter/gateway/vald/vald.go
  • hack/benchmark/internal/operation/operation.go
📚 Learning: 2025-05-01T08:44:07.726Z
Learnt from: kpango
Repo: vdaas/vald PR: 0
File: :0-0
Timestamp: 2025-05-01T08:44:07.726Z
Learning: PR #2927 removed the singleflight.Group field from the gRPCClient struct in internal/net/grpc/client.go and introduced a centralized handleError function for error processing to address gRPC connection panic issues.

Applied to files:

  • hack/benchmark/internal/starter/agent/core/ngt/ngt.go
🧬 Code graph analysis (4)
hack/benchmark/internal/operation/insert.go (2)
hack/benchmark/internal/assets/dataset.go (1)
  • Dataset (30-43)
internal/io/io.go (1)
  • EOF (40-40)
hack/benchmark/internal/starter/agent/core/ngt/ngt.go (1)
internal/sync/errgroup/group.go (1)
  • Wait (247-249)
hack/benchmark/internal/operation/remove.go (2)
internal/runner/runner.go (1)
  • Run (141-242)
internal/io/io.go (1)
  • EOF (40-40)
hack/benchmark/internal/operation/search.go (2)
hack/benchmark/internal/assets/dataset.go (1)
  • Dataset (30-43)
internal/sync/errgroup/group.go (1)
  • Wait (247-249)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Cloudflare Pages
🔇 Additional comments (8)
hack/benchmark/internal/starter/agent/core/ngt/ngt.go (1)

49-63: Run(tb testing.TB) refactor looks consistent with new starter API

Deriving ctx from tb.Context() and using a local errgroup.Group around runner.Run plus a cancelable context in the returned cleanup func matches the new Starter.Run(tb testing.TB) func() contract and should scope the benchmark lifecycle cleanly.

Also applies to: 89-93

hack/benchmark/internal/operation/insert.go (1)

30-65: Insert benchmark change aligns with new no-ctx API

Switching to Insert(b *testing.B, ds assets.Dataset) and deriving ctx := b.Context() internally keeps callers simple while still propagating benchmark cancellation into the client. The insert loop and AlreadyExists handling remain consistent with the previous behavior.

hack/benchmark/internal/starter/gateway/vald/vald.go (1)

37-43: Gateway starter Run signature matches updated Starter interface

Adapting (*server).Run to Run(tb testing.TB) func() keeps this gateway starter consistent with the new starter.Starter API. Given it’s currently a stub, this change is safe and future-proof.

hack/benchmark/internal/starter/starter.go (1)

24-26: Starter interface update is coherent with callers and implementers

Changing Starter.Run to Run(testing.TB) func() matches the updated agent and gateway starters and the benchmarks that now call Run(b)(). This keeps the starter abstraction consistent and removes the need for an external context parameter.

hack/benchmark/e2e/agent/core/ngt/ngt_bench_test.go (1)

67-73: Benchmark updates correctly follow new Run and operation APIs

The benchmarks now:

  • Start the NGT agent with defer ngt.New(...).Run(b)(); and
  • Call the no-context operation methods (Insert, CreateIndex, Search*, Remove*) that derive ctx from b.Context() internally.

This wiring is consistent with the starter and operation refactors and keeps the benchmark code straightforward.

Also applies to: 79-83, 107-113, 119-123

hack/benchmark/internal/operation/remove.go (1)

26-49: Context source change in Remove looks consistent

Using ctx := b.Context() and dropping the explicit context.Context parameter keeps the benchmark tied to the testing.B lifecycle and matches the interface changes in operation.go. The rest of the logic is unchanged and looks fine.

hack/benchmark/internal/operation/search.go (1)

28-54: Non‑stream Search correctly switched to b.Context()

Dropping the explicit context.Context parameter and using ctx := b.Context() inside the benchmark aligns with the updated Operation interface and keeps RPCs bound to the benchmark’s lifecycle. The rest of the logic is unchanged and looks correct.

hack/benchmark/internal/operation/operation.go (1)

26-36: Operation interface change is correctly applied throughout; verification confirms no legacy signatures remain

All implementations of the Operation interface in hack/benchmark/internal/operation/search.go have been properly updated to match the new signatures (e.g., Search(b *testing.B, ds assets.Dataset) without context.Context). The implementations correctly derive context via b.Context() internally when needed. Commented-out test code in search_test.go references the old signature but is not active. No mismatches or remaining old signatures found in the codebase.

coderabbitai[bot]
coderabbitai bot previously approved these changes Nov 20, 2025
@github-actions github-actions bot added size/L and removed size/M labels Nov 20, 2025
coderabbitai[bot]
coderabbitai bot previously approved these changes Nov 22, 2025
@codecov
Copy link

codecov bot commented Nov 23, 2025

Codecov Report

❌ Patch coverage is 0% with 30 lines in your changes missing coverage. Please review.
✅ Project coverage is 18.11%. Comparing base (70204cb) to head (35b1c27).

Files with missing lines Patch % Lines
hack/benchmark/internal/operation/search.go 0.00% 9 Missing ⚠️
hack/benchmark/internal/operation/insert.go 0.00% 7 Missing ⚠️
hack/benchmark/internal/operation/remove.go 0.00% 6 Missing ⚠️
...k/benchmark/internal/starter/agent/core/ngt/ngt.go 0.00% 5 Missing ⚠️
hack/benchmark/internal/operation/operation.go 0.00% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3342      +/-   ##
==========================================
- Coverage   18.12%   18.11%   -0.01%     
==========================================
  Files         126      126              
  Lines       12038    12042       +4     
==========================================
  Hits         2182     2182              
- Misses       9583     9587       +4     
  Partials      273      273              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants