[Bug]: Go SDK CI flaky failure in TestCreateIndexVanillaFaissGeneric returns fewer than topK results #50392

@yanliang567

Description

Environment

Reproduction

Option B: Steps

  1. Run ci-v2/go-sdk / milvus-go-sdk-pipeline with:
    • branch=master
    • PR_NUMBER=50380
    • COMMIT_SHA=9a78aec54c941a229810cf972ea9ca0ecc20c22c
    • ciMode=e2e-arm
    • milvus_deployment_option=standalone
    • gotestsum_cmd=gotestsum --format testname --hide-summary=output -- -v ./testcases/... -timeout=120m
  2. The flaky test is tests/go_client/testcases/index_test.go:975:
    • Create collection with Int64Vec, default nb=3000, dim=128
    • Insert and flush data
    • Create generic index on floatVec with:
      • index_type=FAISS
      • metric_type=L2
      • faiss_index_name=IVF64,Flat
    • Load collection
    • Search with topK=10, nq=5, nprobe=8, strong consistency
  3. The test intermittently gets fewer than topK results for at least one query.

Trigger Conditions

  • Frequency: intermittent. Build 5691 failed, but nearby Go SDK CI runs passed the same test.
  • Observed failure build: milvus-go-sdk-pipeline/5691
  • Nearby passing builds for the same test: 5688, 5695, 5697, 5698, 5700, 5680, 5679
  • Nearby unrelated failures: 5689 and 5686 failed in other test cases while TestCreateIndexVanillaFaissGeneric passed.
  • Likely unrelated to PR #50380 ("fix: Restrict Kafka chaos selectors to broker pods"): that PR only changes Kafka chaos selector YAML files under tests/python_client/chaos/chaos_objects/..., not Go SDK, index, or search code.

Expected Behavior

TestCreateIndexVanillaFaissGeneric should consistently return topK=10 results for every query after 3000 rows are inserted, flushed, indexed with FAISS / IVF64,Flat, and loaded.
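
As a rough sanity check on that expectation, the arithmetic can be sketched as follows. This assumes that IVF64 means 64 inverted lists, that all 3000 rows land in a single index, and that the rows are spread evenly across the lists; none of these is confirmed by the CI logs, and the function name is illustrative only.

```go
package main

import "fmt"

// expectedCandidates estimates how many vectors one IVF query scans when
// rows are spread evenly over nlist inverted lists and nprobe lists are probed.
// Real list sizes depend on the trained centroids and can be far less balanced.
func expectedCandidates(rows, nlist, nprobe int) float64 {
	if nlist <= 0 {
		return 0
	}
	if nprobe > nlist {
		nprobe = nlist // cannot probe more lists than exist
	}
	return float64(rows) / float64(nlist) * float64(nprobe)
}

func main() {
	// Values from the test: nb=3000, IVF64, nprobe=8, topK=10.
	c := expectedCandidates(3000, 64, 8)
	fmt.Printf("average candidates per query: %.0f (topK=10)\n", c)
}
```

Under those assumptions a query scans about 375 candidates on average, far more than topK=10, so a result set of 7 suggests that the probed lists were unusually small or that the search ran before all data was indexed and loaded.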

Actual Behavior

In Jenkins build 5691, the test failed because one search result set had only 7 results instead of 10:

=== FAIL: testcases TestCreateIndexVanillaFaissGeneric (7.04s)
Error Trace: tests/go_client/common/response_checker.go:241
             tests/go_client/testcases/index_test.go:1013
Error:       Not equal:
             expected: 7
             actual  : 10
Test:        TestCreateIndexVanillaFaissGeneric
Messages:    Expected topK=10, actual ResultCount=7
DONE 918 tests, 25 skipped, 1 failure in 1706.586s

Note: the expected / actual labels above are reversed by the assertion call order in CheckSearchResult; the message is the clearer signal: expected topK=10, actual ResultCount=7.
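
A minimal sketch of the argument order that would make the labels read correctly is shown below. It uses only the standard library and mirrors the (expected, actual) convention of testify's Equal. `equalInts` and `checkResultCounts` are hypothetical stand-ins, not the actual code in response_checker.go.

```go
package main

import "fmt"

// equalInts mimics an (expected, actual) equality assertion.
// It is a stand-in for illustration, not the real CheckSearchResult code.
func equalInts(expected, actual int, msg string) error {
	if expected != actual {
		return fmt.Errorf("not equal:\n  expected: %d\n  actual  : %d\n  %s", expected, actual, msg)
	}
	return nil
}

// checkResultCounts (hypothetical) verifies that every query returned topK
// hits, passing topK as the expected value and the observed count as actual.
func checkResultCounts(topK int, counts []int) error {
	for i, n := range counts {
		msg := fmt.Sprintf("query %d: expected topK=%d, actual ResultCount=%d", i, topK, n)
		if err := equalInts(topK, n, msg); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	// nq=5 queries, one of which came back short, as in build 5691.
	fmt.Println(checkResultCounts(10, []int{10, 10, 7, 10, 10}))
}
```

With the arguments in this order, the failure would report expected 10 and actual 7, matching the message text.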

Error Logs

Relevant Jenkins console excerpts:

[2026-06-08T18:37:39.806Z] === RUN   TestCreateIndexVanillaFaissGeneric
[2026-06-08T18:37:39.806Z] === PAUSE TestCreateIndexVanillaFaissGeneric
[2026-06-08T18:37:39.806Z] === CONT  TestCreateIndexVanillaFaissGeneric
[2026-06-08T18:37:32.592Z] [Request] [method=CreateCollection] [collection_name=TestCreateIndexVanillaFaissGeneric_DZXjVw]
[2026-06-08T18:37:33.174Z] [Request] [method=Flush] [collection_names=[TestCreateIndexVanillaFaissGeneric_DZXjVw]]
[2026-06-08T18:37:33.364Z] [Response] [method=Flush] [status={}]
[2026-06-08T18:37:39.807Z] Error Trace: tests/go_client/common/response_checker.go:241
[2026-06-08T18:37:39.807Z]              tests/go_client/testcases/index_test.go:1013
[2026-06-08T18:37:39.807Z] Messages: Expected topK=10, actual ResultCount=7
[2026-06-08T18:37:39.807Z] --- FAIL: TestCreateIndexVanillaFaissGeneric (7.04s)

The job did not surface a client-side RPC error: the search call succeeded but silently returned a short result set.

Non-default Configuration

ciMode: e2e-arm
milvus_deployment_option: standalone
milvus_helm_version: 5.0.6
build_option: reuse-arm-if-exists
image_repository: harbor-us-vdc.zilliz.cc/milvusdb/milvus

Analysis Hints

  • Suspect area: FAISS generic index search path, index/load readiness, or search result reduction when using FAISS + IVF64,Flat + nprobe=8 on ARM Go SDK CI.
  • Test location: tests/go_client/testcases/index_test.go:975
  • Assertion location: tests/go_client/common/response_checker.go:241
  • The PR where this was observed does not modify Go SDK/index/search code, so this should be tracked as a flaky CI/test stability issue rather than a PR regression.
  • Secondary CI issue: the post stage printed "Test exit code file not found, assuming test passed", so Milvus server logs were not archived even though the test exited with code 1. This made server-side diagnosis of the failure harder.
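
For the secondary CI issue, one possible fix is to fail closed when the exit code file is missing. The sketch below shows that logic in Go purely for illustration. The actual post stage is not shown in this report and is probably written in shell or Groovy; the file name and function name here are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// readTestExitCode returns the exit code stored in path.
// A missing or unparsable file is treated as a failure (code 1) rather than
// a pass, so that log archiving still runs when the test step died early.
func readTestExitCode(path string) (int, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		if errors.Is(err, os.ErrNotExist) {
			return 1, fmt.Errorf("exit code file %q not found, treating run as failed", path)
		}
		return 1, err
	}
	code, err := strconv.Atoi(strings.TrimSpace(string(data)))
	if err != nil {
		return 1, fmt.Errorf("exit code file %q is not an integer: %w", path, err)
	}
	return code, nil
}

func main() {
	// "test_exit_code.txt" is a hypothetical file name.
	code, err := readTestExitCode("test_exit_code.txt")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
	if code != 0 {
		// Placeholder for the real archive step in the pipeline.
		fmt.Println("archiving Milvus server logs")
	}
}
```

The design choice is that an unknown outcome counts as a failure, which costs some extra archived logs on rare passing runs but keeps server-side evidence for failures like this one.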

Metadata

Labels

area/test, kind/bug, triage/accepted
