
[Bug]: Concurrent DML returns schema mismatch when adding a nullable StructArray field #50409

@zhuwenxing

Description


Is there an existing issue for this?

I have searched the existing issues.

Environment

  • Milvus version: master, commit bc95b2ded6
  • Deployment mode(standalone or cluster): standalone
  • MQ type(rocksmq, pulsar or kafka): woodpecker
  • SDK version(e.g. pymilvus v2.0.0rc2): pymilvus 3.1.0rc13
  • OS(Ubuntu or CentOS): Linux container in Kubernetes
  • CPU/Memory: standalone pod 2 CPU / 8Gi
  • GPU: N/A
  • Others:

K8s Pod List

jz-master-schema-retry-etcd-0                             1/1   Running   0   15m   10.104.24.144   4am-node29
jz-master-schema-retry-milvus-standalone-55b46b9b-4fpg5   1/1   Running   0   14m   10.104.24.145   4am-node29
jz-master-schema-retry-minio-5f8dcfccff-p65z6             1/1   Running   0   15m   10.104.24.143   4am-node29

Current Behavior

When a nullable StructArray field is added dynamically while insert/upsert requests that use the old schema are running concurrently, Milvus returns a collection schema mismatch error.

The current Python test passes only because it includes a test-side retry helper for this error. Server logs still show that the insert/upsert requests hit a schema version mismatch, and the failures are returned to the client as collection schema mismatch errors.
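For reference, a test-side retry of the kind mentioned above could look roughly like the sketch below. It is hypothetical: the helper name, attempt count, and backoff are illustrative and not taken from the Milvus test suite. It also matches the error by the `collection schema mismatch` text seen in the proxy log, not by a specific pymilvus exception class or error code.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Error text the proxy returns to the client in this issue's logs.
SCHEMA_MISMATCH_MARKER = "collection schema mismatch"


def retry_on_schema_mismatch(
    op: Callable[[], T],
    max_attempts: int = 5,
    backoff_s: float = 0.2,
) -> T:
    """Run op(), retrying only when the error text reports a schema mismatch.

    Hypothetical helper for illustration; not the actual Milvus test utility.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except Exception as exc:  # kept generic instead of a specific pymilvus class
            if SCHEMA_MISMATCH_MARKER not in str(exc) or attempt == max_attempts:
                raise
            # Linear backoff so the proxy can pick up the new schema version.
            time.sleep(backoff_s * attempt)
    raise ValueError("max_attempts must be at least 1")


# Hypothetical usage with a pymilvus MilvusClient named `client`:
# retry_on_schema_mismatch(lambda: client.insert(collection_name=name, data=rows))
```

A retry like this only hides the symptom in the test. The point of this issue is that, for a nullable field, the server should not surface the error at all.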

This is similar to older add-field schema mismatch issues, but this repro specifically adds a nullable StructArray field, whose schema evolution path may differ from ordinary scalar/vector fields.

Expected Behavior

Adding a nullable StructArray field should not expose transient schema mismatch errors to users.

Since the new StructArray field is nullable, insert/upsert requests using the old schema and omitting the newly added field should be accepted or transparently retried by the server after schema propagation.

Steps To Reproduce

  1. Deploy the latest master build of Milvus with Woodpecker as the MQ.
  2. Create a collection with an id field, a vector field, and a scalar tag field.
  3. Insert the initial old-schema rows, flush so they are sealed, create the vector index, and load the collection.
  4. Insert more old-schema rows that remain in growing segments.
  5. Run these operations concurrently:
    • dynamically add nullable StructArray field profile
    • insert old-schema rows that omit profile
    • upsert old-schema rows that omit profile
    • delete some old rows
  6. Check the server logs, or remove the test-side retry to see the client-visible collection schema mismatch error.

Python test:

pytest -q \
  tests/python_client/milvus_client/test_milvus_client_struct_array_nullable.py::TestMilvusClientStructArraySchemaEvolution::test_add_scalar_struct_array_field_concurrent_dml_query_search

Milvus Log

[WARN] [shard/shard_interceptor.go:158] ["insertMessage schema version mismatch"]
[collectionID=466875623245088337] [schemaVersionProvided=true]
[schemaVersion=0] [collectionSchemaVersion=1]
[error="collection schema version not match"]

[WARN] [metricsutil/wal_write.go:98] ["append message into wal failed"]
[message="{type=Insert,...,rows=3}"]
[error="code: STREAMING_CODE_SCHEMA_VERSION_MISMATCH, cause: schema version mismatch, input schema version: 0, collection schema version: 1"]

[WARN] [proxy/task_insert_streaming.go:79] ["append messages to wal failed"]
[error="code: STREAMING_CODE_SCHEMA_VERSION_MISMATCH, cause: schema version mismatch, input schema version: 0, collection schema version: 1"]

[WARN] [proxy/task_upsert_streaming.go:44] ["append messages to wal failed"]
[collectionName=struct_array_add_scalar_struct_concurrent_dml_OOk3UmCl]
[error="code: STREAMING_CODE_SCHEMA_VERSION_MISMATCH, cause: schema version mismatch, input schema version: 0, collection schema version: 1"]

[WARN] [proxy/task_scheduler.go:593] ["Failed to execute task: "]
[error="collection schema mismatch"]

[WARN] [proxy/impl.go:3068] ["Failed to execute insert task in task scheduler"]
[collection=struct_array_add_scalar_struct_concurrent_dml_OOk3UmCl]
[NumRows=5] [partialUpdate=false] [error="collection schema mismatch"]
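The log sequence above can be read as a simple version check. The toy model below is a sketch under stated assumptions, not Milvus code. It assumes that the proxy stamps each insert with the schema version it has cached (0 here), that adding the nullable `profile` field bumps the collection to version 1, and that the WAL append rejects any message whose version differs. `ToyCollection` and `SchemaVersionMismatch` are hypothetical names.

```python
from dataclasses import dataclass


class SchemaVersionMismatch(Exception):
    """Toy stand-in for STREAMING_CODE_SCHEMA_VERSION_MISMATCH."""


@dataclass
class ToyCollection:
    schema_version: int = 0

    def add_field(self) -> None:
        # Adding a field such as the nullable StructArray `profile` bumps the version.
        self.schema_version += 1

    def append_insert(self, provided_version: int, rows: int) -> int:
        # Assumed exact-match check, modeled on the shard interceptor warning:
        # a message stamped with a stale version is rejected, not back-filled.
        if provided_version != self.schema_version:
            raise SchemaVersionMismatch(
                f"schema version mismatch, input schema version: {provided_version}, "
                f"collection schema version: {self.schema_version}"
            )
        return rows


if __name__ == "__main__":
    coll = ToyCollection()
    stamped = coll.schema_version  # proxy caches version 0 for an in-flight insert
    coll.add_field()               # concurrent add-field moves the collection to 1
    try:
        coll.append_insert(stamped, rows=3)
    except SchemaVersionMismatch as err:
        print(err)
    # Re-stamping with the current version is accepted.
    print(coll.append_insert(coll.schema_version, rows=3))
```

In this model, re-stamping the request with the current version succeeds. That is the behavior the expected-behavior section asks the server to provide transparently when the newly added field is nullable.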

Anything else?

Related historical issues:

This issue is filed separately because the reproduced case dynamically adds a nullable StructArray field, and the latest master still exposes STREAMING_CODE_SCHEMA_VERSION_MISMATCH / collection schema mismatch during concurrent DML.

Labels

feature/struct array, kind/bug, needs-triage