docs:add local format design doc #22

Open
aoiasd wants to merge 1 commit into milvus-io:main from aoiasd:main

@aoiasd aoiasd commented Mar 5, 2026

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>

@sre-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: aoiasd
To complete the pull request process, please assign tedxu after the PR has been reviewed.
You can assign the PR to them by writing /assign @tedxu in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


The **local_format** feature addresses this by:

- Allowing users to specify `local_format=vortex` per field at collection creation time.

Contributor:

this should definitely not be bound to collection schema?

Contributor:

We should be able to change the local format used at load time through the Milvus config.

Contributor Author:

Per-field settings let different fields use different formats to suit different situations.


### 2. Column Group Splitting

Fields with `local_format=vortex` are split into a separate column group from default (row) format fields. This is handled by `LocalFormatPolicy` in the column group splitting pipeline:

Contributor:

Do you mean split in S3 or locally on the QN? I assume the local format should not affect the remote format?

Contributor Author:

Split in S3, when we write the binlog. This means a field with `local_format=vortex` should not be in the same column group as a field with `local_format=row`.
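
The constraint in this reply can be read as an invariant over column groups: no group mixes local formats. A minimal sketch of such a check follows. The `Field` record, its `local_format` attribute, and the `mixed_format_groups` helper are hypothetical names for illustration, not Milvus APIs, and the `"row"` default is assumed from the doc's "default (row) format" wording.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical field descriptor; not a Milvus type."""
    name: str
    local_format: str = "row"  # assumed default, per "default (row) format"


def mixed_format_groups(groups):
    """Return the indices of column groups that contain more than one local format."""
    return [
        index
        for index, group in enumerate(groups)
        if len({field.local_format for field in group}) > 1
    ]


# Two groups that respect the constraint: row fields and vortex fields are separate.
groups_ok = [
    [Field("id"), Field("title")],
    [Field("price", "vortex"), Field("rating", "vortex")],
]

# One group that violates it: a row field and a vortex field share a group.
groups_bad = [[Field("id"), Field("price", "vortex")]]
```

Calling `mixed_format_groups(groups_bad)` flags the offending group by index, which is one way such a rule could be asserted in a test of the binlog writer.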

Comment on lines +100 to +105
DefaultPolicies execution order:
1. SystemColumnPolicy — split system fields (RowID, Timestamp) + PK
2. AvgSizePolicy — split large fields by avg size
3. SelectedDataTypePolicy — split vector / text fields (each gets own group)
4. LocalFormatPolicy — split vortex fields into separate group
5. RemanentShortPolicy — merge remaining short fields

Contributor:

is this something we already have?

Contributor Author (@aoiasd, Mar 5, 2026):

> is this something we already have?

Yes, only `LocalFormatPolicy` is added here.
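
The execution order quoted in this thread can be sketched as an ordered list of policies, each claiming fields from whatever the earlier policies left behind. This is a hedged illustration only: the `Field` record, the `kind` and `avg_size` attributes, the 1024-byte threshold, and the choice to give each large field its own group are all assumptions, and the function names merely mirror the policy names in the snippet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical field descriptor; not a Milvus type."""
    name: str
    kind: str = "scalar"       # "system", "pk", "vector", "text", or "scalar"
    avg_size: int = 8          # assumed average value size in bytes
    local_format: str = "row"


def system_column_policy(fields):
    # System fields and the primary key share one group.
    picked = [f for f in fields if f.kind in ("system", "pk")]
    return [picked] if picked else []


def avg_size_policy(fields, threshold=1024):
    # Assumption: each field at or above the threshold gets its own group.
    return [[f] for f in fields if f.avg_size >= threshold]


def selected_data_type_policy(fields):
    # Each vector or text field gets its own group.
    return [[f] for f in fields if f.kind in ("vector", "text")]


def local_format_policy(fields):
    # All remaining vortex fields go into one separate group.
    picked = [f for f in fields if f.local_format == "vortex"]
    return [picked] if picked else []


def remanent_short_policy(fields):
    # Whatever is left is merged into a single group.
    return [list(fields)] if fields else []


DEFAULT_POLICIES = [
    system_column_policy,
    avg_size_policy,
    selected_data_type_policy,
    local_format_policy,
    remanent_short_policy,
]


def split_column_groups(fields):
    """Apply the policies in order; a field claimed by one policy is hidden from later ones."""
    remaining = list(fields)
    groups = []
    for policy in DEFAULT_POLICIES:
        new_groups = policy(remaining)
        claimed = {f for group in new_groups for f in group}
        remaining = [f for f in remaining if f not in claimed]
        groups.extend(new_groups)
    return groups


fields = [
    Field("RowID", kind="system"),
    Field("Timestamp", kind="system"),
    Field("pk", kind="pk"),
    Field("embedding", kind="vector", avg_size=512),
    Field("doc", avg_size=4096),
    Field("price", local_format="vortex"),
    Field("rating", local_format="vortex"),
    Field("flag"),
]
names = [[f.name for f in group] for group in split_column_groups(fields)]
```

In this sketch a field claimed by an earlier policy never reaches `local_format_policy`, so a vector field marked `local_format=vortex` would land in its own vector group. Whether the real pipeline behaves this way is not stated in the quoted snippet.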


**VortexGroupChunkTranslator construction:**

1. Download vortex files from object storage into Arrow Buffers (in-memory)

Contributor:

what about lazy load?
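
To make the question concrete, here is a minimal sketch contrasting an eager download (step 1 as quoted, where every file is fetched at construction) with a lazy variant that fetches a file on first access. Everything here is hypothetical: `FakeObjectStore`, `EagerChunks`, and `LazyChunks` are illustrative names, plain `bytes` stand in for Arrow Buffers, and nothing reflects the actual `VortexGroupChunkTranslator` code.

```python
class FakeObjectStore:
    """Hypothetical stand-in for object storage that counts reads."""

    def __init__(self, blobs):
        self._blobs = dict(blobs)
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self._blobs[key]


class EagerChunks:
    """Downloads every file into memory at construction time."""

    def __init__(self, store, keys):
        self._buffers = {key: store.get(key) for key in keys}

    def chunk(self, key):
        return self._buffers[key]


class LazyChunks:
    """Downloads a file only on first access, then serves it from memory."""

    def __init__(self, store, keys):
        self._store = store
        self._keys = set(keys)
        self._buffers = {}

    def chunk(self, key):
        if key not in self._keys:
            raise KeyError(key)
        if key not in self._buffers:
            self._buffers[key] = self._store.get(key)
        return self._buffers[key]
```

The eager variant pays the full download cost up front and holds every buffer in memory. The lazy variant defers both until a chunk is actually read, at the price of a slower first access.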
