
[branch-4.1][feature](variant) add variant doc mode (#59183)#61281

Open
csun5285 wants to merge 4 commits into apache:branch-4.1 from csun5285:branch-4.1

@csun5285
Contributor

Picked from master: #59183, #60511, #60518, #60668

csun5285 and others added 3 commits March 12, 2026 17:36
**DOC encoding**: materialize all subcolumns, and additionally keep the
original JSON as a “stored field” so queries can quickly return the
entire JSON document.
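
A minimal sketch of the idea behind DOC encoding, assuming a simplified in-memory model. The `DocModeRow` struct and the helper functions are hypothetical names for illustration only and do not reflect Doris's actual storage layout or API. The point it shows is that single-path reads go to a materialized subcolumn, while whole-document reads return the stored JSON without reassembling it.

```cpp
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of one doc-mode row; not Doris's real on-disk format.
struct DocModeRow {
    std::map<std::string, std::string> subcolumns;  // path -> materialized value
    std::string stored_json;                        // original document, kept verbatim
};

// Build a row from already-flattened (path, value) pairs plus the raw JSON.
// JSON parsing is out of scope here; the caller supplies the flattened paths.
DocModeRow make_doc_mode_row(
        const std::vector<std::pair<std::string, std::string>>& flattened,
        std::string raw_json) {
    DocModeRow row;
    for (const auto& [path, value] : flattened) {
        row.subcolumns[path] = value;
    }
    row.stored_json = std::move(raw_json);
    return row;
}

// A single-path read touches only the matching subcolumn.
std::optional<std::string> read_path(const DocModeRow& row, const std::string& path) {
    auto it = row.subcolumns.find(path);
    if (it == row.subcolumns.end()) return std::nullopt;
    return it->second;
}

// A whole-document read returns the stored field directly.
const std::string& read_document(const DocModeRow& row) {
    return row.stored_json;
}
```

The trade-off in this model is extra storage for the verbatim document in exchange for avoiding reconstruction from subcolumns on full-document reads.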

doc: apache/doris-website#3253
…hmarks (apache#60511)

Improve VARIANT doc-mode ingestion and compaction performance, and add a
write/compaction benchmark UT.

Key changes:
- Add config `enable_variant_doc_compaction_sparse_write` (default: true) to enable sparse writing for materialized subcolumns during doc-mode compaction.
- Optimize doc value column generation: deduplicate paths, sort paths once, and serialize values directly into the binary column to reduce allocations/copies.
- Optimize bucket sharding/stats collection with phmap and cached path->bucket mapping; remove per-row doc value sorting in writer.
- Refactor VariantDocCompactWriter finalize flow with an explicit write plan; skip invalid array types and fix doc materialization offset handling.
- Update variant-related UTs and add VariantDocModeCompactionTest to benchmark segment import and doc-mode compaction.
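
Two of the optimizations listed above (deduplicating and sorting paths once, and caching the path->bucket mapping) can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `unique_sorted_paths` and `BucketCache` are hypothetical names, `std::unordered_map` stands in for the phmap table the PR mentions, and hashing modulo the bucket count is an assumed sharding rule.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Deduplicate and sort a batch of paths once, rather than sorting per row.
std::vector<std::string> unique_sorted_paths(std::vector<std::string> paths) {
    std::sort(paths.begin(), paths.end());
    paths.erase(std::unique(paths.begin(), paths.end()), paths.end());
    return paths;
}

// Hypothetical cache that memoizes path -> bucket, so each distinct path
// is hashed only the first time it is seen.
class BucketCache {
public:
    explicit BucketCache(std::size_t num_buckets) : num_buckets_(num_buckets) {
        assert(num_buckets > 0);
    }

    std::size_t bucket_of(const std::string& path) {
        auto it = cache_.find(path);
        if (it != cache_.end()) return it->second;  // cache hit: no rehash
        // Assumed sharding rule: hash the path and take it modulo the bucket count.
        std::size_t bucket = std::hash<std::string>{}(path) % num_buckets_;
        cache_.emplace(path, bucket);
        return bucket;
    }

    std::size_t cached_paths() const { return cache_.size(); }

private:
    std::size_t num_buckets_;
    std::unordered_map<std::string, std::size_t> cache_;  // stand-in for phmap
};
```

Because documents in one segment tend to repeat the same paths across many rows, both steps move per-row work to per-distinct-path work.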

UT Testing:
- BE UT: VariantColumnWriterReaderTest.*, VariantUtilTest.*, ColumnVariantTest.*

Benchmark Testing (Release compile):
- BE UT (Release-only, slow): VariantDocModeCompactionTest.*

<img width="902" height="778" alt="image"
src="https://github.com/user-attachments/assets/08fb9341-2620-484d-9cf9-f3a64c95d4bd"
/>
csun5285 requested a review from yiguolei as a code owner March 12, 2026 11:22
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@csun5285
Contributor Author

run buildall

@csun5285
Contributor Author

run buildall

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 79.15% (1788/2259) |
| Line Coverage | 64.50% (31945/49530) |
| Region Coverage | 65.29% (15976/24468) |
| Branch Coverage | 55.84% (8495/15214) |

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 33.57% (47/140) 🎉
Increment coverage report
Complete coverage report
