[refactor](mv) Refactor MV rewrite StructInfo lookup by relation ids #62981
foxtail463 wants to merge 1 commit into apache:master
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
Minimal reproducible case:
DROP DATABASE IF EXISTS mv_id_conflict_demo;
SET enable_nereids_planner = true;
CREATE TABLE fact_src (
CREATE TABLE dim_full (
CREATE VIEW v_dim_full_non_double AS
INSERT INTO fact_src VALUES ('2026-02-04', 'K1', '0', '1');
DROP MATERIALIZED VIEW IF EXISTS mv_fact;
DROP MATERIALIZED VIEW IF EXISTS mv_dim_full;
DROP MATERIALIZED VIEW IF EXISTS mv_dim_full_view_non_double;
DROP MATERIALIZED VIEW IF EXISTS mv_target;
REFRESH MATERIALIZED VIEW mv_fact COMPLETE;
EXPLAIN
Force-pushed from d1b5a73 to a620789.
/review
Performance Evaluation
This benchmark checks whether the new StructInfo candidate lookup adds visible MV rewrite overhead. It compares the current patch
The benchmark SQL models a nested MV rewrite case with base tables, a view, child MVs, and a parent MV. The query starts from base tables and a view, while the target MV is defined over child MVs. This shape exercises the nested MV rewrite path and the StructInfo candidate lookup changed by this PR. Example:
The benchmark has three scales:
All three scales assert that the target MV is chosen.
End-to-end EXPLAIN benchmark
What problem does this PR solve?
Problem Summary:
Nested MV rewrite needs to distinguish two different identities during fuzzy
StructInfo collection:
In this shape, the child rewrite can first introduce MV scan relations into the memo.
Then the parent group should be able to build a candidate plan from those MV
scan relations and match mv_target.
The old StructInfo candidate path used the table/common-table-id based cache key
in StructInfoMap's candidate map to organize memo candidates. That key only
describes the table family covered by one MV definition; it is a search-space
key, not the identity of a concrete candidate. The exact candidate identity is
relationIdSet, which describes the relations contained by one memo candidate plan
tree.
In the example above, the rewritten scan candidate for mv_dim_full and the
rewritten scan candidate for mv_dim_full_view_non_double can fall into the same
table/common-table-id cache key while representing different relationIdSet
values. If one candidate overwrites or is reused as the other, the parent
mv_target candidate is assembled with the wrong child relation, so the final
target MV rewrite becomes path-sensitive and may fail.
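The overwrite described above can be illustrated with a minimal sketch. It is not the Doris implementation: `Candidate`, `collectByTableIds`, and the numeric ids are hypothetical stand-ins, and the only assumption taken from the description is that two candidates may share one table-id key while carrying different relationIdSet values.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

final class CoarseKeyCollisionSketch {
    /** Hypothetical stand-in for a memo candidate; not the Doris StructInfo class. */
    public static final class Candidate {
        public final String description;
        public final BitSet tableIds;     // coarse table family (search-space key)
        public final BitSet relationIds;  // exact identity of this candidate

        public Candidate(String description, BitSet tableIds, BitSet relationIds) {
            this.description = description;
            this.tableIds = tableIds;
            this.relationIds = relationIds;
        }
    }

    /** Builds a BitSet with the given bit positions set. */
    public static BitSet bits(int... ids) {
        BitSet set = new BitSet();
        for (int id : ids) {
            set.set(id);
        }
        return set;
    }

    /** Single-level map keyed only by the coarse table-id set: a later put replaces an earlier one. */
    public static Map<BitSet, Candidate> collectByTableIds(Candidate... candidates) {
        Map<BitSet, Candidate> byTableIds = new HashMap<>();
        for (Candidate candidate : candidates) {
            byTableIds.put(candidate.tableIds, candidate);
        }
        return byTableIds;
    }

    public static void main(String[] args) {
        // Made-up ids: both candidates cover table family {1, 2} but scan different relations.
        Candidate first = new Candidate("scan of mv_dim_full", bits(1, 2), bits(10));
        Candidate second = new Candidate("scan of mv_dim_full_view_non_double", bits(1, 2), bits(11));

        Map<BitSet, Candidate> byTableIds = collectByTableIds(first, second);
        System.out.println(byTableIds.size());                        // prints 1
        System.out.println(byTableIds.get(bits(1, 2)).description);   // the second candidate survives
    }
}
```

Which candidate survives depends only on insertion order, which is one way the final rewrite can become path-sensitive.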
This refactor makes the identity boundary explicit:
the inner key
nested MV scan relations
This keeps base-table, view-derived, and rewritten MV-scan candidates coexisting
under the same coarse table family without overwriting each other.
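A two-level lookup of the kind described here might look like the sketch below. It assumes the outer key is the coarse table family and the inner key is the relationIdSet; the class name, method names, and string payload are hypothetical and do not reflect the actual StructInfoMap API.

```java
import java.util.BitSet;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class TwoLevelCandidateMapSketch {
    // Outer key: coarse table family. Inner key: exact relation id set of one candidate.
    private final Map<BitSet, Map<BitSet, String>> candidates = new HashMap<>();

    /** Builds a BitSet with the given bit positions set. */
    public static BitSet bits(int... ids) {
        BitSet set = new BitSet();
        for (int id : ids) {
            set.set(id);
        }
        return set;
    }

    /** Registers a candidate; keys are cloned so later mutation by the caller cannot corrupt the map. */
    public void add(BitSet tableIds, BitSet relationIds, String candidate) {
        candidates
                .computeIfAbsent((BitSet) tableIds.clone(), key -> new HashMap<>())
                .putIfAbsent((BitSet) relationIds.clone(), candidate);
    }

    /** All candidates in one table family, keyed by their relation id sets. */
    public Map<BitSet, String> candidatesFor(BitSet tableIds) {
        return candidates.getOrDefault(tableIds, Collections.emptyMap());
    }

    /** Exact lookup by table family and relation id set; null when absent. */
    public String exact(BitSet tableIds, BitSet relationIds) {
        return candidatesFor(tableIds).get(relationIds);
    }

    public static void main(String[] args) {
        TwoLevelCandidateMapSketch map = new TwoLevelCandidateMapSketch();
        // Made-up ids: one table family, two distinct rewritten MV-scan candidates.
        map.add(bits(1, 2), bits(10), "scan of mv_dim_full");
        map.add(bits(1, 2), bits(11), "scan of mv_dim_full_view_non_double");

        System.out.println(map.candidatesFor(bits(1, 2)).size());  // prints 2
        System.out.println(map.exact(bits(1, 2), bits(10)));       // prints scan of mv_dim_full
    }
}
```

With the relation id set as the inner key, the parent group can ask for the exact child candidate it needs instead of whichever one was stored last under the shared table family.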