
[refactor](mv) Refactor MV rewrite StructInfo lookup by relation ids#62981

Open
foxtail463 wants to merge 1 commit into apache:master from foxtail463:refactor-mv-structinfo-candidate-key



@foxtail463
Contributor

foxtail463 commented Apr 30, 2026

What problem does this PR solve?

Problem Summary:

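Nested MV rewrite needs to distinguish two different identities during fuzzy
StructInfo collection: the coarse table family covered by an MV definition, and
the exact set of relations contained in a memo candidate. Consider a query and a
target MV of the following shape: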

-- Query starts from base tables and a view over dim_full.
SELECT ...
FROM fact_src t
LEFT JOIN dim_full d0 ...
LEFT JOIN v_dim_full_non_double d1 ...

-- The target MV is defined over child MVs.
SELECT ...
FROM mv_fact t
LEFT JOIN mv_dim_full d0 ...
LEFT JOIN mv_dim_full_view_non_double d1 ...

In this shape, the child rewrites can first introduce MV scan relations into the
memo. The parent group should then be able to build a candidate plan from those
MV scan relations and match it against mv_target.

The old StructInfo candidate path organized memo candidates in StructInfoMap's
candidate map under a table/common-table-id-based cache key. That key only
describes the table family covered by one MV definition: it is a search-space
key, not the identity of a concrete candidate. The exact candidate identity is
relationIdSet, the set of relations contained in one memo candidate plan tree.

In the example above, the rewritten scan candidate for mv_dim_full and the
rewritten scan candidate for mv_dim_full_view_non_double can map to the same
table/common-table-id cache key even though they have different relationIdSet
values. If one candidate overwrites the other, or is reused in its place, the
parent mv_target candidate is assembled with the wrong child relation. The final
target MV rewrite then becomes path-sensitive (its result depends on which
candidate happens to occupy the shared key) and may fail.
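A minimal sketch of this failure mode, assuming a plain map keyed only by the coarse table family. The class, record, and ids below are hypothetical and do not mirror the actual StructInfoMap code: table id 10 stands in for dim_full, which the query reaches twice (directly as d0 and through v_dim_full_non_double as d1), and relation ids 1 and 2 stand in for the candidates built for those two occurrences.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration, not Doris code: memo candidates cached only by
// a coarse table-family key, as in the old candidate path.
class TableFamilyKeyedCache {
    /** A memo candidate; relationIds is its exact identity (relationIdSet). */
    record Candidate(Set<Integer> relationIds, String description) {}

    private final Map<Set<Long>, Candidate> byTableFamily = new HashMap<>();

    /** Caches a candidate and returns whatever it replaced (null if none). */
    Candidate put(Set<Long> tableFamily, Candidate candidate) {
        return byTableFamily.put(tableFamily, candidate);
    }

    Candidate get(Set<Long> tableFamily) {
        return byTableFamily.get(tableFamily);
    }

    public static void main(String[] args) {
        TableFamilyKeyedCache cache = new TableFamilyKeyedCache();
        Set<Long> dimFull = Set.of(10L); // hypothetical table id of dim_full

        // Two different candidates (relation ids 1 and 2) share one family key.
        cache.put(dimFull, new Candidate(Set.of(1), "rewritten scan for d0 (mv_dim_full)"));
        Candidate lost = cache.put(dimFull,
                new Candidate(Set.of(2), "rewritten scan for d1 (mv_dim_full_view_non_double)"));

        System.out.println("overwritten: " + lost.description());
        System.out.println("returned for both d0 and d1: " + cache.get(dimFull).description());
    }
}
```

Whichever candidate is cached last wins, so a lookup made on behalf of d0 can receive d1's plan. That is the kind of overwrite that makes the parent rewrite order-dependent.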

This refactor makes the identity boundary explicit:

  • use table ids only to expand the relation search space for an MV
  • use exact relationIdSet as the StructInfo candidate identity
  • cache candidates by target relation search space, with exact relationIdSet as
    the inner key
  • register tableId -> relationId when catalog relations enter memo, including
    nested MV scan relations
  • clear StructInfoMap candidate caches when relation identity changes
  • keep candidate plan materialization lazy until StructInfo is actually needed
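The keying described in this list can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's implementation: the class and method names are hypothetical, relation id sets are modeled as Set<Integer> rather than whatever type Doris actually uses, and clearing the whole cache on every registration is a deliberately coarse stand-in for "clear caches when relation identity changes". Table id 10 stands in for dim_full and relation ids 1 and 2 for its two occurrences in the query (d0 and d1).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Supplier;

// Hypothetical sketch, not the actual StructInfoMap API.
// Outer key: relation search space expanded from an MV's table ids.
// Inner key: the exact relationIdSet of one memo candidate.
class RelationKeyedCandidateCache<P> {
    private final Map<Long, Set<Integer>> relationIdsByTableId = new HashMap<>();
    private final Map<Set<Integer>, Map<Set<Integer>, Lazy<P>>> candidates = new HashMap<>();

    /** Records tableId -> relationId when a catalog relation (including an MV scan) enters the memo. */
    void registerRelation(long tableId, int relationId) {
        relationIdsByTableId.computeIfAbsent(tableId, id -> new HashSet<>()).add(relationId);
        candidates.clear(); // relation identity changed, so cached candidates are dropped
    }

    /** Table ids only expand the search space: every relation id registered for those tables. */
    Set<Integer> searchSpace(Set<Long> mvTableIds) {
        Set<Integer> space = new TreeSet<>();
        for (long tableId : mvTableIds) {
            space.addAll(relationIdsByTableId.getOrDefault(tableId, Set.of()));
        }
        return space;
    }

    /** Caches a candidate under its search space; the plan builder is not run yet. */
    void putCandidate(Set<Long> mvTableIds, Set<Integer> relationIdSet, Supplier<P> planBuilder) {
        candidates.computeIfAbsent(searchSpace(mvTableIds), space -> new HashMap<>())
                .putIfAbsent(Set.copyOf(relationIdSet), new Lazy<>(planBuilder));
    }

    /** Looks up by exact relationIdSet and materializes the plan on first access. */
    Optional<P> getCandidate(Set<Long> mvTableIds, Set<Integer> relationIdSet) {
        Map<Set<Integer>, Lazy<P>> bySpace = candidates.get(searchSpace(mvTableIds));
        Lazy<P> lazy = bySpace == null ? null : bySpace.get(relationIdSet);
        return lazy == null ? Optional.empty() : Optional.ofNullable(lazy.get());
    }

    /** Memoizing supplier: builds the value at most once (not thread-safe). */
    static final class Lazy<T> implements Supplier<T> {
        private Supplier<T> builder;
        private T value;

        Lazy(Supplier<T> builder) {
            this.builder = builder;
        }

        @Override
        public T get() {
            if (builder != null) {
                value = builder.get();
                builder = null;
            }
            return value;
        }
    }

    public static void main(String[] args) {
        RelationKeyedCandidateCache<String> cache = new RelationKeyedCandidateCache<>();
        Set<Long> dimFull = Set.of(10L); // hypothetical table id of dim_full
        cache.registerRelation(10L, 1);  // dim_full scanned directly (d0)
        cache.registerRelation(10L, 2);  // dim_full scanned inside v_dim_full_non_double (d1)

        cache.putCandidate(dimFull, Set.of(1), () -> "candidate plan for d0");
        cache.putCandidate(dimFull, Set.of(2), () -> "candidate plan for d1");

        System.out.println(cache.getCandidate(dimFull, Set.of(1)).orElseThrow());
        System.out.println(cache.getCandidate(dimFull, Set.of(2)).orElseThrow());
    }
}
```

Both lookups return their own plan even though they share the dim_full search space, and neither plan is built until it is first requested. The whole-cache clear in registerRelation is coarser than necessary and is kept only for brevity.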

This lets base-table, view-derived, and rewritten MV-scan candidates coexist
under the same coarse table family without overwriting each other.

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@foxtail463
Contributor Author

Minimal reproducible case:

DROP DATABASE IF EXISTS mv_id_conflict_demo;
CREATE DATABASE mv_id_conflict_demo;
USE mv_id_conflict_demo;

SET enable_nereids_planner = true;
SET enable_fallback_to_original_planner = false;
SET enable_materialized_view_rewrite = true;
SET enable_materialized_view_nest_rewrite = true;
SET enable_nereids_timeout = false;
SET materialized_view_rewrite_duration_threshold_ms = 1800000;

CREATE TABLE fact_src (
dt DATE NOT NULL,
k VARCHAR(32) NOT NULL,
is_dyn VARCHAR(8),
sku_type VARCHAR(8)
) DUPLICATE KEY(dt, k)
PARTITION BY RANGE(dt) (PARTITION p1 VALUES LESS THAN ('2026-02-05'))
DISTRIBUTED BY HASH(k) BUCKETS 1
PROPERTIES ("replication_allocation" = "tag.location.default: 1");

CREATE TABLE dim_full (
dt DATE NOT NULL,
k VARCHAR(32) NOT NULL,
sku_type VARCHAR(8),
is_dyn VARCHAR(8),
bu VARCHAR(32),
mode_flag VARCHAR(8),
double_flag VARCHAR(8)
) UNIQUE KEY(dt, k, sku_type, is_dyn)
PARTITION BY RANGE(dt) (PARTITION p1 VALUES LESS THAN ('2026-02-05'))
DISTRIBUTED BY HASH(k) BUCKETS 1
PROPERTIES ("replication_allocation" = "tag.location.default: 1");

CREATE VIEW v_dim_full_non_double AS
SELECT dt, k, mode_flag, sku_type FROM dim_full WHERE double_flag = '0';

INSERT INTO fact_src VALUES ('2026-02-04', 'K1', '0', '1');
INSERT INTO dim_full VALUES ('2026-02-04', 'K1', '1', '0', 'D1', 'M', '0');

DROP MATERIALIZED VIEW IF EXISTS mv_fact;
CREATE MATERIALIZED VIEW mv_fact
BUILD IMMEDIATE REFRESH COMPLETE ON MANUAL
PARTITION BY (dt)
DISTRIBUTED BY HASH(k) BUCKETS 1
PROPERTIES ("replication_allocation" = "tag.location.default: 1")
AS
SELECT dt, k, is_dyn, sku_type
FROM fact_src
WHERE sku_type = '1';

DROP MATERIALIZED VIEW IF EXISTS mv_dim_full;
CREATE MATERIALIZED VIEW mv_dim_full
BUILD IMMEDIATE REFRESH COMPLETE ON MANUAL
PARTITION BY (dt)
DISTRIBUTED BY HASH(k) BUCKETS 1
PROPERTIES ("replication_allocation" = "tag.location.default: 1")
AS
SELECT dt, k, bu, is_dyn, sku_type
FROM dim_full;

DROP MATERIALIZED VIEW IF EXISTS mv_dim_full_view_non_double;
CREATE MATERIALIZED VIEW mv_dim_full_view_non_double
BUILD IMMEDIATE REFRESH COMPLETE ON MANUAL
PARTITION BY (dt)
DISTRIBUTED BY HASH(k) BUCKETS 1
PROPERTIES ("replication_allocation" = "tag.location.default: 1")
AS
SELECT dt, k, mode_flag, sku_type
FROM v_dim_full_non_double;

DROP MATERIALIZED VIEW IF EXISTS mv_target;
CREATE MATERIALIZED VIEW mv_target
BUILD IMMEDIATE REFRESH COMPLETE ON MANUAL
PARTITION BY (dt)
DISTRIBUTED BY HASH(k) BUCKETS 1
PROPERTIES ("replication_allocation" = "tag.location.default: 1")
AS
SELECT
t.dt,
t.k,
d0.bu AS out_bu,
d1.mode_flag AS out_mode
FROM mv_fact t
LEFT JOIN mv_dim_full d0
ON t.dt = d0.dt
AND t.k = d0.k
AND t.sku_type = d0.sku_type
AND t.is_dyn = d0.is_dyn
LEFT JOIN mv_dim_full_view_non_double d1
ON t.dt = d1.dt
AND t.k = d1.k
AND t.sku_type = d1.sku_type;

REFRESH MATERIALIZED VIEW mv_fact COMPLETE;
REFRESH MATERIALIZED VIEW mv_dim_full COMPLETE;
REFRESH MATERIALIZED VIEW mv_dim_full_view_non_double COMPLETE;
-- Wait a few seconds for the child MV refreshes to finish.
REFRESH MATERIALIZED VIEW mv_target COMPLETE;

EXPLAIN
SELECT
t.dt,
t.k,
d0.bu AS out_bu,
d1.mode_flag AS out_mode
FROM fact_src t
LEFT JOIN dim_full d0
ON t.dt = d0.dt
AND t.k = d0.k
AND t.sku_type = d0.sku_type
AND t.is_dyn = d0.is_dyn
LEFT JOIN v_dim_full_non_double d1
ON t.dt = d1.dt
AND t.k = d1.k
AND t.sku_type = d1.sku_type
WHERE t.dt = '2026-02-04'
AND t.sku_type = '1'
ORDER BY t.k;

foxtail463 force-pushed the refactor-mv-structinfo-candidate-key branch from d1b5a73 to a620789 on April 30, 2026 11:46
foxtail463 changed the title from "[refactor](fe) Refactor MV rewrite StructInfo lookup by relation ids" to "[refactor](mv) Refactor MV rewrite StructInfo lookup by relation ids" on Apr 30, 2026
@foxtail463
Contributor Author

/review

@foxtail463
Contributor Author

foxtail463 commented Apr 30, 2026

Performance Evaluation

This benchmark checks whether the new StructInfo candidate lookup adds measurable overhead to MV rewrite. It compares the current patch 95e91304f729ef6d446973bc0f0d95d923aada51 against doris/master 4e81acee0d83e1db3100a0eb8fe820d05c833c31.

The benchmark SQL models a nested MV rewrite case with base tables, a view, child MVs, and a parent MV. The query starts from base tables and a view, while the target MV is defined over child MVs. This shape exercises the nested MV rewrite path and the StructInfo candidate lookup changed by this PR.

Example:

-- Child MVs.
CREATE MATERIALIZED VIEW mv_fact AS
SELECT dt, k, is_dyn, sku_type
FROM fact_src
WHERE sku_type = '1';

CREATE MATERIALIZED VIEW mv_dim_full AS
SELECT dt, k, bu, is_dyn, sku_type
FROM dim_full;

CREATE MATERIALIZED VIEW mv_dim_full_view_non_double AS
SELECT dt, k, mode_flag, sku_type
FROM v_dim_full_non_double;

-- Parent MV built from child MVs.
CREATE MATERIALIZED VIEW mv_target AS
SELECT
t.dt,
t.k,
d0.bu AS out_bu,
d1.mode_flag AS out_mode
FROM mv_fact t
LEFT JOIN mv_dim_full d0
ON t.dt = d0.dt
AND t.k = d0.k
AND t.sku_type = d0.sku_type
AND t.is_dyn = d0.is_dyn
LEFT JOIN mv_dim_full_view_non_double d1
ON t.dt = d1.dt
AND t.k = d1.k
AND t.sku_type = d1.sku_type;

-- Query starts from the original base table and view.
EXPLAIN
SELECT
t.dt,
t.k,
d0.bu AS out_bu,
d1.mode_flag AS out_mode
FROM fact_src t
LEFT JOIN dim_full d0
ON t.dt = d0.dt
AND t.k = d0.k
AND t.sku_type = d0.sku_type
AND t.is_dyn = d0.is_dyn
LEFT JOIN v_dim_full_non_double d1
ON t.dt = d1.dt
AND t.k = d1.k
AND t.sku_type = d1.sku_type
WHERE t.dt = '2026-02-04'
AND t.sku_type = '1';

The benchmark has three scales:

Scale    | Shape
---------|------
ordinary | 2 physical tables (fact_src, dim_full), 1 view, 4 MVs; the query and the parent MV each contain 3 scan refs and 2 joins; current planChars=2176
large    | 3 physical tables, 3 views, 5 MVs; includes fact-detail, fact-agg, dim-view, partial-join, and target MV paths; current planChars=3829
super    | 11 physical tables, 6 views, 15 MVs in the generated benchmark schema; combines the minimal-conflict, alias, multi-stage, nested, and large-MV shapes; current planChars=5525

All three scales assert that the target MV is chosen.

End-to-end EXPLAIN benchmark

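Scale    | Case                                          | Baseline steadyAvgMs | Current steadyAvgMs | Change
---------|-----------------------------------------------|----------------------|---------------------|-------
ordinary | ordinary_multi_alias_table_join_target_mv_hit | 142.256              | 140.238             | -1.4%
large    | large_multi_alias_table_join_target_mv_hit    | 285.378              | 253.697             | -11.1%
super    | super_multi_alias_table_join_target_mv_hit    | 527.796              | 533.037             | +1.0%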
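The change column is consistent with the relative difference of the steady-state averages, (current - baseline) / baseline, so negative values mean the patched build was faster. A small Java check that recomputes the column from the numbers in the table (the class and method names are arbitrary):

```java
// Recomputes the "change" column of the EXPLAIN benchmark table.
class BenchmarkChange {
    /** Relative change in percent; negative means the current build is faster. */
    static double changePercent(double baselineMs, double currentMs) {
        return (currentMs - baselineMs) / baselineMs * 100.0;
    }

    public static void main(String[] args) {
        String[] scales = {"ordinary", "large", "super"};
        double[] baselineMs = {142.256, 285.378, 527.796};
        double[] currentMs = {140.238, 253.697, 533.037};
        for (int i = 0; i < scales.length; i++) {
            // Prints one line per scale, e.g. "ordinary -1.4%"
            System.out.printf("%-8s %+.1f%%%n", scales[i], changePercent(baselineMs[i], currentMs[i]));
        }
    }
}
```

Rounded to one decimal place, this reproduces -1.4%, -11.1%, and +1.0%.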
