-
Notifications
You must be signed in to change notification settings - Fork 5.5k
feat(optimizer): Add support for configurable freshness thresholds for materialized views #26764
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Reviewer's GuideAdds configurable staleness handling for materialized views, including new SPI types, config and session properties, connector- and engine-level plumbing (notably for Iceberg and memory connectors), optimizer changes to respect staleness behavior and window, and tests/docs for the new behavior. Sequence diagram for materialized view rewrite with staleness handlingsequenceDiagram
actor User
participant Session
participant Planner as MaterializedViewRewrite
participant Metadata as MetadataResolver
participant Connector as ConnectorMetadata
User->>Session: submit query referencing MV
Session->>Planner: optimize plan
Planner->>Metadata: getMaterializedView(viewName)
Metadata->>Connector: getMaterializedView(viewName)
Connector-->>Metadata: MaterializedViewDefinition(definition)
Metadata-->>Planner: MaterializedViewDefinition(definition)
Planner->>Metadata: getMaterializedViewStatus(viewName,TupleDomain.all)
Metadata->>Connector: getMaterializedViewStatus(viewName,TupleDomain.all)
Connector-->>Metadata: MaterializedViewStatus(status,lastFreshTime)
Metadata-->>Planner: MaterializedViewStatus(status,lastFreshTime)
alt fully materialized
Planner->>Planner: canUseDataTableWithSecurityChecks(definition,status)
Planner-->>Session: use data table
else not fully materialized and stalenessConfig present
Planner->>Planner: isStalenessBeyondTolerance(stalenessConfig,status)
alt within stalenessWindow
Planner->>Planner: canUseDataTableWithSecurityChecks(definition,status)
Planner-->>Session: use data table
else beyond stalenessWindow
alt staleReadBehavior == FAIL
Planner-->>User: error MATERIALIZED_VIEW_STALE
else staleReadBehavior == USE_VIEW_QUERY
Planner-->>Session: do not use data table
end
end
else no stalenessConfig
Planner->>Session: getMaterializedViewStaleReadBehavior
alt sessionBehavior == USE_VIEW_QUERY
Planner-->>Session: do not use data table
else sessionBehavior == FAIL
Planner-->>User: error MATERIALIZED_VIEW_STALE
end
end
Updated class diagram for materialized view staleness modelclassDiagram
class MaterializedViewDefinition {
- String originalSql
- String schema
- String table
- List~SchemaTableName~ baseTables
- Optional~String~ owner
- Optional~ViewSecurity~ securityMode
- List~ColumnMapping~ columnMappings
- List~SchemaTableName~ baseTablesOnOuterJoinSide
- Optional~List~ validRefreshColumns
- Optional~MaterializedViewStalenessConfig~ stalenessConfig
- Optional~MaterializedViewRefreshType~ refreshType
+ MaterializedViewDefinition(originalSql,schema,table,baseTables,owner,securityMode,columnMappings,baseTablesOnOuterJoinSide,validRefreshColumns,stalenessConfig,refreshType)
+ Optional~MaterializedViewStalenessConfig~ getStalenessConfig()
+ Optional~MaterializedViewRefreshType~ getRefreshType()
}
class MaterializedViewStalenessConfig {
- MaterializedViewStaleReadBehavior staleReadBehavior
- Duration stalenessWindow
+ MaterializedViewStalenessConfig(staleReadBehavior,stalenessWindow)
+ MaterializedViewStaleReadBehavior getStaleReadBehavior()
+ Duration getStalenessWindow()
}
class MaterializedViewStatus {
- MaterializedViewState materializedViewState
- Map~SchemaTableName,MaterializedDataPredicates~ partitionsFromBaseTables
- Optional~Long~ lastFreshTime
+ MaterializedViewStatus(materializedViewState)
+ MaterializedViewStatus(materializedViewState,partitionsFromBaseTables)
+ MaterializedViewStatus(materializedViewState,partitionsFromBaseTables,lastFreshTime)
+ MaterializedViewState getMaterializedViewState()
+ Map~SchemaTableName,MaterializedDataPredicates~ getPartitionsFromBaseTables()
+ Optional~Long~ getLastFreshTime()
}
class MaterializedViewStaleReadBehavior {
<<enum>>
FAIL
USE_VIEW_QUERY
}
class MaterializedViewRefreshType {
<<enum>>
FULL
}
class FeaturesConfig {
- MaterializedViewStaleReadBehavior materializedViewStaleReadBehavior
+ MaterializedViewStaleReadBehavior getMaterializedViewStaleReadBehavior()
+ FeaturesConfig setMaterializedViewStaleReadBehavior(materializedViewStaleReadBehavior)
}
class SystemSessionProperties {
+ static MaterializedViewStaleReadBehavior getMaterializedViewStaleReadBehavior(Session)
}
class IcebergTableProperties {
+ static Optional~MaterializedViewStaleReadBehavior~ getMaterializedViewStaleReadBehavior(Map)
+ static Optional~Duration~ getMaterializedViewStalenessWindow(Map)
+ static MaterializedViewRefreshType getMaterializedViewRefreshType(Map)
+ List~PropertyMetadata~ getMaterializedViewProperties()
}
class MaterializedViewRewrite {
- boolean isUseDataTable(MaterializedViewScanNode,MetadataResolver,Session)
- boolean isStalenessBeyondTolerance(MaterializedViewStalenessConfig,MaterializedViewStatus)
- boolean applyStaleReadBehavior(MaterializedViewStalenessConfig,QualifiedObjectName)
- boolean canUseDataTableWithSecurityChecks(MaterializedViewScanNode,MetadataResolver,Session,MaterializedViewDefinition)
}
MaterializedViewDefinition --> "0..1" MaterializedViewStalenessConfig : stalenessConfig
MaterializedViewDefinition --> "0..1" MaterializedViewRefreshType : refreshType
MaterializedViewStalenessConfig --> MaterializedViewStaleReadBehavior : uses
MaterializedViewStatus --> "0..1" Long : lastFreshTime
FeaturesConfig --> MaterializedViewStaleReadBehavior : defaultBehavior
SystemSessionProperties --> MaterializedViewStaleReadBehavior : sessionBehavior
IcebergTableProperties --> MaterializedViewStaleReadBehavior : tableProperty
IcebergTableProperties --> MaterializedViewRefreshType : tableProperty
MaterializedViewRewrite --> MaterializedViewDefinition : reads
MaterializedViewRewrite --> MaterializedViewStatus : reads
MaterializedViewRewrite --> MaterializedViewStalenessConfig : evaluates
MaterializedViewRewrite --> MaterializedViewStaleReadBehavior : applies
File-Level Changes
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
cd250c1 to
d9f4b9a
Compare
d9f4b9a to
aefcddf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes - here's some feedback:
- The staleness evaluation in MaterializedViewRewrite uses currentTimeMillis() directly, which makes behavior time-dependent and slightly non-deterministic in tests; consider injecting a clock or passing the evaluation time via MaterializedViewStatus to make the logic and tests fully deterministic.
- Several new tests in TestMaterializedViewRewrite appear to cover overlapping scenarios (e.g., the two 'within tolerance' and two 'beyond tolerance' cases); you could consolidate or parameterize these to reduce duplication while keeping the coverage.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The staleness evaluation in MaterializedViewRewrite uses currentTimeMillis() directly, which makes behavior time-dependent and slightly non-deterministic in tests; consider injecting a clock or passing the evaluation time via MaterializedViewStatus to make the logic and tests fully deterministic.
- Several new tests in TestMaterializedViewRewrite appear to cover overlapping scenarios (e.g., the two 'within tolerance' and two 'beyond tolerance' cases); you could consolidate or parameterize these to reduce duplication while keeping the coverage.
## Individual Comments
### Comment 1
<location> `presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java:1718-1723` </location>
<code_context>
return new MaterializedViewStatus(NOT_MATERIALIZED, ImmutableMap.of());
}
+ Optional<Long> lastFreshTime = definition.get().getBaseTables().stream()
+ .map(baseTable -> getIcebergTable(session, baseTable))
+ .map(Table::currentSnapshot)
+ .filter(Objects::nonNull)
+ .map(Snapshot::timestampMillis)
+ .max(Long::compareTo);
+
for (SchemaTableName baseTable : definition.get().getBaseTables()) {
</code_context>
<issue_to_address>
**suggestion (performance):** Avoid recomputing Iceberg tables twice when determining status and last fresh time.
`lastFreshTime` is computed by calling `getIcebergTable(session, baseTable)` for every base table, and then the `for` loop calls `getIcebergTable` again for the same tables. This doubles catalog/metadata access. Either derive `lastFreshTime` from the `baseIcebergTable` already fetched in the loop, or build a single map of base table → snapshot and reuse it in both places to avoid repeated lookups.
Suggested implementation:
```java
Map<SchemaTableName, Table> baseIcebergTables = new HashMap<>();
for (SchemaTableName baseTable : definition.get().getBaseTables()) {
baseIcebergTables.put(baseTable, getIcebergTable(session, baseTable));
}
Optional<Long> lastFreshTime = baseIcebergTables.values().stream()
.map(Table::currentSnapshot)
.filter(Objects::nonNull)
.map(Snapshot::timestampMillis)
.max(Long::compareTo);
for (SchemaTableName baseTable : definition.get().getBaseTables()) {
Table baseIcebergTable = baseIcebergTables.get(baseTable);
```
1. Ensure the following imports exist at the top of `IcebergAbstractMetadata.java` (or equivalent, depending on your existing imports):
- `import java.util.HashMap;`
- `import java.util.Map;`
2. If the file already has these imports under a different grouping/order, integrate them according to your existing import style.
</issue_to_address>
### Comment 2
<location> `presto-main-base/src/test/java/com/facebook/presto/sql/planner/iterative/rule/TestMaterializedViewRewrite.java:225-234` </location>
<code_context>
+ assertThrows(PrestoException.class, () ->
</code_context>
<issue_to_address>
**suggestion (testing):** Strengthen failure tests by asserting on error code/message instead of only exception type
In the failure tests (`testFailWhenStaleAndSessionPropertyIsFail`, `testFailWhenStalenessBeyondToleranceWithFailBehavior`, `testFailWhenNeverRefreshedWithStalenessConfig`, `testFailWhenStalenessBeyondToleranceWindow`), consider using the `assertThrows` overload that returns the exception and then assert on `getErrorCode()` (and optionally the message) so the tests verify the specific `MATERIALIZED_VIEW_STALE` / `INVALID_MATERIALIZED_VIEW_PROPERTY` failures, not just the exception type.
Suggested implementation:
```java
Metadata metadata = new TestingMetadataWithMaterializedViewStatus(false);
PrestoException exception = assertThrows(PrestoException.class, () ->
testerWithFail.assertThat(new MaterializedViewRewrite(metadata, new AllowAllAccessControl()))
.on(planBuilder -> {
VariableReferenceExpression outputA = planBuilder.variable("a", BIGINT);
VariableReferenceExpression dataTableA = planBuilder.variable("data_table_a", BIGINT);
VariableReferenceExpression viewQueryA = planBuilder.variable("view_query_a", BIGINT);
return planBuilder.materializedViewScan(
materializedViewName,
planBuilder.values(dataTableA),
planBuilder.values(viewQueryA),
```
1. After this block (in the same test method), add:
`assertEquals(MATERIALIZED_VIEW_STALE.toErrorCode(), exception.getErrorCode());`
(or `INVALID_MATERIALIZED_VIEW_PROPERTY.toErrorCode()` where appropriate).
2. Ensure you have `import static org.testng.Assert.assertEquals;` at the top of the file.
3. Ensure you have static imports for the error codes you want to assert, e.g.:
`import static com.facebook.presto.spi.StandardErrorCode.MATERIALIZED_VIEW_STALE;`
and/or
`import static com.facebook.presto.spi.StandardErrorCode.INVALID_MATERIALIZED_VIEW_PROPERTY;`.
4. Apply the same pattern to:
- `testFailWhenStaleAndSessionPropertyIsFail`
- `testFailWhenStalenessBeyondToleranceWithFailBehavior`
- `testFailWhenNeverRefreshedWithStalenessConfig`
- `testFailWhenStalenessBeyondToleranceWindow`
by:
- Capturing the thrown `PrestoException` into a local variable.
- Asserting `getErrorCode()` matches the specific expected error code.
- Optionally asserting on `getMessage()` if you want stronger guarantees about the failure reason.
</issue_to_address>
### Comment 3
<location> `presto-main-base/src/test/java/com/facebook/presto/execution/TestCreateMaterializedViewTask.java:128-133` </location>
<code_context>
ColumnPropertyManager columnPropertyManager = new ColumnPropertyManager();
columnPropertyManager.addProperties(testCatalog.getConnectorId(), ImmutableList.of());
+ MaterializedViewPropertyManager materializedViewPropertyManager = new MaterializedViewPropertyManager();
+ materializedViewPropertyManager.addProperties(testCatalog.getConnectorId(), ImmutableList.of());
+
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding a test that exercises materialized view creation with the new MV properties
`MaterializedViewPropertyManager` is now wired into `CreateMaterializedViewTask`, but tests here still only validate table properties. Please add a test that creates a materialized view with MV-specific properties (e.g., `materialized_view_stale_read_behavior`, `materialized_view_staleness_window`, `materialized_view_refresh_type`) set in SQL, and asserts they are correctly propagated into `ConnectorTableMetadata` (or the stored definition).
</issue_to_address>
### Comment 4
<location> `presto-docs/src/main/sphinx/admin/materialized-views.rst:66` </location>
<code_context>
+
+When no per-view configuration is specified, the default behavior is ``FAIL`` (query fails with
+an error). This can be changed using the ``materialized_view_stale_read_behavior`` session property
+or the ``materialized-view-stale-read-behavior`` configuration property. Setting to ``USE_VIEW_QUERY``
+causes Presto to fall back to executing the underlying view query against the base tables.
+
Required Permissions
</code_context>
<issue_to_address>
**suggestion (typo):** Add a missing word in "Setting to ``USE_VIEW_QUERY``` for correct grammar.
The phrase is missing an object. Please rephrase to something like “Setting it to ``USE_VIEW_QUERY```” or “Setting this to ``USE_VIEW_QUERY```” for correct grammar.
```suggestion
or the ``materialized-view-stale-read-behavior`` configuration property. Setting it to ``USE_VIEW_QUERY``
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java
Outdated
Show resolved
Hide resolved
...rc/test/java/com/facebook/presto/sql/planner/iterative/rule/TestMaterializedViewRewrite.java
Outdated
Show resolved
Hide resolved
...to-main-base/src/test/java/com/facebook/presto/execution/TestCreateMaterializedViewTask.java
Show resolved
Hide resolved
6028a0b to
20869b1
Compare
…m view and table properties
hantangwangd
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tdcmeehan thanks for this feature, overall lgtm. I've left a few comments—mostly small things.
| Optional<Long> lastFreshTime = baseIcebergTables.values().stream() | ||
| .map(Table::currentSnapshot) | ||
| .filter(Objects::nonNull) | ||
| .map(Snapshot::timestampMillis) | ||
| .max(Long::compareTo); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code here appears to obtain the timestamp of the latest snapshot for all base tables. Are we supposed to use the timestamps of the base tables' snapshots that are recorded in the MV's properties?
| private final SessionPropertyManager sessionPropertyManager; | ||
| private final SchemaPropertyManager schemaPropertyManager; | ||
| private final TablePropertyManager tablePropertyManager; | ||
| private final MaterializedViewPropertyManager materializedViewPropertyManager = new MaterializedViewPropertyManager(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One quick question: was it intentional that materializedViewPropertyManager is not handled via injection like the other property managers?
| false)) | ||
| .build(); | ||
|
|
||
| List<PropertyMetadata<?>> mvOnlyProperties = ImmutableList.<PropertyMetadata<?>>builder() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really like the clean separation of MV properties here—makes things much clearer. I was wondering if it would be possible to move these MV properties to a dedicated class?
Description
Add configurable freshness thresholds for materialized views. Users can specify a staleness window and behavior (
FAILorUSE_VIEW_QUERY) when the view is stale beyond the threshold.For Iceberg, materialized view properties are added which allow toggling both the stale threshold and the behavior:
CREATE MATERIALIZED VIEW hourly_sales WITH ( stale_read_behavior = 'FAIL', staleness_window = '1h' ) AS SELECT date_trunc('hour', sale_time) as hour, SUM(amount) as total FROM sales GROUP BY 1;Motivation and Context
Materialized views can become stale when base tables change. This feature allows users to configure how Presto handles stale reads - either fail with an error or fall back to querying base tables directly.
Impact
Support added for configurable freshness thresholds
Test Plan
FAIL/USE_VIEW_QUERYbehaviorsContributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.