Open
Labels: type/bug (Something isn't working)
Description
We have a large Iceberg table (over 250 TB) partitioned by hour on the ts column.
The ts column contains only hour-truncated values and has full Iceberg statistics, which we have verified in the Iceberg metadata.
However, when we run a simple query to retrieve the minimum and maximum values of the ts column, the query profile shows a full scan of the ts column instead of a lookup of the statistics in the Iceberg metadata.
We tried setting the following session variables, but a full scan is still performed:
enable_rewrite_simple_agg_to_hdfs_scan=true,
enable_rewrite_simple_agg_to_meta_scan=true
Steps to reproduce the behavior (Required)
SELECT
/*+ SET_VAR(
    enable_rewrite_simple_agg_to_hdfs_scan = true,
    enable_rewrite_simple_agg_to_meta_scan = true)
*/
    min(ts), max(ts)
FROM our_schema.iceberg_table;
Expected behavior (Required)
Simple min/max aggregations over an Iceberg table should be answered from the Iceberg metadata (per-file column statistics) to avoid a costly scan of the data files.
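To make the expectation concrete, here is a minimal Python sketch of the reduction we have in mind: folding per-file lower/upper bounds for ts into a table-wide min and max without opening any data file. This is only an illustration, not StarRocks code. `FileStats` and `min_max_from_metadata` are hypothetical names, and the sketch assumes every data file carries bounds for ts and that the table has no row-level deletes (with delete files, the bounds could be wider than the live data).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class FileStats:
    """Hypothetical stand-in for one data file's entry in an Iceberg manifest."""
    path: str
    ts_lower: Optional[datetime]  # lower bound of ts, None if stats are missing
    ts_upper: Optional[datetime]  # upper bound of ts, None if stats are missing


def min_max_from_metadata(
    files: Iterable[FileStats],
) -> Optional[Tuple[datetime, datetime]]:
    """Fold per-file bounds into a table-wide (min, max).

    Returns None when there are no files or when any file lacks bounds,
    which signals that the caller has to fall back to scanning the data.
    """
    lo = hi = None
    for f in files:
        if f.ts_lower is None or f.ts_upper is None:
            return None
        lo = f.ts_lower if lo is None else min(lo, f.ts_lower)
        hi = f.ts_upper if hi is None else max(hi, f.ts_upper)
    return None if lo is None else (lo, hi)


# Made-up example entries; real values would come from the manifests.
files = [
    FileStats("a.parquet", datetime(2025, 1, 1, 0), datetime(2025, 1, 1, 0)),
    FileStats("b.parquet", datetime(2025, 1, 1, 1), datetime(2025, 1, 1, 1)),
    FileStats("c.parquet", datetime(2024, 12, 31, 23), datetime(2024, 12, 31, 23)),
]
print(min_max_from_metadata(files))
```

The work here is proportional to the number of manifest entries rather than to the bytes stored in the ts column, which is why we expected this query to finish without reading hundreds of GB.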
Real behavior (Required)
The query scans about 1 TB of data and takes ~4 minutes on our ~640-core cluster.
StarRocks version (Required)
4.0.2-1f1aa9c
Query Profile attached:
...
on-default:
- size: 567
- columns: {..., ts=full, ...}
...
- HdfsIOMetrics:
- TotalBytesRead: 967.851 GB
...