DeltaLake support optimize output metrics in result #29040
Max-Cheng wants to merge 2 commits into trinodb:master
Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Ray Cheng.
@ebyhr OMG your response speed is so fast!
  }

- private void finishOptimize(ConnectorSession session, DeltaLakeTableExecuteHandle executeHandle, Collection<Slice> fragments, List<Object> splitSourceInfo)
+ private Map<String, Long> finishOptimize(ConnectorSession session, DeltaLakeTableExecuteHandle executeHandle, Collection<Slice> fragments, List<Object> splitSourceInfo)
nit: I'd recommend rather returning OptimizeResult and doing the toMap in the calling context instead.
If you take the suggestion, pls do a preparatory commit as well for IcebergMetadata
Let's avoid touching unrelated modules in this PR. There is no need to modify IcebergMetadata in my opinion.
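To make the first suggestion concrete, here is a minimal sketch of what "return OptimizeResult and do the toMap in the calling context" could look like. It is not the connector's actual code: the class `OptimizeResultSketch`, the simplified `finishOptimize` signature, and the metric name `added_data_files_count` are assumptions for illustration. The record fields mirror the `OptimizeResult` record that appears in a later diff in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for the connector code; not the actual Trino implementation.
final class OptimizeResultSketch
{
    // Field names mirror the OptimizeResult record from this PR's diff.
    record OptimizeResult(long rewrittenDataFiles, long removedDeleteFiles, long addedDataFiles)
    {
        // Converts the typed result into the name -> value form returned to the engine.
        // "added_data_files_count" is an assumed metric name, chosen to match the other two.
        Map<String, Long> toMap()
        {
            Map<String, Long> metrics = new LinkedHashMap<>();
            metrics.put("rewritten_data_files_count", rewrittenDataFiles);
            metrics.put("removed_delete_files_count", removedDeleteFiles);
            metrics.put("added_data_files_count", addedDataFiles);
            return metrics;
        }
    }

    // Stand-in for finishOptimize: it returns the typed result instead of a Map.
    static OptimizeResult finishOptimize(int scannedDataFiles, int filesToDelete, int addedDataFiles)
    {
        return new OptimizeResult(scannedDataFiles, filesToDelete, addedDataFiles);
    }

    public static void main(String[] args)
    {
        // The caller, not finishOptimize, decides when to flatten the result into a map.
        Map<String, Long> metrics = finishOptimize(2, 0, 1).toMap();
        System.out.println(metrics);
    }
}
```

The trade-off under discussion is that the typed record keeps `finishOptimize` self-describing, while returning the map directly avoids an extra type when the method has a single return path.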
  for (int i = 0; i < 3; i++) {
      Set<String> initialFiles = getActiveFiles(tableName);
-     computeActual("ALTER TABLE " + tableName + " EXECUTE OPTIMIZE");
+     computeActual(withSingleWriterPerTask(getSession()), "ALTER TABLE " + tableName + " EXECUTE OPTIMIZE");
Let's rather do a preparatory commit that makes use of withSingleWriterPerTask to explain that we're aiming for stable results.
But this seems unrelated to the change. Is it required by the current PR's change?
@chenjian2664 @findinpath yep, this is an unrelated change, made to avoid flaky test results. I'll split it out into a separate commit.
        metric_name         | metric_value
----------------------------+--------------
 rewritten_data_files_count |            2
 removed_delete_files_count |            0
We can remove it if it is guaranteed to always be 0
  MaterializedResult optimizeResult = computeActual(withSingleWriterPerTask(getSession()), "ALTER TABLE " + tableName + " EXECUTE OPTIMIZE");

  Map<String, Long> metrics = optimizeResult.getMaterializedRows().stream()
          .collect(ImmutableMap.toImmutableMap(
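The test snippet above is cut off at the collector, so the following is only a sketch of the general idea: turning two-column result rows (`metric_name`, `metric_value`) into a map that assertions can be written against. It uses plain JDK collections rather than Trino's `MaterializedResult` or Guava's `ImmutableMap`, and the class name `OptimizeMetricsRowsSketch` and the row representation are assumptions.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper; each row is modeled as a plain list [metric_name, metric_value]
// standing in for a materialized result row.
final class OptimizeMetricsRowsSketch
{
    static Map<String, Long> toMetricsMap(List<List<Object>> rows)
    {
        return rows.stream()
                .collect(Collectors.toUnmodifiableMap(
                        row -> (String) row.get(0),                 // first column: metric name
                        row -> ((Number) row.get(1)).longValue())); // second column: metric value
    }

    public static void main(String[] args)
    {
        // Values taken from the documented sample output in this PR.
        List<List<Object>> rows = List.of(
                List.of("rewritten_data_files_count", 2L),
                List.of("removed_delete_files_count", 0L));
        Map<String, Long> metrics = toMetricsMap(rows);
        System.out.println(metrics.get("rewritten_data_files_count"));
    }
}
```

Keying by metric name keeps the test independent of row order, which matters if the engine does not guarantee the order in which metrics are returned.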
      return new OptimizeResult(scannedDataFiles.size(), filesToDelete.size(), dataFileInfos.size());
  }

  private record OptimizeResult(long rewrittenDataFiles, long removedDeleteFiles, long addedDataFiles)
We don't need this record in this connector. Please remove it.
The Iceberg connector uses a record class because there are two return statements in its finishOptimize method.
  }

  @Test
  public void testOptimizeWithDeletionVectors()
BaseDeltaLakeConnectorSmokeTest is used for running tests on several storages. Please consider reverting changes in this class, and updating TestDeltaLakeConnectorTest instead.
Description
Allow distributed table procedures to output metrics in result.
DeltaLake OPTIMIZE procedure returns the output below:
OPTIMIZE command in Delta Lake #28999
Release notes