[Kernel] Implement catalogManaged table feature in Kernel #4573

@scottsand-db

Description

Overview

Please see #4381

Design

[Public] [External] Design Doc: Delta Kernel <> catalogManaged Tables

Project Tracking

Merged = ✅
Needs Review = 👀
Waiting for merge = ☑
Abandoned = 🛑

Milestone 0.1: Kernel-side CCv2 reads -- MVP ✅, Productionization 🛠️

| Description | PR | Status | Created | Merged |
| --- | --- | --- | --- | --- |
| ParsedLogData | #4579 | ✅ | 05/19 | 05/21 |
| ParsedCheckpointData ordering | #4597 | ✅ | 05/21 | 05/27 |
| APIs - TableManager, ResolvedTableBuilder, ResolvedTable | #4614 | ✅ | 05/22 | 05/28 |
| ResolvedTable impl; Builder impl; Factory impl; Include parsedLogData when constructing LogSegment | #4615 | ✅ | 05/22 | 06/02 |
| Load the protocol, metadata, and LogSegment only as needed | #4644 | ✅ | 05/27 | 06/03 |
| Refactor: Make LogReplay load P&M lazily, without impacting any existing code paths today | #4641 | ✅ | 05/27 | 05/27 |
| Refactor: Refactor SnapshotQueryContext error reporting to instance method | #4654 | ✅ | 05/28 | 05/28 |
| Refactor: Move testMetadata creation helper to separate test trait | #4657 | ✅ | 05/28 | 05/28 |
| ResolvedTableBuilder input validation | #4664 | ✅ | 05/29 | 06/16 |
| @mmmyr -- #4639 -- Move static util assertLogFilesBelongToTable into LogSegment constructor validation | #4682 | ✅ | 05/30 | 06/02 |
| Refactor: Make AbstractTestUtils so we can run tests using both old and new APIs | #4676 | ✅ | 05/30 | 06/14 |
| Refactor: Make LogReplay take in a lazy LogSegment | #4690 | ✅ | 06/02 | 06/03 |
| catalogManaged preview table feature support | #4686 | ✅ | 06/02 | 06/14 |
| Fix deltas + commits merging logic to favor the ratified commits | #4768 | ✅ | 06/13 | 06/18 |
| Create ScanBuilder from ResolvedTable; add E2E ResolvedTable read tests | #4663 | ✅ | 06/16 | 06/18 |
| Simple E2E read suite with real table and real staged commits | #4761 | ✅ | 06/16 | 06/18 |
| SnapshotBuilder read metrics | #5030 | ✅ | 08/05 | 08/06 |
| #5034 -- SnapshotBuilder::atTimestamp API | #5145 | ✅ | 08/29 | 08/29 |
| Mock the P & M in unit tests | N/A | TODO | xx | xx |
| Support a table of only log data | N/A | TODO | xx | xx |
| If table constructed with ratified staged commit log data, verify table feature catalogManaged is supported | N/A | TODO | xx | xx |
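
The merge rule from #4768 (when a version exists both as a published delta and as a catalog-ratified commit, the ratified commit wins) can be illustrated with a small sketch. This is a self-contained Java approximation, not Kernel code: `CommitFile`, `merge`, and the staged-commit file names are hypothetical stand-ins chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class MergeCommitsSketch {
    // Hypothetical stand-in for a log file reference; not a Kernel class.
    record CommitFile(long version, String path, boolean ratifiedByCatalog) {}

    // Merges published deltas with catalog-ratified commits into one version-ordered list.
    // If both sources contain the same version, the ratified commit wins.
    static List<CommitFile> merge(List<CommitFile> publishedDeltas, List<CommitFile> ratifiedCommits) {
        TreeMap<Long, CommitFile> byVersion = new TreeMap<>(); // keyed and sorted by version
        for (CommitFile delta : publishedDeltas) {
            byVersion.put(delta.version(), delta);
        }
        // Added second, so a ratified commit overwrites a published delta with the same version.
        for (CommitFile commit : ratifiedCommits) {
            byVersion.put(commit.version(), commit);
        }
        return new ArrayList<>(byVersion.values());
    }

    public static void main(String[] args) {
        List<CommitFile> published = List.of(
                new CommitFile(0, "_delta_log/00000000000000000000.json", false),
                new CommitFile(1, "_delta_log/00000000000000000001.json", false));
        List<CommitFile> ratified = List.of(
                new CommitFile(1, "_delta_log/_staged_commits/00000000000000000001.uuid-a.json", true),
                new CommitFile(2, "_delta_log/_staged_commits/00000000000000000002.uuid-b.json", true));
        for (CommitFile f : merge(published, ratified)) {
            System.out.println(f.version() + " " + f.path());
        }
    }
}
```

A `TreeMap` keeps the result ordered by version while letting the second pass replace duplicates, so the "ratified wins" policy is expressed purely by insertion order.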

Milestone 0.2: E2E reads on UC managed tables -- MVP ✅, Productionization 🛠️

| Description | PR | Status | Created | Merged |
| --- | --- | --- | --- | --- |
| Simple UCCatalogManagedClient with UCCatalogManagedClientSuite | #4780 | ✅ | 06/17 | 06/25 |
| Simple InMemoryUCClient with InMemoryUCClientSuite | #4835 | ✅ | 06/17 | 06/27 |
| UCCatalogManagedClient loadTable tests | #4838 | ✅ | 06/26 | 06/30 |
| GitHub workflow for unity tests | #4857 | ✅ | 06/30 | 06/30 |
| UCCatalogManagedClient: minor fixes (e.g. table version is optional) | #4944 | ✅ | 07/17 | 07/29 |
| TODO: Create runtime UCCatalogManagedException class | xx | xx | xx | xx |

Milestone 0.3: E2E basic writes -- MVP ✅, Productionization 🛠️

| Description | PR | Status | Created | Merged |
| --- | --- | --- | --- | --- |
| New Transaction and Committer etc. APIs | #4814 | ✅ | 06/24 | 07/14 |
| Basic TransactionV2 and CommitContext implementation | #4916 | ✅ | 07/11 | 07/29 |
| Set the committer on ResolvedTableBuilder; No-op DefaultCommitter | #4936 | ✅ | 07/15 | 07/29 |
| Refactor: Remove ResolvedTable | #5003 | ✅ | 08/01 | 08/01 |
| Implement the default committer | #5040 | ✅ | 08/07 | 08/07 |
| Refactor: optional readSnapshot in TransactionImpl | #5012 | ✅ | 08/01 | 08/04 |
| Refactor: expose clustering cols on SnapshotImpl | #5019 | ✅ | 08/04 | 08/04 |
| Refactor: common transaction builder code | #5033 | ✅ | 08/06 | 08/07 |
| Intent-based builders: Create and Update | #5044 | ✅ | 08/07 | 08/13 |
| Have the txn invoke the committer and handle CommitFailedException | #5042 | ✅ | 08/07 | 08/11 |
| Add helpful APIs to CommitMetadata | #5060 | ✅ | 08/13 | 08/13 |
| Refactor: Timing Operations | #5071 | ✅ | 08/14 | 08/14 |
| Create _staged_commits directory | #5080 | ✅ | 08/14 | 08/18 |
| Refactor: Make P&M in CommitMetadata coupled | #5083 | ✅ | 08/15 | 08/18 |
| CatalogManaged table feature write support | #5088 | ✅ | 08/15 | 08/18 |
| #5021: New FileSystemClient::getFileStatus API | #5113 | ✅ | 08/19 | 08/20 |
| LogSegment input validation refactor | #5148 | ✅ | 08/31 | 09/02 |
| Prefactor for #5100: Fix ICT disablement | #5156 | ✅ | 09/04 | 09/05 |
| Explicitly block enabling catalogManaged during REPLACE | #5100 | ✅ | 08/18 | 09/09 |
| Fix Snapshot::getTimestamp to use CommitInfo from LogSegment | #5149 | ✅ | 08/31 | xx |
| Prefactor: TestFixtures trait and convenience createCommitMetadata API | #5174 | ✅ | 09/08 | 09/08 |
| Update IcebergWriterCompat to support catalogManaged | #5176 | ✅ | 09/08 | 09/12 |
| Utility to convert physical column -> logical column | #5258 | ✅ | 09/25 | 09/26 |
| Post-Commit-Snapshot-1: LogSegment incremental update | #5262 | ✅ | 09/26 | 10/01 |
| Remove SnapshotHint; Fix CRCInfo caching bug in LogReplay; Simplify LogReplay | #5276 | ✅ | 09/30 | 09/30 |
| ParsedCatalogCommitData | #5279 | ✅ | 10/01 | 10/03 |
| Refactor P & M replay into static util | #5292 | ✅ | 10/01 | 10/06 |
| FileSystemClient::copyFileAtomically | #5289 | ✅ | 10/03 | 10/06 |
| LogSegment::maxPublishedDeltaVersion | #5312 | ✅ | 10/07 | 10/08 |
| CatalogCommitter::getRequiredTableProperties | #5322 | ✅ | 10/09 | 10/13 |
| Snapshot::publish | #5332 | ✅ | 10/13 | 10/14 |
| PostCommitSnapshot | #5309 | ✅ | 10/07 | 10/08 |
| SnapshotStatistics and Snapshot write CRC file API | #5340 | ✅ | 10/15 | 10/17 |
| Test: load snapshot using CRC from unpublished delta version | #5372 | 👀 | 10/20 | xx |
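
To make the idea behind `LogSegment::maxPublishedDeltaVersion` (#5312) concrete, here is a minimal sketch that finds the highest version present as a published delta while ignoring staged commits. It assumes published deltas are named `_delta_log/<20-digit version>.json` and staged commits live under `_delta_log/_staged_commits/`; the class and method below are hypothetical and work on plain path strings rather than Kernel's `LogSegment`.

```java
import java.util.List;
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MaxPublishedDeltaVersionSketch {
    // Assumed layout: published deltas are _delta_log/<20-digit version>.json. Staged commits
    // (_delta_log/_staged_commits/...), checkpoints, and .crc files do not match this pattern.
    private static final Pattern PUBLISHED_DELTA =
            Pattern.compile("(?:.*/)?_delta_log/(\\d{20})\\.json");

    // Returns the highest published delta version, or empty if no published delta is present.
    static OptionalLong maxPublishedDeltaVersion(List<String> logFilePaths) {
        OptionalLong max = OptionalLong.empty();
        for (String path : logFilePaths) {
            Matcher m = PUBLISHED_DELTA.matcher(path);
            if (m.matches()) {
                long version = Long.parseLong(m.group(1));
                if (max.isEmpty() || version > max.getAsLong()) {
                    max = OptionalLong.of(version);
                }
            }
        }
        return max;
    }

    public static void main(String[] args) {
        List<String> files = List.of(
                "s3://bucket/tbl/_delta_log/00000000000000000000.json",
                "s3://bucket/tbl/_delta_log/00000000000000000001.json",
                "s3://bucket/tbl/_delta_log/00000000000000000001.crc",
                "s3://bucket/tbl/_delta_log/_staged_commits/00000000000000000002.uuid-b.json");
        System.out.println(maxPublishedDeltaVersion(files)); // OptionalLong[1]
    }
}
```

Returning `OptionalLong` rather than a sentinel such as `-1` keeps the "no published delta yet" case explicit, which matters for tables whose recent commits exist only as ratified staged commits.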

Milestone 0.4: E2E writes on UC managed tables -- MVP ✅, Productionization 🛠️

| Description | PR | Status | Created | Merged |
| --- | --- | --- | --- | --- |
| UCCatalogManagedCommitter: skeleton | #5062 | ✅ | 08/13 | 08/14 |
| UCCatalogManagedCommitter: e2e implementation | #5074 | ✅ | 08/14 | 08/20 |
| UCCatalogManagedClient CREATE support | #5121 | ✅ | 08/20 | 08/26 |
| UCCatalogManagedClient::createTxnBuilder API | #5146 | ✅ | 08/30 | 09/05 |
| UCCatalogManagedCommitter uses FileSystemClient::getFileStatus | #5168 | ✅ | 09/05 | 09/08 |
| Convert Kernel P&M types to io.delta.storage.commit types | #5097 | ✅ | 08/18 | 08/18 |
| Send P&M to UC inside commit | #5185 | ✅ | 09/09 | 09/09 |
| Allow connectors to inject custom committer properties | #5240 | ✅ | 09/23 | 09/23 |
| Better UC CCv2 Client extensibility | #5242 | ✅ | 09/23 | 09/23 |
| Add CatalogCommitterUtils::extractProtocolProperties utility | #5243 | ✅ | 09/23 | 09/25 |
| Add Committer::publish API and UC implementation | #5297 | ✅ | 10/06 | xx |
| Expose the maxKnownPublishedDeltaVersion to the UC committer | #5337 | ✅ | 10/15 | 10/16 |
| UC commit metrics | #5362 | 👀 | 10/20 | xx |
| TODO: UC publish metrics | xx | xx | xx | xx |
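
The table mentions `CatalogCommitterUtils::extractProtocolProperties` (#5243) without describing its output. As an illustration only, the sketch below flattens a protocol into string table properties, assuming the Delta convention of `delta.minReaderVersion`, `delta.minWriterVersion`, and `delta.feature.<name>=supported` keys. `SimpleProtocol` and this method signature are hypothetical and do not reflect Kernel's actual `Protocol` class or the utility's real signature.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class ExtractProtocolPropertiesSketch {
    // Hypothetical, simplified protocol shape; not Kernel's Protocol class.
    record SimpleProtocol(int minReaderVersion, int minWriterVersion,
                          Set<String> readerFeatures, Set<String> writerFeatures) {}

    // Flattens a protocol into string table properties, assuming the
    // delta.minReaderVersion / delta.minWriterVersion / delta.feature.<name> convention.
    static Map<String, String> extractProtocolProperties(SimpleProtocol protocol) {
        Map<String, String> props = new TreeMap<>(); // sorted keys for stable output
        props.put("delta.minReaderVersion", Integer.toString(protocol.minReaderVersion()));
        props.put("delta.minWriterVersion", Integer.toString(protocol.minWriterVersion()));
        // A feature listed in both sets maps to the same key, so it appears only once.
        for (String feature : protocol.readerFeatures()) {
            props.put("delta.feature." + feature, "supported");
        }
        for (String feature : protocol.writerFeatures()) {
            props.put("delta.feature." + feature, "supported");
        }
        return props;
    }

    public static void main(String[] args) {
        SimpleProtocol protocol = new SimpleProtocol(3, 7,
                Set.of("catalogManaged"), Set.of("catalogManaged", "inCommitTimestamp"));
        // {delta.feature.catalogManaged=supported, delta.feature.inCommitTimestamp=supported,
        //  delta.minReaderVersion=3, delta.minWriterVersion=7}
        System.out.println(extractProtocolProperties(protocol));
    }
}
```

The feature names in `main` are example values only; in particular, the official catalogManaged feature name is still tracked as a followup (#4763).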

Followups / TODOs

| Description | PR | Status | Created | Merged |
| --- | --- | --- | --- | --- |
| [Refactor] ParsedLogData refactor (remove isMaterialized, remove enums) | #4805 | 👀 | 06/23 | xx |
| #5164 -- Kernel must respect maxRatifiedVersion returned by catalog | xx | xx | xx | xx |
| #5147 -- Fix ICT/CommitInfo published delta version assumptions | xx | xx | xx | xx |
| [Docs] Better class/method docs for TableManager, ResolvedTableBuilder, ResolvedTable | #4822 | ✅ | 06/24 | 06/25 |
| Implement ResolvedTable::getTimestamp (refactor existing ICT utilities) | xx | xx | xx | xx |
| #4908: Include conflicting, and winning, catalog-ratified commits in CommitFailedException | xx | xx | xx | xx |
| #4816: Public ParsedLogData API | xx | xx | xx | xx |
| #4820: Public Protocol API | xx | xx | xx | xx |
| #4821: Public Metadata API | xx | xx | xx | xx |
| #4817: Update UCCatalogManagedClient::loadTable to take in the TableInfo result from UC | xx | xx | xx | xx |
| #4763 -- Support official catalogManaged table feature name when RFC accepted | xx | xx | xx | xx |
| #4764 -- Update CatalogManagedEnablementSuite to use new APIs | xx | xx | xx | xx |
| #4765 -- Update ResolvedTableBuilder to accept other types of ParsedLogDatas, not only Staged Ratified Commits | xx | xx | xx | xx |
| #4770 -- Implement getVersionBeforeOrAtTimestamp (etc.) APIs | xx | xx | xx | xx |
| #4787 -- Refactor SnapshotManager::getLogSegmentForVersion | xx | xx | xx | xx |
| #5018 -- Fix conflict resolution version rebasing | xx | xx | xx | xx |
| Support Snapshot::checkpoint + update tests to use both paths | xx | xx | xx | xx |
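
#5164 states only that Kernel must respect the maxRatifiedVersion returned by the catalog. One possible reading, sketched here purely as an assumption, is: ignore any discovered version above that bound, and fail if the remaining log is not contiguous or does not reach it. `versionsToLoad` is a hypothetical helper that operates on a sorted list of version numbers, not on Kernel's `LogSegment`.

```java
import java.util.ArrayList;
import java.util.List;

public class RespectMaxRatifiedVersionSketch {
    // Hypothetical helper. Given ascending log versions discovered for a table, keeps only
    // those <= maxRatifiedVersion and checks they are contiguous and end at that version.
    static List<Long> versionsToLoad(List<Long> sortedVersions, long maxRatifiedVersion) {
        List<Long> result = new ArrayList<>();
        for (long version : sortedVersions) {
            if (version > maxRatifiedVersion) {
                break; // not ratified by the catalog: ignore it and everything after it
            }
            if (!result.isEmpty() && version != result.get(result.size() - 1) + 1) {
                throw new IllegalStateException("Gap in log before version " + version);
            }
            result.add(version);
        }
        if (result.isEmpty() || result.get(result.size() - 1) != maxRatifiedVersion) {
            throw new IllegalStateException(
                    "Log does not reach maxRatifiedVersion " + maxRatifiedVersion);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(versionsToLoad(List.of(0L, 1L, 2L, 3L), 2)); // [0, 1, 2]
        try {
            versionsToLoad(List.of(0L, 1L), 2);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // Log does not reach maxRatifiedVersion 2
        }
    }
}
```

Failing loudly when the log falls short of the catalog's bound, rather than silently returning an older snapshot, is a design choice of this sketch; the actual fix for #5164 may behave differently.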

Project status: In Progress