Add Delta Lake 4.0 Blog Post #556
Merged
3 commits
Binary file added
BIN
+1.96 MB
...lta-site/src/content/blog/2025-09-25-Delta-Lake-4.0/delta-lake-4.0-features.png
67 changes: 67 additions & 0 deletions
packages/delta-site/src/content/blog/2025-09-25-Delta-Lake-4.0/index.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,67 @@ | ||
| --- | ||
| title: Delta Lake 4.0 | ||
| description: My summary of Delta Lake 4.0. | ||
| thumbnail: ./thumbnail.png | ||
| author: Carly Akerly, Robert Pack | ||
| --- | ||
|
|
||
| # Delta Lake 4.0: The Future of Open Data Lakehouses | ||
| We’re thrilled to announce the release of Delta Lake 4.0, a milestone release packed with powerful new features, performance optimizations, and foundational enhancements for the future of open data lakehouses. With new catalog integration, enhanced support for semi-structured data, smarter change tracking, and improved performance, this release delivers practical solutions to everyday data challenges. | ||
|
|
||
| Let’s dive into the highlights! | ||
|
|
||
| # Community-Driven Release | ||
| Delta Lake 4.0 is the result of a true community effort: more than 70 contributors from around the world came together to deliver the features in this release. | ||
|
|
||
| A big thank you to everyone who contributed code, ideas, feedback, and support. Your passion and commitment drive the Delta Lake project forward and help make the open data lakehouse vision a reality for organizations everywhere. | ||
|
|
||
|  | ||
|
|
||
| # Spark Integration: Laying the Groundwork for the Future | ||
|
|
||
| ## Catalog-Managed Tables (Preview) | ||
| Delta Lake 4.0 introduces catalog-managed tables as a preview feature—a major shift that enables seamless integration with catalogs. While Delta Lake continues to support its popular “filesystem-managed” tables, catalog-managed tables are the foundation for a whole new set of possibilities. | ||
|
|
||
| Expect more advanced features like enhanced observability, foreign key constraints, and multi-table / multi-statement transactions built on this foundation in future releases! | ||
|
|
||
| ## Delta Connect: Extending Spark Connect | ||
| Delta Lake 4.0 brings Delta Connect: an extension that enables Delta-specific operations over the Spark Connect wire protocol. This means: | ||
| - Delta APIs everywhere: Access Delta Lake features from any Spark Connect client, including Python variants. | ||
| - Future-proofing: Server and client can evolve independently, making upgrades and integrations smoother than ever. | ||
|
|
||
| ## Variant Data Type: Embracing Semi-Structured Data | ||
| The new variant data type is a game-changer for anyone dealing with semi-structured data (think JSON payloads with evolving schemas): | ||
| - Schema-on-read: No need to define a rigid schema up front. Extract structured data at query time. | ||
| - Efficient storage: Store dynamic payloads efficiently, with high-performance encoding. | ||
| - Shredded Variants (Preview): Extract specific columns from variant data and collect statistics for faster queries and file skipping. | ||
|
|
||
| This is a highly anticipated feature, developed with broad community support across many projects. Contributors from Apache Parquet, Apache Arrow, Apache Iceberg, Delta Lake, Apache Spark, and many more are coming together to deliver these features across the data ecosystem. | ||
|
|
||
| ## Drop Feature: Streamlined Table Evolution | ||
| Previously, removing a feature from a Delta table required truncating the entire history—a major pain point. With Delta Lake 4.0, dropping features retains table history, making it easier to evolve your tables and enable compatibility with more clients. | ||
|
|
||
| # Kernel Enhancements: Performance and Advanced Table Features | ||
|
|
||
| ## Version Checksums: Faster, More Reliable Table Reads | ||
| The Java kernel now supports reading and writing version checksums—think of them as mini-checkpoints for your Delta logs. Benefits include: | ||
| - Accelerated log parsing: Quickly access the most current protocol and metadata. | ||
| - Detailed metrics: File counts, table size, and data distribution histograms for stronger consistency and easier debugging. | ||
|
|
||
| ## Log Compaction: Read & Write Support | ||
| Kernel now supports reading and writing log compaction files, further optimizing table scan performance and reducing overhead for large tables. | ||
|
|
||
| ## Advanced Table Feature Support | ||
| This release contains some major improvements to the internal table features framework, which governs how table features are enabled in Delta Kernel. As part of this change, engines using Delta Kernel can now write to tables with the deletionVectors, v2Checkpoint, and timestampNtz features enabled. | ||
|
|
||
| On top of this, two foundational new table features are now supported: | ||
| - Row Tracking allows tracking individual rows across inserts, updates, and deletes, and enables highly efficient implementations for Change Data Feeds. | ||
| - Clustered Tables support enables Kernel to define and update the clustering columns on a table, making clustering information available for Delta clustering implementations. | ||
|
|
||
| ## File Statistics: Enhanced File Skipping | ||
| Improved support for writing file statistics means even more effective file skipping—speeding up queries and reducing resource usage. | ||
|
|
||
| # Looking Ahead | ||
| Delta Lake 4.0 is a leap forward, setting the stage for even more powerful data warehouse features, greater flexibility, and blazing-fast performance. We’re excited to see what you build with it! | ||
|
|
||
| ## Ready to get started? | ||
| Check out the Delta Lake 4.0 documentation and join the conversation in our community forums. |
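
The file skipping mentioned under Shredded Variants and File Statistics can be illustrated with a small plain-Python sketch. It is not Delta Kernel's implementation: `FileStats`, `files_to_scan`, and the sample file names and values are hypothetical, and it assumes each data file records a minimum and maximum value for one integer column and that the query is a simple range predicate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file statistics for a single column (not a Delta API)."""
    path: str
    min_value: int
    max_value: int


def files_to_scan(files, lower, upper):
    """Return the paths of files whose [min, max] range overlaps [lower, upper].

    A file whose range does not overlap the predicate cannot contain
    matching rows, so it can be skipped without being opened.
    """
    return [f.path for f in files if f.max_value >= lower and f.min_value <= upper]


# Hypothetical table made of three data files with disjoint id ranges.
files = [
    FileStats("part-0.parquet", 1, 100),
    FileStats("part-1.parquet", 101, 200),
    FileStats("part-2.parquet", 201, 300),
]

# Equivalent of: WHERE id BETWEEN 150 AND 220
selected = files_to_scan(files, 150, 220)
print(selected)
```

Here only two of the three files need to be read. The more columns that carry accurate statistics, the more files a predicate can rule out before any data is scanned.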
Binary file added
BIN
+93.3 KB
packages/delta-site/src/content/blog/2025-09-25-Delta-Lake-4.0/thumbnail.png