1 change: 1 addition & 0 deletions packages/delta-site/package.json
@@ -17,6 +17,7 @@
"license": "ISC",
"dependencies": {
"@astrojs/netlify": "^6.5.7",
"@astrojs/rss": "^4.0.12",
"@astrojs/tailwind": "^6.0.2",
"@fontsource-variable/source-code-pro": "^5.2.6",
"@fontsource/source-sans-pro": "^5.2.5",
@@ -9,7 +9,6 @@ publishedAt: 2021-12-03
We are excited to announce the release of [Delta Lake 1.1.0](https://github.com/delta-io/delta/releases/tag/v1.1.0) on [Apache Spark 3.2](https://spark.apache.org/releases/spark-release-3-2-0.html). Similar to Apache Spark™, we have released Maven artifacts for both Scala 2.12 and Scala 2.13. The key features in this release are as follows.

- **Performance improvements in MERGE operation**

- On partitioned tables, MERGE operations will automatically [repartition the output data before writing to files](https://docs.delta.io/latest/delta-update.html#performance-tuning). This ensures better out-of-the-box performance for both the MERGE operation and subsequent read operations.
- On very wide tables (e.g., ~1000 or more columns), the MERGE operation can be faster since it now [avoids quadratic complexity when resolving column names](https://github.com/delta-io/delta/commit/83780aeeadd67893ad69ed6481f7c6bce5be563c).

@@ -11,7 +11,6 @@ We are excited to announce the release of Delta Connectors 0.3.0, which introduc
#### Delta Standalone

- **Write functionality** - This release introduces new APIs to support creating and writing Delta tables without Apache Spark™. External processing engines can write parquet data files themselves and then use the APIs to add the files to the Delta table atomically. Following the [Delta Transaction Log Protocol](https://github.com/delta-io/delta/blob/master/PROTOCOL.md), the implementation uses optimistic concurrency control to manage multiple writers, automatically generates checkpoint files, and manages log and checkpoint cleanup according to the protocol. The main Java class exposed is `OptimisticTransaction`, which is accessed via `DeltaLog.startTransaction()`.

- `OptimisticTransaction.markFilesAsRead(readPredicates)` must be used to read all metadata during the transaction (and not the `DeltaLog`). It is used to detect concurrent updates and determine if logical conflicts between this transaction and previously-committed transactions can be resolved.
- `OptimisticTransaction.commit(actions, operation, engineInfo)` is used to commit changes to the table. If a conflicting transaction has been committed first (see above), an exception is thrown; otherwise, the committed table version is returned.
- Idempotent writes can be implemented using `OptimisticTransaction.txnVersion(appId)` to check for version increases committed by the same application.
@@ -23,7 +22,6 @@ We are excited to announce the release of Delta Connectors 0.3.0, which introduc
- **Partition filtering for metadata reads and conflict detection in writes** - This release introduces a simple expression framework for partition pruning in metadata queries. When reading files in a snapshot, filter the returned `AddFiles` on partition columns by passing a `predicate` into `Snapshot.scan(predicate)`. When updating a table during a transaction, specify which partitions were read by passing a `readPredicate` into `OptimisticTransaction.markFilesAsRead(readPredicate)` to detect logical conflicts and avoid transaction conflicts when possible.

- **Miscellaneous updates:**

- `ParquetSchemaConverter` converts a `StructType` schema to a Parquet schema.
- `Iterator<VersionLog> DeltaLog.getChanges()` exposes an incremental metadata changes API. `VersionLog` wraps the version number and the list of actions in that version.
- Fix [#197](https://github.com/delta-io/connectors/pull/197) for `RowRecord` so that values in partition columns can be read.
1 change: 0 additions & 1 deletion packages/delta-site/src/content/blog/delta-kernel/index.md
@@ -80,7 +80,6 @@ Let's explore why designing Delta Kernel posed a unique set of challenges.
- **Challenge 2: Support for distributed engines with the right level of abstraction -** If an engine wants to read data in a single thread or process, then Delta Kernel simply needs to return table data when given a Path. However, distributed engines like Apache Spark or Trino, or DuckDB's multi-threaded parquet reader, operate differently: they require the table's metadata (schema, list of files, etc.) in order to plan out the tasks of the distributed jobs. These tasks, running on different machines or threads, read different parts of the table data based on the parts of the metadata they were given.

To support distributed engines, Delta Kernel must provide the right level of abstraction. It needs to:

1. Expose just enough "abstract metadata" that an engine can use for planning without having to understand protocol details.
2. Allow reading of table data with the necessary abstract metadata.

2 changes: 0 additions & 2 deletions packages/delta-site/src/content/blog/delta-lake-3-3/index.md
@@ -88,13 +88,11 @@ for type widening.
These are just a few highlights. Additional features include:

- Liquid Clustering convenience and performance features:

- [OPTIMIZE FULL](https://github.com/delta-io/delta/pull/3793): Recluster all records in a table for peak performance.
- [Unpartitioned Tables](https://github.com/delta-io/delta/pull/3174): Enable clustering on an existing, unpartitioned table.
- [External Locations](https://github.com/delta-io/delta/pull/3251): Create clustered tables from external storage.

- UniForm performance optimizations and functionality:

- Support for [timestamp-type partition columns](https://github.com/delta-io/delta/commit/7a0db43df1ef8236e4db8a57837734b83ed15153) for UniForm Iceberg
- Automatic manifest cleanups via [expireSnapshot](https://github.com/delta-io/delta/commit/7bb979205d7eb4cd8aaa04da8fd960f3862b53b7) whenever OPTIMIZE is run on the Delta table
- [List and map](https://github.com/delta-io/delta/commit/dd39415912f6009fb9e5d2f4057288bb1e9fd117) data types for UniForm Hudi
@@ -63,12 +63,10 @@ If you're not yet familiar with StarRocks, here's what you need to know. Designe
architecture optimized for running customer-facing workloads on open data lakehouses, StarRocks' core components include:

- **Frontend (FE)**:

- Responsible for query parsing, plan generation, and metadata management.
It acts as the brain, orchestrating query execution across multiple nodes.

- **Compute Node (CN):**

- Handles data caching, retrieval, and execution of distributed query plans.
It is the muscle that delivers fast and scalable data processing.

@@ -137,11 +135,9 @@ minimize metadata-related overhead in the FE, ensuring responsiveness for high-c
Today, StarRocks’ Delta Lake connector supports:

1. **Wide Data Type Coverage**:

1. Handles core data types like INT, STRING, and FLOAT, with ongoing development for complex types like MAP and STRUCT.

2. **Data Skipping**:

1. Efficiently skips irrelevant data based on Parquet file statistics and Delta transaction logs, drastically reducing scan times.

3. **Advanced Table Features**:
@@ -96,13 +96,11 @@ While not a user-facing feature in itself, it is a revolutionary internal mechan
Delta on Spark already supports creating, writing, and reading tables with the above features, and support for engines is coming via the Delta Kernel project (to be discussed further in part 2 of this blog post). In addition, there have been major improvements in the user interfaces and operations in the last 12 months.

- **Most advanced MERGE API ever** - Since the inception of [MERGE support in Delta 0.3](https://github.com/delta-io/delta/releases/tag/v0.3.0) in 2019, we have always pushed the boundary of MERGE capabilities. We were the first open-source format to introduce support for the following:

- Automatic schema evolution with `INSERT *` and `UPDATE SET *` ([Delta 0.6](https://github.com/delta-io/delta/releases/tag/v0.6.0))
- Unlimited conditional `WHEN MATCHED` and `WHEN NOT MATCHED` clauses ([Delta 1.0](https://github.com/delta-io/delta/releases/tag/v1.0.0))
- Complex types support ([Delta 0.8](https://github.com/delta-io/delta/releases/tag/v0.8.0) and [Delta 1.1](https://github.com/delta-io/delta/releases/tag/v1.1.0))

- Over the last year, we have been improving the APIs and performance:

- [WHEN NOT MATCHED BY SOURCE clause](https://docs.delta.io/2.3.0/delta-update.html#modify-all-unmatched-rows-using-merge) (Scala in [Delta 2.3](https://github.com/delta-io/delta/releases/tag/v2.3.0) and SQL in [Delta 2.4](https://github.com/delta-io/delta/releases/tag/v2.4.0)), which allows you to perform more complicated data updates in a single MERGE operation (instead of multiple `UPDATE/DELETE/MERGE` operations).
- Arbitrary column support in automatic schema evolution ([Delta 2.3](https://github.com/delta-io/delta/releases/tag/v2.3.0))
- [Idempotency in MERGE](https://docs.delta.io/2.3.0/delta-batch.html#idempotent-writes) for ensuring failures in periodic jobs are handled gracefully ([Delta 2.3](https://github.com/delta-io/delta/releases/tag/v2.3.0))
@@ -53,7 +53,6 @@ While we are racing to add all the existing protocol features support to this gr
- **[Trino](https://trino.io/docs/current/connector/delta-lake.html)** - The Delta Trino connector now supports Deletion Vectors, Column Mapping, and other key features from the main Delta Lake spec. It also saw performance improvements across the board.
- **[Apache Druid](https://druid.apache.org/docs/latest/development/extensions-contrib/delta-lake/)** - [Apache Druid 29 has added support for Delta Lake](https://druid.apache.org/docs/latest/release-info/release-notes/#druid-2900) using [Delta Kernel](https://github.com/apache/druid/blob/master/docs/development/extensions-contrib/delta-lake.md#version-support).
- **[Delta Rust](https://github.com/delta-io/delta-rs)** **(delta-rs crate / deltalake PyPI)** - This immensely popular project ([2M+ PyPI downloads/month as of April 3, 2024](https://pypistats.org/packages/deltalake)) has added many API improvements:

- Support for popular operations - `DELETE, UPDATE, MERGE, OPTIMIZE ZORDER, CONVERT TO DELTA`
- Support for table constraints - writes will ensure data constraints defined in the table will not be violated
- Support for schema evolution
12 changes: 12 additions & 0 deletions packages/delta-site/src/pages/blog/[...page].astro
@@ -4,6 +4,7 @@ import Section from "delta-theme/components/Section.astro";
import type { GetStaticPaths, Page } from "astro";
import type { CollectionEntry } from "astro:content";
import Pagination from "delta-theme/components/Pagination.astro";
import Icon from "delta-theme/components/Icon.astro";
import PostsList from "../../components/PostsList.astro";
import Layout from "../../layouts/Layout.astro";

@@ -39,6 +40,17 @@ latestPosts.sort(sortBlogs);
padding="xxl"
className="bg-white"
>
<div class="flex justify-between items-center mb-8">
<div></div>
<a
href="/rss.xml"
class="inline-flex items-center gap-2 text-sm text-link transition-colors"
title="Subscribe to RSS feed"
>
<Icon icon="rss" alt="RSS Feed" className="w-4 h-4" />
RSS Feed
</a>
</div>
{
page.currentPage === 1 && (
<PostsList posts={latestPosts.slice(0, 1)} hasFeaturedItem />
31 changes: 31 additions & 0 deletions packages/delta-site/src/pages/rss.xml.js
@@ -0,0 +1,31 @@
import rss from "@astrojs/rss";
import { getCollection } from "astro:content";

export async function GET(context) {
const posts = await getCollection("blog");

// Sort posts by publishedAt date (newest first)
posts.sort(
(a, b) => b.data.publishedAt.getTime() - a.data.publishedAt.getTime(),
);

// Filter out posts that don't have required fields
const validPosts = posts.filter((post) => {
return post.data.title && post.data.description && post.data.publishedAt;
});

return rss({
title: "Delta Lake Blog",
description:
"Latest news, updates, and insights from the Delta Lake project",
site: context.site,
items: validPosts.map((post) => ({
title: post.data.title,
description: post.data.description,
pubDate: post.data.publishedAt,
link: `/blog/${post.id}/`,
...(post.data.updatedAt && { updatedDate: post.data.updatedAt }),
})),
customData: `<language>en-us</language>`,
});
}
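
For `context.site` to be defined in the handler above, the project's Astro config must set a `site` value; that is where `@astrojs/rss` gets the base URL used to build absolute item links. A minimal sketch (the production URL and the omitted integrations are assumptions, not the repo's actual config):

```js
// astro.config.mjs (sketch): only the field the RSS endpoint depends on is shown.
// The real config presumably also registers integrations such as Tailwind and the
// Netlify adapter listed in package.json.
import { defineConfig } from "astro/config";

export default defineConfig({
  // `context.site` in src/pages/rss.xml.js resolves to this URL; without it,
  // the feed cannot emit absolute links for its items.
  site: "https://delta.io",
});
```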
5 changes: 5 additions & 0 deletions packages/delta-theme/src/components/Icon.astro
@@ -51,6 +51,11 @@ const iconPaths = {
chevronLeft: [
"M4.94063 10.9406C4.35469 11.5266 4.35469 12.4781 4.94063 13.0641L13.9406 22.0641C14.5266 22.65 15.4781 22.65 16.0641 22.0641C16.65 21.4781 16.65 20.5266 16.0641 19.9406L8.12344 12L16.0594 4.05937C16.6453 3.47344 16.6453 2.52187 16.0594 1.93594C15.4734 1.35 14.5219 1.35 13.9359 1.93594L4.93594 10.9359L4.94063 10.9406Z",
],
rss: [
"M5 3a1 1 0 000 2c5.523 0 10 4.477 10 10a1 1 0 102 0C17 8.373 11.627 3 5 3z",
"M4 9a1 1 0 011-1 7 7 0 017 7 1 1 0 11-2 0 5 5 0 00-5-5 1 1 0 01-1-1z",
"M3 15a2 2 0 114 0 2 2 0 01-4 0z",
],
};

interface Props {
6 changes: 6 additions & 0 deletions packages/delta-theme/src/components/PageLayout.astro
@@ -55,6 +55,12 @@ const shouldExcludeFromSearch = excludedPaths.some(
<link rel="icon" type="image/svg+xml" href="/favicon.svg" />
<link rel="canonical" href={new URL(Astro.url.pathname, Astro.site)} />
<link rel="sitemap" href="/sitemap-index.xml" />
<link
rel="alternate"
type="application/rss+xml"
title="Delta Lake Blog"
href="/rss.xml"
/>
<meta name="generator" content={Astro.generator} />
<title>{title ? `${title} | ${config.title}` : config.title}</title>
{description && <meta name="description" content={description} />}
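
With the feed endpoint and this autodiscovery link in place, a quick local sanity check can be run from Node; a sketch, assuming Astro's default dev port of 4321 and Node 18+ (for the global `fetch`):

```js
// check-rss.mjs: start the dev server in packages/delta-site, then run `node check-rss.mjs`.
const res = await fetch("http://localhost:4321/rss.xml");
console.log(res.status, res.headers.get("content-type")); // expect 200 and an XML content type

const xml = await res.text();
// Spot-check a few things the endpoint is expected to emit.
console.log("channel title present:", xml.includes("Delta Lake Blog"));
console.log("language tag present:", xml.includes("<language>en-us</language>"));
console.log("item count:", (xml.match(/<item>/g) || []).length);
```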
24 changes: 24 additions & 0 deletions pnpm-lock.yaml

Some generated files are not rendered by default.