Conversation

Member

@dain dain commented Dec 30, 2025

Description

This PR adds support for Iceberg format v3 deletion vectors in the Trino Iceberg connector.

Read path:

  • Recognizes v3 deletion vectors stored as Puffin deletion-vector-v1 blobs referenced from delete file entries.
  • Applies deletion vectors during reads using the row position column, integrating with the existing delete filtering pipeline.

Write path:

  • Enables row-level operations (DELETE/UPDATE/MERGE) for v3 tables by writing deletion vectors rather than legacy position delete files.
  • Supports v2 → v3 upgrade scenarios by merging existing v2 position delete files into deletion vectors once the table is upgraded.
  • Maintains a single deletion vector per data file and prefers deletion vectors over position delete files when present.

Testing:

  • Adds coverage for writing/reading deletion vectors in v3 tables (including multiple Puffin files mid-stream and convergence).
  • Adds coverage for v2 tables with deletes upgraded to v3, ensuring existing deletes remain effective and new deletes use deletion vectors.
  • Updates v3 “updates blocked” tests (DELETE/UPDATE/MERGE) to validate the operations now succeed.

Release notes

(X) Release notes are required, with the following suggested text:

## Iceberg
* Add support for Iceberg format v3 deletion vectors to enable DELETE/UPDATE/MERGE on v3 tables.

@cla-bot cla-bot bot added the cla-signed label Dec 30, 2025
@github-actions github-actions bot added the iceberg Iceberg connector label Dec 30, 2025
@findepi
Member

findepi commented Dec 30, 2025

Are the build failures in iceberg related?

@dain
Member Author

dain commented Dec 30, 2025

Are the build failures in iceberg related?

Yes. I have a trivial mistake in here. Fixing :D

Member

@ebyhr ebyhr left a comment


The implementation is still broken. DefaultDeletionVectorWriter will throw a duplicate-key error if you run TestIcebergParquetConnectorTest with v3.

@findepi
Member

findepi commented Dec 31, 2025

if you run TestIcebergParquetConnectorTest with v3.

The iceberg module tests are green. Are we missing some test coverage?
@ebyhr what would you recommend testing?

@dain
Member Author

dain commented Dec 31, 2025

I patched the problem. The core issue is that in some cases you get duplicate delete entries for the same file. The fix was trivial, but I'm not sure how you would reliably trigger this in a test at scale. Maybe we should add a copy of the larger smoke test that runs on v3 with the ~35 tests that call optimize disabled. In the long run I expect we may want a hard-coded v2 test as everything moves to v3.

@dain dain force-pushed the deletion-vector branch 2 times, most recently from ee3a9d0 to d6f077e on January 2, 2026 at 22:27
public static final class Builder
{
// key = (int) (pos >>> 32), value bitmap contains (int) pos low bits
private final Int2ObjectMap<RoaringBitmap> deletedRows = new Int2ObjectOpenHashMap<>();
Member


Can we avoid using the Map by using logic similar to org.apache.iceberg.deletes.RoaringPositionBitmap#set?

Member Author


I rewrote this a bunch of times. I don't think it matters much either way at this point, but I'll take a look at going back to an array here also. BTW, I think the Iceberg version is a fork of the RoaringBitmap long-bitmap code.

Member


JFYI there is io.trino.plugin.deltalake.delete.RoaringBitmapArray in delta for a similar use case of storing position deletes in 32-bit roaring bitmaps (better efficiency than 64-bit roaring bitmaps) while still allowing a large positions range that is big enough for practical purposes.
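To make the partitioning scheme under discussion concrete, here is a minimal sketch. The class and method names are hypothetical, and java.util.BitSet stands in for RoaringBitmap, so the sketch assumes the low 32 bits of a position stay below 2^31 (BitSet cannot index a negative int):

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the PR's actual class: partition 64-bit row
// positions by their high 32 bits, keeping one small bitmap per partition.
// Partitions with no deletions simply have no map entry, so nothing is
// allocated for empty ranges.
class PartitionedPositions
{
    private final Map<Integer, BitSet> deletedRows = new HashMap<>();

    public void set(long position)
    {
        int high = (int) (position >>> 32); // partition key
        int low = (int) position;           // offset within the partition
        deletedRows.computeIfAbsent(high, key -> new BitSet()).set(low);
    }

    public boolean contains(long position)
    {
        BitSet bitmap = deletedRows.get((int) (position >>> 32));
        return bitmap != null && bitmap.get((int) position);
    }
}
```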

{
for (int key = 0; key < deletedRows.length; key++) {
RoaringBitmap bitmap = deletedRows[key];
if (bitmap != null && !bitmap.isEmpty()) {
Member


I'm a bit confused to see this check in multiple places; how would we end up with a null or empty bitmap?
Could we just eliminate those once in the constructor?

Member Author


The array position is null for any section that does not have a deletion. This avoids having to fill the array with empty bitmaps.

@dain dain force-pushed the deletion-vector branch 6 times, most recently from d562376 to fa2c669 on January 11, 2026 at 02:13
@dain dain changed the title from "Deletion vector" to "[Iceberg v3] Deletion vector" on Jan 11, 2026
}

@Test
void testV2ToV3MigrationWithDeletes()
Contributor


Could you also please add a case for a table with existing equality deletes?

Member Author

@dain dain Jan 13, 2026


I don't think we need this. Equality deletes and position deletes have always been separate, independent systems; I don't think we need a complex test to show that.

@dain
Member Author

dain commented Jan 13, 2026

I responded to or applied all comments.

@dain dain requested a review from raunaqmorarka January 13, 2026 19:14
})
.toList();

deletionVectorWriter.writeDeletionVectors(session, icebergTable, table, deletionVectorInfos, rowDelta);
Member


@dain could you clarify why we're writing the deletion vectors from the coordinator?
I think this can incur reads of previous deletes and require significant resources.
Can the "single deletion vector per data file" requirement not be met if we write them from the worker nodes?
cc: @chenjian2664 @ebyhr

Member Author


I don't think it can be met. I tested this idea before by running the test suite with a requirement that the coordinator only sees one DV per data file, and I ended up with multiple DVs. That said, I think this is the right approach. The DVs are quite small, even for large deletes, and they must be combined into a single DV per data file, along with any preexisting DVs (or position delete files).

This PR works by transporting the DVs from the workers to the coordinator via the fragments. It is possible that we may want to transport these via storage, but the latency cost would be quite high. I considered this and decided we should wait for production feedback.

Another thing we should consider is the cost during the switch-over from v2 position deletes to v3 DVs. This will cause some extra load on coordinators during the transition. IMO the best mitigation here is to add an optimize-deletes table procedure, and I think we should document that as the preferred approach.

Contributor


Could you share in what situations you have encountered multiple deletion vectors for the same data file?

If all delete files are available to the worker while writing the deletion vector, we could merge them into a single DV. Does that approach make sense to you?

Member Author


Sure. Imagine you are deleting from a table that has only one file, and the delete is expressed as "delete any row that exists in another table". The other table is larger, so you get a distributed join, and the rows from that one file will be distributed to every machine.
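The coordinator-side merge described in this thread can be sketched as follows. This is a hypothetical illustration, not the PR's DefaultDeletionVectorWriter: several workers may each report deleted positions for the same data file, and the coordinator folds them into one vector per file. Plain Long sets stand in for roaring bitmaps, and the record and method names are invented.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical coordinator-side merge: union the per-worker delete fragments
// so that each data file ends up with exactly one set of deleted positions.
class DeletionVectorMerger
{
    record FileDeletes(String dataFilePath, Set<Long> positions) {}

    static Map<String, Set<Long>> mergePerDataFile(List<FileDeletes> fragments)
    {
        Map<String, Set<Long>> merged = new HashMap<>();
        for (FileDeletes fragment : fragments) {
            // union the positions reported by every worker for this file
            merged.computeIfAbsent(fragment.dataFilePath(), path -> new TreeSet<>())
                    .addAll(fragment.positions());
        }
        return merged;
    }
}
```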


public Optional<DeletionVector> build()
{
if (Arrays.stream(deletedRows).allMatch(bitmap -> bitmap == null || bitmap.isEmpty())) {
Contributor

@chenjian2664 chenjian2664 Jan 15, 2026


How about adding a bitmapCount, so we can check it directly? We could maintain it in getOrCreateBitmap and restore it in deserialize; that would also simplify the serialize method.

Member Author


I skipped this one. I think the current design is good enough.


private static IntConsumer intToLongAdapter(int keyHigh, LongConsumer consumer)
{
return keyLow -> consumer.accept(((long) keyHigh << 32) | (keyLow & 0xFFFFFFFFL));
Contributor


nit: ((long) keyHigh << 32) -> (((long) keyHigh) << 32) to make the intent obvious
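A tiny round-trip sketch of the split/recombine arithmetic, using the explicit parenthesization from the nit above. The helper names are illustrative, not from the PR:

```java
// Illustrative helpers: a 64-bit row position is split into a 32-bit high key
// plus its low 32 bits, then recombined as in intToLongAdapter.
class PositionCodec
{
    static int high(long position)
    {
        return (int) (position >>> 32);
    }

    static int low(long position)
    {
        return (int) position;
    }

    static long recombine(int keyHigh, int keyLow)
    {
        // masking the low part prevents sign extension of negative ints
        return (((long) keyHigh) << 32) | (keyLow & 0xFFFFFFFFL);
    }
}
```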


public Builder addAll(Builder other)
{
for (int i = 0; i < other.deletedRows.length; i++) {
Contributor


Since the keys are ordered, we could iterate in reverse so we don't need to expand the array in getOrCreateBitmap multiple times.
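A hypothetical sketch of that idea: because the incoming keys are sorted, the destination array can be grown once to fit the largest key before filling, instead of resizing inside a per-key getOrCreateBitmap. A long[] of bit masks stands in for the per-key bitmaps, and none of these names come from the PR:

```java
import java.util.Arrays;

// Sketch: size the array once from the largest (last) key, then merge entries.
class GrowOnce
{
    private long[] slots = new long[0];

    void addAll(int[] orderedKeys, long[] masks)
    {
        if (orderedKeys.length == 0) {
            return;
        }
        int maxKey = orderedKeys[orderedKeys.length - 1]; // largest key is last
        if (maxKey >= slots.length) {
            slots = Arrays.copyOf(slots, maxKey + 1); // single resize
        }
        for (int i = 0; i < orderedKeys.length; i++) {
            slots[orderedKeys[i]] |= masks[i]; // merge into any existing entry
        }
    }

    long get(int key)
    {
        return key < slots.length ? slots[key] : 0;
    }
}
```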


public Builder addAll(DeletionVector deletionVector)
{
for (int i = 0; i < deletionVector.deletedRows.length; i++) {
Contributor


nit: same as above, we could iterate in reverse

import io.airlift.json.JsonCodec;
import io.airlift.log.Logger;
import io.airlift.slice.Slice;
import io.airlift.slice.Slices;
Member


When do we update OPTIMIZE_MAX_SUPPORTED_TABLE_VERSION and CLEANING_UP_PROCEDURES_MAX_SUPPORTED_TABLE_VERSION to 3? After merging row lineage PR?

Member Author


Yes, it is in the row lineage PR.

@dain dain force-pushed the deletion-vector branch 2 times, most recently from b37b736 to 68da3c0 on January 16, 2026 at 05:52
dain added 3 commits January 15, 2026 22:29
Refactors the existing v2 position delete code to use DeletionVector
instead of directly using RoaringBitmap. This simplifies the code and
prepares for v3 deletion vector support.
@dain dain merged commit 0eb94ac into trinodb:master Jan 16, 2026
54 of 56 checks passed
@dain dain deleted the deletion-vector branch January 16, 2026 07:34
@github-actions github-actions bot added this to the 480 milestone Jan 16, 2026
toPartitionData(partitionSpec, schema, task.partitionDataJson()));
})
.toList();

Contributor


@dain

The outcome of this PR is that we have a single Puffin file per snapshot that contains blobs for all deletes in the snapshot.

Would it make sense to write temporarily smaller delete files on the workers and consolidate them into one (if needed) on the coordinator side (deleting the small DV files) in IcebergMetadata.finishWrite?

This approach would address the concern of having small delete files scattered in the metadata and would still keep the functionality that this PR provides, at the expense of potentially doing additional temporary writes to storage. The concern of putting memory pressure on the coordinator would be addressed as well.
