Iceberg core code#1836

Open
unidevel wants to merge 1 commit into ibm-iceberg-base from ibm-iceberg-new

@unidevel
Collaborator

@unidevel unidevel commented Mar 23, 2026

Fix the rebase issue in PR 425

After resolving the conflicts, run the following commands to test:

#!/bin/sh
export EXTRA_CMAKE_FLAGS=" \
  -DVELOX_BUILD_TESTING=ON \
  -DVELOX_ENABLE_GEO=ON \
  -DVELOX_ENABLE_PARQUET=ON \
  -DVELOX_MONO_LIBRARY=OFF \
  -DVELOX_ENABLE_SPARK_FUNCTIONS=ON \
  -DCMAKE_EXPORT_COMPILE_COMMANDS=1"
make clean
cmake  -B "_build/release"
cmake --build _build/release --target velox_hive_iceberg_insert_test velox_hive_iceberg_test velox_hive_connector_test -j16
ln -sf velox/connectors/hive/iceberg/tests/examples .
./_build/release/velox/connectors/hive/iceberg/tests/velox_hive_iceberg_insert_test
./_build/release/velox/connectors/hive/iceberg/tests/velox_hive_iceberg_test
./_build/release/velox/connectors/hive/tests/velox_hive_connector_test

@unidevel unidevel requested a review from majetideepak as a code owner March 23, 2026 22:12
@unidevel
Collaborator Author

alchemy rebase @2026-03-23T21:37:29Z

@prestodb-ci
Collaborator

Invalid alchemy verb: rebase

@unidevel
Collaborator Author

alchemy merge @2026-03-23T21:37:29Z

@prestodb-ci
Collaborator

alchemy link 4f10953 @2026-03-23T21:37:29Z

@prestodb-ci
Collaborator

Added new rebase item:

@prestodb-ci
Collaborator

Failed to cherry-pick commit 4f10953ed72ae8faceb01374b77531d2612e493b in rebase request #1841:

exit status 1
error: could not apply 4f10953ed... Iceberg core code

Auto-merging velox/connectors/hive/iceberg/CMakeLists.txt
Auto-merging velox/connectors/hive/iceberg/IcebergParquetStatsCollector.h
CONFLICT (content): Merge conflict in velox/connectors/hive/iceberg/IcebergParquetStatsCollector.h

Please:

  1. Rebase your branch with staging/staging-e154cab1a-rebase and fix the conflict. If the rebase item is a PR, you can change the base branch to this staging branch.
  2. Comment on this issue:
alchemy link [updated comma-separated commit SHAs for this issue] @2026-03-24T15:53:57Z
  3. Re-open Rebase branch staging-e154cab1a-rebase with staging-e154cab1a-head (e154cab) #1841 to retry the cherry-pick.

@unidevel
Collaborator Author

alchemy link 83bb26744 @2026-03-24T15:53:57Z

@prestodb-ci
Collaborator

The following unexpired item was removed at 2026-03-24T15:53:57Z by @unidevel via #1836 (comment):

Added new rebase item:

@prestodb-ci
Collaborator

Failed to cherry-pick commit 83bb26744 in rebase request #1850:

exit status 1
error: could not apply 83bb26744... Iceberg core code

Auto-merging velox/docs/configs.rst
Auto-merging velox/dwio/common/Options.h
Auto-merging velox/dwio/parquet/tests/writer/ParquetWriterTest.cpp
Auto-merging velox/dwio/parquet/writer/Writer.cpp
CONFLICT (content): Merge conflict in velox/dwio/parquet/writer/Writer.cpp
Auto-merging velox/dwio/parquet/writer/Writer.h

Please:

  1. Rebase your branch with staging/staging-ad018364c-rebase and fix the conflict. If the rebase item is a PR, you can change the base branch to this staging branch.
  2. Comment on this issue:
alchemy link [updated comma-separated commit SHAs for this issue] @2026-03-26T07:55:58Z
  3. Re-open Rebase branch staging-ad018364c-rebase with staging-ad018364c-head (ad01836) #1850 to retry the cherry-pick.

This was referenced Apr 11, 2026
Co-authored-by: Li Zhou <unidevel@hotmail.com>

Alchemy-item: (ID = 1153) Iceberg staging hub commit 1/6 - c5a69de3d1021073c13a99e1c7c6d6fcce355178

refactor: Move toValues from InPredicate.cpp to Filter.h

The function toValues removes duplicate values from the vector and
returns them in a std::vector. It was used to build an InPredicate. It
will also be needed for building NOT IN filters for Iceberg equality
delete reads, so this commit moves it from
velox/functions/prestosql/InPredicate.cpp to velox/type/Filter.h and
renames it to deDuplicateValues to make it easier to understand.
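
As a rough illustration of the de-duplication described above, here is a minimal standalone sketch. It is not the Velox implementation: the name `dedupeValues`, the plain `std::vector<int64_t>` input, and the choice to preserve first-occurrence order are all assumptions made for this example (the real function operates on Velox vectors).

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for deDuplicateValues (not the Velox signature).
// Keeps the first occurrence of each value and preserves input order.
std::vector<int64_t> dedupeValues(const std::vector<int64_t>& input) {
  std::unordered_set<int64_t> seen;
  std::vector<int64_t> result;
  result.reserve(input.size());
  for (int64_t value : input) {
    // insert() returns {iterator, inserted}; inserted is false for repeats.
    if (seen.insert(value).second) {
      result.push_back(value);
    }
  }
  return result;
}
```

The resulting list is what a caller would hand to an IN or NOT IN filter, where repeated values add cost without changing the result.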

feat(connector): Support reading Iceberg split with equality deletes

This commit introduces EqualityDeleteFileReader, which is used to read
Iceberg splits with equality delete files. The equality delete files
are read to construct domain filters or filter functions, which are
then evaluated in the base file readers.

When there is only one equality delete field, and that field is an
Iceberg identifier field, i.e. a non-floating-point primitive type,
the values are converted to a list used as a NOT IN domain filter,
with NULL treated separately. This domain filter is then pushed down
to the ColumnReaders to filter out unwanted rows before they are read
into Velox vectors. When the equality delete column is a nested
column, e.g. a sub-column in a struct or the key in a map, the column
may not be in the base file ScanSpec. In that case these subfields
need to be added to/removed from the SchemaWithId and ScanSpec
recursively if they were not in the ScanSpec already. A test is also
added for this case.
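
The single-column case can be pictured with a small standalone sketch. Everything here is hypothetical: `NotInFilter` and `buildNotInFilter` are illustrative names, not Velox classes, and the sketch assumes that a NULL in the delete file removes rows whose value is NULL, which is one reading of "NULL treated separately".

```cpp
#include <cstdint>
#include <optional>
#include <unordered_set>
#include <vector>

// Hypothetical NOT IN filter built from one equality delete column.
struct NotInFilter {
  std::unordered_set<int64_t> deletedValues;
  bool nullDeleted = false;  // assumption: set when the delete file has a NULL

  // Returns true when a base-file row with this value should be kept.
  bool keep(const std::optional<int64_t>& value) const {
    if (!value.has_value()) {
      return !nullDeleted;  // NULL is tracked outside the value set
    }
    return deletedValues.count(*value) == 0;
  }
};

// Builds the filter from the values read from the equality delete file.
NotInFilter buildNotInFilter(
    const std::vector<std::optional<int64_t>>& deleteColumn) {
  NotInFilter filter;
  for (const auto& value : deleteColumn) {
    if (value.has_value()) {
      filter.deletedValues.insert(*value);
    } else {
      filter.nullDeleted = true;
    }
  }
  return filter;
}
```

With delete values {1, NULL, 5}, `keep(2)` is true while `keep(1)` and `keep(std::nullopt)` are false. In the commit the equivalent check runs inside the column readers, so deleted rows are never materialized.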

If there is more than one equality delete field, or the field is not
an Iceberg identifier field, the values are converted to a typed
expression in the form of a conjunction of disjunctions. This
expression is evaluated as the remaining filter function after the
rows are read into the Velox vectors. Note that this only works for
Presto now, as the "neq" function is not registered by Spark. See
https://github.com/facebookincubator/issues/12667
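
To make the conjunction-of-disjunctions form concrete, here is a minimal sketch of the predicate's logic on plain integers. It is not how the commit builds the typed expression; `Row` and `survivesEqualityDeletes` are hypothetical names, and the sketch assumes each row has already been projected onto the equality delete columns, in the same order as the delete rows.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical row type: one int64_t per equality delete column.
using Row = std::vector<int64_t>;

// Returns true when `row` survives every delete row, i.e.
//   AND over delete rows d of (OR over columns c of row[c] != d[c]).
bool survivesEqualityDeletes(
    const Row& row,
    const std::vector<Row>& deleteRows) {
  for (const Row& deleteRow : deleteRows) {
    bool anyColumnDiffers = false;  // the inner disjunction of "neq" terms
    for (std::size_t c = 0; c < deleteRow.size(); ++c) {
      if (row[c] != deleteRow[c]) {
        anyColumnDiffers = true;
        break;
      }
    }
    if (!anyColumnDiffers) {
      return false;  // row equals this delete row on all columns
    }
  }
  return true;
}
```

For delete rows (1, 10) and (2, 20), the row (1, 10) is dropped while (1, 20) is kept, because it differs from each delete row in at least one column.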

Note that this commit only supports integral types. VARCHAR and
VARBINARY need to be supported in future commits (see
facebookincubator#12664).

Co-authored-by: Naveen Kumar Mahadevuni <Naveen.Mahadevuni@ibm.com>

Alchemy-item: (ID = 1153) Iceberg staging hub commit 2/6 - 14edb98c67f1c572a5f40682923795bd5b08e7c3

Support inserting data into Iceberg tables.

Add Iceberg partition transforms.

Co-authored-by: Chengcheng Jin <Chengcheng.Jin@ibm.com>

Add NaN statistics to parquet writer.

Collect Iceberg data file statistics in dwio.

Integrate Iceberg data file statistics and add unit test.

Support write field_id to parquet metadata SchemaElement.

Implement iceberg sort order

Add clustered Iceberg writer mode.

Fix parquet writer ut

Add IcebergConnector

Fix unittest error

Resolve conflict

Resolve conflict

Fix test build issue

Fix crash

test(Iceberg): Add equality delete tests
Co-authored-by: Naveen Kumar Mahadevuni <Naveen.Mahadevuni@ibm.com>

Fix stats collection for integer based decimal numbers
Co-authored-by: mohsaka <michael.ohsaka@ibm.com>

Fix configureReaderOptions in EqualityDeleteFileReader
Co-authored-by: Christian Zentgraf <kitgocz@gmail.com>
@mohsaka
Collaborator

mohsaka commented Apr 16, 2026

alchemy merge @2026-04-16T02:34:00Z

@prestodb-ci
Collaborator

alchemy link 0e0bf94 @2026-04-16T02:34:00Z

@prestodb-ci
Collaborator

The following unexpired item was removed at 2026-04-16T02:34:00Z by @prestodb-ci via #1836 (comment):

Added new rebase item:

@mohsaka
Collaborator

mohsaka commented Apr 16, 2026

alchemy merge @2026-04-16T02:38:00Z

@prestodb-ci
Collaborator

alchemy link 07fdcf4 @2026-04-16T02:38:00Z

@prestodb-ci
Collaborator

The following unexpired item was removed at 2026-04-16T02:38:00Z by @prestodb-ci via #1836 (comment):

Added new rebase item:

4 participants