
Can we avoid JVM round-trips in some cases for native_iceberg_compat scans? #3431

@andygrove


What is the problem the feature request solves?

native_iceberg_compat reads Parquet in Rust via NativeBatchReader, imports the data into the JVM via Arrow FFI, and then re-exports it back to native code for execution by the next operator in the plan. Although Arrow FFI is zero-copy for the data buffers, there is still overhead from serializing the schema on each transfer.

I am wondering whether this round-trip can be avoided in some cases.

The following profile was captured with async-profiler.

[Image: async-profiler output]

Describe the potential solution

No response

Additional context

No response
