Provide different read interfaces for reader #1047

Open
@ZENOTME

Description

Is your feature request related to a problem or challenge?

For now, our arrow reader accepts a FileScanTask and returns a RecordBatchStream to the user. After #630, the reader can process delete files and merge them with the data file, which makes it ready to use out of the box. However, some compute engines want to process delete files themselves so that they can reuse their existing join executor and storage to spill the data. This requires reading the delete files directly rather than processing them internally.

Based on this, I suggest providing different read interfaces to satisfy these different requirements:

  • read: process the data file and delete files of the FileScanTask internally and return the merged result
  • read_data: read only the data file of the FileScanTask, without applying deletes
  • read_pos_delete: read the position delete files of the FileScanTask and return the result directly
  • read_eq_delete: read the equality delete files of the FileScanTask and return the result directly
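
To make the split concrete, here is a minimal sketch of how the four entry points could relate to each other. Everything in it is an assumption for illustration: `Row`, `ScanTask`, `TaskReader` and `InMemoryReader` are hypothetical stand-ins, not the existing iceberg-rust types, and the real reader would be async and return a RecordBatchStream rather than a `Vec`. It also ignores sequence-number rules for equality deletes.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for one row of a record batch.
#[derive(Debug, Clone, PartialEq)]
pub struct Row {
    pub pos: u64, // row position within the data file
    pub id: i64,  // column used by equality deletes in this sketch
}

// Hypothetical stand-in for FileScanTask, holding already-decoded contents.
pub struct ScanTask {
    pub data: Vec<Row>,
    pub pos_deletes: Vec<u64>, // deleted row positions
    pub eq_deletes: Vec<i64>,  // deleted ids
}

// Hypothetical trait mirroring the four proposed interfaces.
pub trait TaskReader {
    /// Read only the data file, without applying deletes.
    fn read_data(&self, task: &ScanTask) -> Vec<Row>;
    /// Return the position deletes as-is.
    fn read_pos_delete(&self, task: &ScanTask) -> Vec<u64>;
    /// Return the equality deletes as-is.
    fn read_eq_delete(&self, task: &ScanTask) -> Vec<i64>;

    /// Out-of-the-box path: merge the deletes into the data internally.
    fn read(&self, task: &ScanTask) -> Vec<Row> {
        let pos: HashSet<u64> = self.read_pos_delete(task).into_iter().collect();
        let eq: HashSet<i64> = self.read_eq_delete(task).into_iter().collect();
        self.read_data(task)
            .into_iter()
            .filter(|r| !pos.contains(&r.pos) && !eq.contains(&r.id))
            .collect()
    }
}

// Trivial implementation that just hands back the in-memory contents.
pub struct InMemoryReader;

impl TaskReader for InMemoryReader {
    fn read_data(&self, task: &ScanTask) -> Vec<Row> {
        task.data.clone()
    }
    fn read_pos_delete(&self, task: &ScanTask) -> Vec<u64> {
        task.pos_deletes.clone()
    }
    fn read_eq_delete(&self, task: &ScanTask) -> Vec<i64> {
        task.eq_deletes.clone()
    }
}
```

With this shape, an engine that wants to run its own join calls `read_data`, `read_pos_delete` and `read_eq_delete` separately, while other users keep calling `read`.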

Describe the solution you'd like

No response

Willingness to contribute

  • I can contribute to this feature independently
  • I would be willing to contribute to this feature with guidance from the Iceberg Rust community
  • I cannot contribute to this feature at this time

Metadata

Assignees

No one assigned

    Labels

    enhancement: New feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
