
API for separate planning and execution #75

Draft

vustef wants to merge 18 commits into main from feature/split-scan-api

Conversation

vustef (Collaborator) commented Mar 20, 2026

No description provided.

vustef and others added 13 commits March 20, 2026 18:22
Add wrapper types for task streams, individual tasks, and ArrowReader
context. Update scan structs to store file_io/batch_size/concurrency
for use by create_reader. Refactor macros to field-agnostic pattern.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds plan_files, create_reader, next_append/delete_task, and
read_append/delete_task. Each read clones the ArrowReader to share
the CachingDeleteFileLoader cache across concurrent consumers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
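The plan/execute split described above might be consumed roughly as follows. This is a minimal sketch for illustration only: the types and method names below are simplified stand-ins inspired by the commit message (a pull-style task stream plus a cloneable reader), not the crate's real signatures.

```rust
// Simplified stand-ins for the planning/execution split; NOT the real API.
#[derive(Clone)]
struct ArrowReader; // the real reader shares a CachingDeleteFileLoader cache

struct FileScanTask {
    data_file_path: String,
}

struct FileScanTaskStream {
    tasks: Vec<FileScanTask>,
}

impl FileScanTaskStream {
    /// Analogue of next_append_task/next_delete_task: pull one planned task.
    fn next_task(&mut self) -> Option<FileScanTask> {
        if self.tasks.is_empty() {
            None
        } else {
            Some(self.tasks.remove(0))
        }
    }
}

impl ArrowReader {
    /// Analogue of read_append_task/read_delete_task. In the PR, each read
    /// clones the reader so concurrent consumers share the delete-file cache.
    fn read_task(&self, task: &FileScanTask) -> String {
        format!("arrow batches from {}", task.data_file_path)
    }
}

/// Drive planning output through execution: pull tasks until the stream is
/// exhausted, cloning the reader per task.
fn consume(mut stream: FileScanTaskStream, reader: &ArrowReader) -> Vec<String> {
    let mut out = Vec::new();
    while let Some(task) = stream.next_task() {
        let reader = reader.clone(); // per-consumer clone, shared cache
        out.push(reader.read_task(&task));
    }
    out
}
```

The point of the split is that planning (producing the task stream) and execution (reading tasks into Arrow batches) can happen on different threads, or even different processes across the FFI boundary.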
Adds three synchronous FFI functions (iceberg_file_scan_task_data_file_path,
iceberg_append_task_data_file_path, iceberg_delete_task_data_file_path) that
return newly allocated C strings. Julia API exposes task_data_file_path()
with multiple dispatch on FileScanTaskHandle, AppendTaskHandle, and
DeleteTaskHandle.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
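A synchronous FFI accessor that "returns a newly allocated C string" typically follows the pattern sketched below. The FileScanTask struct here is a stand-in for the crate's real task type; only the allocation/ownership convention is the point. Ownership of the returned buffer transfers to the caller, who must release it through a paired free function (not shown) that reconstructs the CString with CString::from_raw.

```rust
use std::ffi::{c_char, CString};

// Stand-in for the real FileScanTask the handle wraps.
struct FileScanTask {
    data_file_path: String,
}

/// Sketch of a path accessor in the style the commit describes: returns a
/// newly allocated, NUL-terminated C string, or null on a null handle or an
/// embedded-NUL path. The caller owns the returned buffer.
#[no_mangle]
pub extern "C" fn iceberg_file_scan_task_data_file_path(
    task: *const FileScanTask,
) -> *mut c_char {
    if task.is_null() {
        return std::ptr::null_mut();
    }
    let task = unsafe { &*task };
    match CString::new(task.data_file_path.as_str()) {
        Ok(s) => s.into_raw(),       // transfer ownership to the caller
        Err(_) => std::ptr::null_mut(), // path contained an interior NUL
    }
}
```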
Adds FFI functions to get file path from FileScanTask, AppendTask, and
DeleteTask. Julia functions use distinct names since handle types are
Ptr{Cvoid} aliases and can't dispatch.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace type aliases (const X = Ptr{Cvoid}) with proper wrapper structs
for all FFI handle types (ArrowReaderContext, FileScanTaskStream,
FileScanTaskHandle, etc.). This enables Julia multiple dispatch for
functions like task_data_file_path, free_task, and free_task_stream.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add next_tasks/read_tasks (full scan), next_append_tasks/read_append_tasks,
and next_delete_tasks/read_delete_tasks (incremental) batch APIs. These
allow pulling multiple tasks at once (count=0 drains all) and reading
them into a single Arrow stream, reducing FFI round-trips.

Rust: IcebergTaskBatchResponse (generic for all task types),
iceberg_free_task_ptr_array, plus 6 new FFI functions.

Julia: TaskBatchResponse struct, free_task_ptr_array, plus 6 new
functions with proper GC.@preserve on pointer arrays.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
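The "count=0 drains all" batch semantics can be sketched as follows; the TaskStream type is a stand-in for illustration, not the real FFI surface.

```rust
// Stand-in stream type; the real API operates on task handles over FFI.
struct TaskStream {
    remaining: Vec<u32>,
}

impl TaskStream {
    /// Pull up to `count` tasks; `count == 0` means "drain everything".
    /// Batching like this is what reduces FFI round-trips: one call can
    /// return the whole remainder of the stream.
    fn next_tasks(&mut self, count: usize) -> Vec<u32> {
        let take = if count == 0 {
            self.remaining.len()
        } else {
            count.min(self.remaining.len())
        };
        self.remaining.drain(..take).collect()
    }
}
```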
Add record_count accessors for FileScanTaskHandle, AppendTaskHandle,
and DeleteTaskHandle. Returns Union{Int64, Nothing} — nothing when
the record count is unavailable (e.g. partial file reads).

Rust: 3 new FFI functions returning i64 (-1 sentinel for None).
Julia: task_record_count with multiple dispatch on wrapper structs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
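The −1 sentinel convention works because Option has no stable C ABI, so the Rust side flattens Option into a signed integer and the Julia side maps −1 back to nothing. A minimal sketch, with a stand-in task type:

```rust
// Stand-in for the real task type behind the handle.
struct FileScanTask {
    record_count: Option<u64>,
}

/// Sketch of the sentinel convention the commit describes: None (record
/// count unavailable, e.g. a partial file read) becomes -1, since
/// Option<i64> cannot cross the C ABI directly. Valid counts are >= 0.
#[no_mangle]
pub extern "C" fn iceberg_file_scan_task_record_count(
    task: *const FileScanTask,
) -> i64 {
    if task.is_null() {
        return -1;
    }
    let task = unsafe { &*task };
    match task.record_count {
        Some(n) => n as i64,
        None => -1,
    }
}
```

On the Julia side, the wrapper would translate the sentinel back, returning `nothing` when the FFI call yields −1 and the `Int64` count otherwise.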

let mut builder = ArrowReaderBuilder::new(file_io.clone())
.with_row_group_filtering_enabled(true)
.with_row_selection_enabled(false);
vustef (Collaborator, Author) commented Mar 23, 2026

Why false? Should we have different paths for incremental vs full scan here?

vustef (Collaborator, Author):

Actually there's iceberg_create_incremental_reader

vustef (Collaborator, Author):

Claude also mentioned this:

let op = Operator::new(builder)?.finish();

// Use HTTP/1.1 instead of HTTP/2 for better throughput with many concurrent streams.
// HTTP/2 multiplexes all requests over a single TCP connection, which caps bandwidth.
// HTTP/1.1 opens separate TCP connections per request, allowing full NIC utilization.
op.update_http_client(|_| {
    let client = reqwest::ClientBuilder::new()
        .http1_only()
        .build()
        .expect("failed to build http1-only reqwest client");
    opendal::raw::HttpClient::with(client)
});

in the iceberg-rust repo, in s3.rs. I haven't noticed that it helps, though.

