Expose simplified Arrow storage for Python stream export #869

@camden-lowrance

Some Python callers consume SedonaDB results through the Arrow stream interface instead of collecting a full table with to_pandas() / to_arrow_table(). This is useful for large results where callers need to process record batches incrementally.

Today, the public stream path preserves Arrow storage types. That is efficient, but it means callers may receive newer storage types such as BinaryView or Utf8View, which not every downstream Arrow consumer supports. Callers that need simpler Arrow storage can work around this in SQL with sd_simplifystorage(...), but that requires them to know in advance which columns need simplification.

SedonaDB already has an internal path that can simplify all columns while exporting a stream:

self._impl.to_stream(self._ctx, simplify=True)

It would be useful to expose this capability through the Python API so streaming callers can request simplified Arrow storage at the export boundary, without adding column-specific SQL casts or sd_simplifystorage(...) calls.

Possible API shape, naming TBD:

reader = df.to_record_batch_reader(simplify=True)

or another public streaming method that maps to the existing simplified stream behavior.
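
To make the proposed shape concrete, here is a rough sketch of how a public method could forward a `simplify` flag to the existing internal call. Everything except the `self._impl.to_stream(self._ctx, simplify=...)` call shape is hypothetical: `FakeImpl`, the `DataFrame` wrapper, the `SIMPLE_STORAGE` mapping, and the type-name strings are stand-ins I made up for illustration, and the "stream" here is just a list of (column, storage type) pairs rather than a real Arrow stream.

```python
# Assumed mapping from view storage to simpler storage (illustrative only;
# the actual simplification rules live in SedonaDB's internal implementation).
SIMPLE_STORAGE = {"string_view": "string", "binary_view": "binary"}


class FakeImpl:
    """Hypothetical stand-in for the internal dataframe implementation."""

    def __init__(self, schema):
        # schema: list of (column name, storage type name) pairs
        self._schema = schema

    def to_stream(self, ctx, simplify=False):
        # Mirrors the internal call shape quoted in this issue.
        if not simplify:
            return list(self._schema)
        # Replace view storage with simpler storage, leave other types alone.
        return [(name, SIMPLE_STORAGE.get(t, t)) for name, t in self._schema]


class DataFrame:
    """Hypothetical public wrapper; method name is the one floated above."""

    def __init__(self, impl, ctx=None):
        self._impl = impl
        self._ctx = ctx

    def to_record_batch_reader(self, simplify=False):
        # Default keeps today's behavior (storage types preserved);
        # simplify=True opts in at the export boundary.
        return self._impl.to_stream(self._ctx, simplify=simplify)


df = DataFrame(
    FakeImpl([("name", "string_view"), ("geom", "binary_view"), ("id", "int64")])
)
preserved = df.to_record_batch_reader()
simplified = df.to_record_batch_reader(simplify=True)
```

Defaulting `simplify` to `False` would keep the current stream behavior unchanged for existing callers, so the simplification only applies when a caller asks for it.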

Related context: #864 and #868.
