Some Python callers consume SedonaDB results through the Arrow stream interface instead of collecting a full table with `to_pandas()` / `to_arrow_table()`. This is useful for large results where callers need to process record batches incrementally.
Today, the public stream path preserves Arrow storage types. That is efficient, but it means callers may receive newer storage such as `BinaryView` or `Utf8View`. Callers that need simpler Arrow storage can work around this in SQL with `sd_simplifystorage(...)`, but this requires them to know which columns need simplification.
SedonaDB already has an internal path that can simplify all columns while exporting a stream:
```python
self._impl.to_stream(self._ctx, simplify=True)
```
It would be useful to expose this capability through the Python API so streaming callers can request simplified Arrow storage at the export boundary, without adding column-specific SQL casts or `sd_simplifystorage(...)` calls.
Possible API shape, naming TBD:
```python
reader = df.to_record_batch_reader(simplify=True)
```
or another public streaming method that maps to the existing simplified stream behavior.
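One way to picture the mapping is a thin public method that forwards to the internal call shown above. In this sketch, only `_impl`, `_ctx`, and `to_stream(ctx, simplify=...)` come from the existing internal path; the `DataFrame` stand-in class, the `to_record_batch_reader` name (naming is still TBD), and the `RecordingImpl` test double are hypothetical, and it assumes the internal call also accepts `simplify=False`.

```python
from typing import Any


class DataFrame:
    """Hypothetical stand-in for the Python DataFrame wrapper."""

    def __init__(self, impl: Any, ctx: Any) -> None:
        self._impl = impl
        self._ctx = ctx

    def to_record_batch_reader(self, simplify: bool = False) -> Any:
        """Export results as an Arrow stream.

        simplify=False keeps today's behavior (storage types preserved);
        simplify=True forwards to the existing simplified export path.
        """
        return self._impl.to_stream(self._ctx, simplify=simplify)


class RecordingImpl:
    """Test double that records how to_stream was called."""

    def __init__(self) -> None:
        self.calls: list[tuple[Any, bool]] = []

    def to_stream(self, ctx: Any, simplify: bool = False) -> str:
        self.calls.append((ctx, simplify))
        return "stream"


impl = RecordingImpl()
df = DataFrame(impl, ctx="ctx")
df.to_record_batch_reader()
df.to_record_batch_reader(simplify=True)
print(impl.calls)
```

Defaulting `simplify` to `False` in this sketch leaves existing stream consumers on the efficient, storage-preserving path; only callers that opt in pay for the conversion.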
Related context: #864 and #868.