Commit f6bfa7b

apacheGH-39010: [Python] Introduce maps_as_pydicts parameter for to_pylist, to_pydict, as_py (apache#45471)
### Rationale for this change
Currently, `MapScalar`/`MapArray` values are not converted into proper Python `dict`s. This breaks roundtrips from Python -> Arrow -> Python:
```
import pyarrow as pa
schema = pa.schema([pa.field('x', pa.map_(pa.string(), pa.int64()))])
data = [{'x': {'a': 1}}]
pa.RecordBatch.from_pylist(data, schema=schema).to_pylist()
# [{'x': [('a', 1)]}]
```
This is especially painful when storing TiBs of deeply nested data (think of lists in structs in maps) that was created in Python and serialized to Arrow/Parquet: it cannot be read back into its original Python structure with native `pyarrow` methods, only through awkward and computationally expensive workarounds.
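One such workaround is a second pass in Python over the converted rows. Below is a minimal sketch of that pass, assuming a hypothetical tuple-based type descriptor in place of real `pyarrow.DataType` objects (neither `maps_to_dicts` nor the descriptor format is a pyarrow API). It walks each value alongside its type and rebuilds a `dict` wherever the type is a map.

```python
from typing import Any

# Hypothetical stand-in for an Arrow type (NOT a pyarrow API):
#   "leaf"                   -> any primitive value
#   ("list", elem)           -> list of elem
#   ("struct", {name: type}) -> dict with exactly those fields
#   ("map", key, value)      -> association list [(k, v), ...]


def maps_to_dicts(value: Any, typ: Any) -> Any:
    """Recursively turn association lists at map positions into dicts."""
    if value is None or typ == "leaf":
        # Nulls and primitive values are returned unchanged.
        return value
    kind = typ[0]
    if kind == "list":
        return [maps_to_dicts(v, typ[1]) for v in value]
    if kind == "struct":
        fields = typ[1]
        return {name: maps_to_dicts(value[name], t) for name, t in fields.items()}
    if kind == "map":
        _, key_t, val_t = typ
        # Note: a duplicate key silently keeps its last value here.
        return {maps_to_dicts(k, key_t): maps_to_dicts(v, val_t) for k, v in value}
    raise ValueError(f"unknown type descriptor: {typ!r}")


# The to_pylist() output from the example above, fixed up after the fact.
rows = [{'x': [('a', 1)]}]
rows_type = ("list", ("struct", {"x": ("map", "leaf", "leaf")}))
print(maps_to_dicts(rows, rows_type))  # [{'x': {'a': 1}}]
```

Every nested value is visited a second time in Python, which is part of why this kind of post-processing gets costly on large datasets.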
### What changes are included in this PR?
A new parameter `maps_as_pydicts` is added to `to_pylist`, `to_pydict` and `as_py`, which enables proper roundtrips:
```
import pyarrow as pa
schema = pa.schema([pa.field('x', pa.map_(pa.string(), pa.int64()))])
data = [{'x': {'a': 1}}]
pa.RecordBatch.from_pylist(data, schema=schema).to_pylist(maps_as_pydicts="strict")
# [{'x': {'a': 1}}]
```
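The modes of the new parameter can be summarized with a small pure-Python sketch. As we read the pyarrow docstring, `None` (the default) keeps the ordered list of `(key, value)` tuples, `"lossy"` builds a `dict` and warns on a duplicate key while keeping its last value, and `"strict"` builds a `dict` but raises on a duplicate key. The helper `map_value_as_py` is hypothetical and works on plain Python pairs rather than on `MapScalar` objects; the specific exception and warning types it uses are assumptions and may differ from pyarrow's.

```python
import warnings
from typing import Any, Hashable, Iterable, Optional, Tuple


def map_value_as_py(
    pairs: Optional[Iterable[Tuple[Hashable, Any]]],
    *,
    maps_as_pydicts: Optional[str] = None,
) -> Any:
    """Hypothetical helper: convert one map value, given as (key, value) pairs."""
    if maps_as_pydicts not in (None, "lossy", "strict"):
        raise ValueError("maps_as_pydicts must be None, 'lossy' or 'strict'")
    if pairs is None:
        # A null map stays None in every mode.
        return None
    if maps_as_pydicts is None:
        # Default: keep the association list in its original order.
        return list(pairs)
    result = {}
    for key, value in pairs:
        if key in result:
            if maps_as_pydicts == "strict":
                raise KeyError(f"duplicate map key {key!r}")
            # "lossy": warn, then let the last value win.
            warnings.warn(f"duplicate map key {key!r}; keeping the last value")
        result[key] = value
    return result


print(map_value_as_py([("a", 1)]))                           # [('a', 1)]
print(map_value_as_py([("a", 1)], maps_as_pydicts="strict"))  # {'a': 1}
```

Validating the mode up front means a typo such as `"stict"` fails loudly instead of silently falling back to association lists.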
### Are these changes tested?
Yes. Tests cover `to_pylist` and `to_pydict` on `pyarrow.Table`, as well as the low-level `MapScalar` conversion, in particular maps nested with `ListScalar` and `StructScalar`.
Duplicate map keys should now raise an error (with `maps_as_pydicts="strict"`); this is tested as well.
### Are there any user-facing changes?
No existing call sites should break: the only change is a new optional, keyword-only parameter.
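Why a keyword-only parameter with a `None` default keeps existing call sites working can be shown with a minimal sketch. The function below is a hypothetical stand-in that only mimics the shape of the new signature; it is not pyarrow's implementation and merely reports which mode it received.

```python
from typing import Optional


def to_pylist(*, maps_as_pydicts: Optional[str] = None) -> str:
    """Hypothetical stand-in for the new signature; reports the mode it got."""
    # The bare `*` makes maps_as_pydicts keyword-only; None keeps the old behaviour.
    if maps_as_pydicts is None:
        return "association lists"
    return f"dicts ({maps_as_pydicts})"


print(to_pylist())                          # association lists
print(to_pylist(maps_as_pydicts="strict"))  # dicts (strict)

try:
    to_pylist("strict")  # positional use is rejected
except TypeError as exc:
    print("TypeError:", exc)
```

Calls written before the change pass no argument and get the previous output, while forcing the keyword makes the opt-in explicit at every call site.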
* GitHub Issue: apache#39010
Authored-by: Jonas Dedden <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>

Parent commit: ce012eb
6 files changed (+473 / -55 lines) in `python/pyarrow` and its `tests` directory.