BUG: fix read_csv pyarrow engine column-name and dtype handling #65859

Draft
jbrockmendel wants to merge 3 commits into pandas-dev:main from jbrockmendel:tst-pyarrow-csv

@jbrockmendel

Member

An audit of the @xfail_pyarrow tests in pandas/tests/io/parser found ~30 cases caused by a handful of ArrowParserWrapper bugs rather than by pyarrow limitations. This PR fixes those bugs and un-xfails the affected tests.

One parametrization stays xfailed with a narrower conditional mark: test_dtype_all_columns[object], where scalar dtype=object is also applied to the index column (the str variant passes).

No user-facing behavior changes for the other engines; the test diff is marker removals only.

  • Tests added and passed
  • All code checks passed

🤖 Generated with Claude Code

Brings the pyarrow engine's header/dtype handling in line with the
other engines:

- duplicated column names are now mangled to "x.1"-style names,
  mirroring the algorithm in pandas._libs.parsers.TextReader
  (including dtype-key propagation to mangled names, GH#35211)
- empty header fields become "Unnamed: {i}" placeholder names
- an unnamed index_col now produces an unnamed index level instead
  of an index named ""
- non-dict dtype with index_col no longer raises AttributeError
- defaultdict dtype now applies its default to unlisted columns
  (GH#41574)
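
The first two bullets can be illustrated with a minimal pure-Python sketch. It is not the ArrowParserWrapper or TextReader code; the helper names `fill_unnamed` and `mangle_dupe_names` are hypothetical, and the dtype-propagation rule shown (copy the original name's dtype entry to the mangled name unless the mangled name already has one) is an assumption about what "dtype-key propagation" means here.

```python
from collections import defaultdict


def fill_unnamed(names):
    """Replace empty header fields with "Unnamed: {i}" placeholders."""
    return [name if name != "" else f"Unnamed: {i}" for i, name in enumerate(names)]


def mangle_dupe_names(names, dtype=None):
    """Rename repeated column names to "x.1"-style names.

    Returns the mangled names plus a copy of ``dtype`` in which each
    mangled name inherits the dtype entry of the name it came from
    (assumed behaviour, for illustration only).
    """
    counts = defaultdict(int)  # how many times each name has been seen
    dtype = dict(dtype or {})  # work on a copy, leave the caller's dict alone
    mangled = []
    for original in names:
        col = original
        cur_count = counts[col]
        # Keep appending ".N" until the candidate name is unused.
        while cur_count > 0:
            counts[col] = cur_count + 1
            col = f"{col}.{cur_count}"
            cur_count = counts[col]
        counts[col] = cur_count + 1
        # Propagate the dtype key from the original name to the mangled one.
        if col != original and original in dtype and col not in dtype:
            dtype[col] = dtype[original]
        mangled.append(col)
    return mangled, dtype


header = fill_unnamed(["", "a", "a", "b", "a"])
names, dtypes = mangle_dupe_names(header, {"a": "int64"})
print(names)
print(dtypes)
```

Applying the placeholder step before the mangling step means the placeholder names themselves take part in duplicate detection; that ordering is a choice made for this sketch, not something the commit message states.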

Un-xfails the 30 tests these bugs were responsible for.
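
The defaultdict dtype bullet above can be sketched in the same spirit. This is a hypothetical helper (`resolve_column_dtypes` is not a pandas function) that expands a dtype spec into an explicit per-column mapping, assuming the intended rule is "listed columns use their own entry, unlisted columns use the defaultdict's default".

```python
from collections import defaultdict


def resolve_column_dtypes(columns, dtype):
    """Expand a dtype spec into an explicit {column: dtype} mapping."""
    if dtype is None:
        return {}
    if isinstance(dtype, defaultdict) and dtype.default_factory is not None:
        # Call the factory once and use .get() so that the caller's
        # defaultdict is not mutated by looking up missing keys.
        default = dtype.default_factory()
        return {col: dtype.get(col, default) for col in columns}
    if isinstance(dtype, dict):
        # A plain dict only constrains the columns it lists.
        return {col: dtype[col] for col in columns if col in dtype}
    # A scalar dtype applies to every column.
    return {col: dtype for col in columns}


spec = defaultdict(lambda: "float64", {"a": "int64"})
print(resolve_column_dtypes(["a", "b", "c"], spec))
```

The defaultdict check comes before the plain-dict check because defaultdict is a dict subclass; with the order reversed, the default would never be applied.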

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@jbrockmendel jbrockmendel added Bug IO CSV read_csv, to_csv Arrow pyarrow functionality labels Jun 11, 2026
jbrockmendel and others added 2 commits June 11, 2026 14:21
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- narrow self.names for pyright in defaultdict dtype block
- test_dtype_all_columns: xfail object-index case only under
  infer_string; xfail check_orig=False object case unconditionally

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>