fix: declare pandas as runtime dep of parquet extra #1774

Open
dsmedia wants to merge 1 commit into frictionlessdata:main from dsmedia:fix/parquet-extra-pandas-dep
dsmedia commented Apr 19, 2026

Fixes #1773.

Summary

frictionless[parquet] currently raises ModuleNotFoundError: No module named 'pandas' on the first .parquet read because ParquetParser uses pandas unconditionally (to_pandas(), TableResource(format="pandas"), platform.pandas.io.common.get_handle) but the parquet extra declares only pyarrow.

This is the minimal "Option A" fix from #1773: add pandas>=1.0 to the parquet extra so it matches the actual runtime surface of ParquetParser.
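
To make the gap described above easy to confirm in a given environment, here is a minimal sketch that reports which of the two modules named in this summary are not importable. The helper name `missing_modules` and the tuple `PARQUET_RUNTIME_MODULES` are hypothetical and are not part of frictionless; the check uses only the standard library.

```python
import importlib.util

# Modules the parquet read path needs at runtime, per the summary above.
# (Hypothetical constant for illustration; not defined by frictionless.)
PARQUET_RUNTIME_MODULES = ("pyarrow", "pandas")


def missing_modules(names=PARQUET_RUNTIME_MODULES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("missing:", ", ".join(missing) if missing else "none")
```

In a fresh venv with only the unpatched parquet extra installed, this would be expected to list pandas as missing, matching the ModuleNotFoundError described above.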

Changes

  • pyproject.toml: add pandas>=1.0 to the parquet extra. With this change, parquet and pandas extras are strictly redundant — which honestly reflects today's runtime coupling.
  • CHANGELOG.md: entry under [Unreleased].
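
One way to check what an extra declares after installation is to filter the distribution's requirement strings by their `extra == "..."` marker. The sketch below assumes the usual `name; extra == "x"` shape of those strings. The helper `requirements_for_extra` is hypothetical, and the `SAMPLE` lines are illustrative stand-ins rather than the project's real metadata (only `pandas>=1.0` for the parquet extra comes from this PR).

```python
import re


def requirements_for_extra(requires, extra):
    """Pick the requirement strings whose environment marker names the given extra."""
    quote = "[\"']"  # markers may use either quote style
    pattern = re.compile(r"extra\s*==\s*" + quote + re.escape(extra) + quote)
    selected = []
    for line in requires or []:  # tolerate None (no declared requirements)
        requirement, _, marker = line.partition(";")
        if pattern.search(marker):
            selected.append(requirement.strip())
    return selected


# Illustrative metadata lines (assumed, not copied from the real package).
SAMPLE = [
    'pyarrow; extra == "parquet"',
    'pandas>=1.0; extra == "parquet"',
    'pandas; extra == "pandas"',
]

if __name__ == "__main__":
    print(requirements_for_extra(SAMPLE, "parquet"))
```

Against an installed copy, the input could come from `importlib.metadata.requires("frictionless")`, which returns the declared requirement strings (or `None`).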

Verification

Reproduced on main in a fresh venv (ModuleNotFoundError), verified fixed on this branch:

  • pip install -e '.[parquet]' resolves pyarrow + pandas as expected.
  • Resource('table.parquet').read_rows() returns rows cleanly.
  • pytest frictionless/formats/parquet/__spec__/ — 5 passed, 1 deselected (the ci-marked remote-read).

Out of scope (follow-ups for maintainers to weigh)

The ParquetParser unconditionally calls pyarrow.Table.to_pandas() and
constructs a TableResource(data=df, format="pandas"), so pandas is
required at runtime for any .parquet read. The `parquet` extra only
declared pyarrow, so `pip install 'frictionless[parquet]'` in isolation
raised ModuleNotFoundError: pandas on first read.

Tests never caught this because the hatch default env installs
frictionless[...,pandas,parquet,...] together, masking the gap. The
proper regression guard is a dedicated CI job that installs only
.[parquet] and runs the parquet tests in isolation — out of scope here.

- pyproject.toml: add pandas>=1.0 to the parquet extra
- CHANGELOG.md: note under [Unreleased]
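
The isolated regression guard mentioned above (install only .[parquet], then run the parquet tests) could be sketched roughly as follows. This is an assumption-laden outline, not the project's CI configuration: the helper name `isolated_check_commands` is hypothetical, installing `pytest` alongside the extra is a guess at the minimum test tooling, and the real suite may need additional plugins or marker filters (for example to deselect the ci-marked remote read). The function only builds the commands; it does not execute them.

```python
import sys
from pathlib import Path


def isolated_check_commands(venv_dir, extra="parquet",
                            tests="frictionless/formats/parquet/__spec__/"):
    """Build command lists for an isolated-extra check (nothing is executed here)."""
    # Virtual environments place executables in Scripts/ on Windows, bin/ elsewhere.
    bin_dir = Path(venv_dir) / ("Scripts" if sys.platform == "win32" else "bin")
    python = str(bin_dir / "python")
    return [
        # 1. Create a fresh virtual environment.
        [sys.executable, "-m", "venv", str(venv_dir)],
        # 2. Install the project with only the one extra, plus pytest (assumed).
        [python, "-m", "pip", "install", "-e", f".[{extra}]", "pytest"],
        # 3. Run only the tests for that format.
        [python, "-m", "pytest", tests],
    ]


if __name__ == "__main__":
    for command in isolated_check_commands(".venv-parquet-only"):
        print(" ".join(command))
```

Each command list can be passed to `subprocess.run(command, check=True)` from the repository root, so that a missing runtime dependency surfaces as a failing test run rather than being masked by a shared environment.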
Development

Successfully merging this pull request may close these issues.

frictionless[parquet] raises ModuleNotFoundError: pandas on first parquet read
