fix: declare pandas as runtime dep of parquet extra by dsmedia · Pull Request #1774 · frictionlessdata/frictionless-py

dsmedia · 2026-04-19T04:52:51Z

Summary

frictionless[parquet] currently raises ModuleNotFoundError: No module named 'pandas' on the first .parquet read because ParquetParser uses pandas unconditionally (to_pandas(), TableResource(format="pandas"), platform.pandas.io.common.get_handle) but the parquet extra declares only pyarrow.

This is the minimal "Option A" fix from #1773: add pandas>=1.0 to the parquet extra so it matches the actual runtime surface of ParquetParser.
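The gap described above can be detected before any file is read. This is a minimal sketch using only the standard library; the helper name `missing_modules` and the `PARQUET_RUNTIME_MODULES` list are illustrative and not part of frictionless.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Top-level modules that, per the summary above, ParquetParser needs at runtime.
# (Illustrative constant, not defined anywhere in frictionless.)
PARQUET_RUNTIME_MODULES = ["pyarrow", "pandas"]

if __name__ == "__main__":
    missing = missing_modules(PARQUET_RUNTIME_MODULES)
    if missing:
        print("parquet reads would fail; missing:", ", ".join(missing))
    else:
        print("all parquet runtime modules are importable")
```

In a venv created with only the `parquet` extra on main, this would be expected to report `pandas` as missing, matching the ModuleNotFoundError in the summary.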

Changes

- pyproject.toml: add pandas>=1.0 to the parquet extra. With this change, the parquet and pandas extras are strictly redundant, which honestly reflects today's runtime coupling.
- CHANGELOG.md: entry under [Unreleased].

Verification

Reproduced on main in a fresh venv (ModuleNotFoundError), verified fixed on this branch:

- pip install -e '.[parquet]' resolves pyarrow + pandas as expected.
- Resource('table.parquet').read_rows() returns rows cleanly.
- pytest frictionless/formats/parquet/__spec__/: 5 passed, 1 deselected (the ci-marked remote-read).
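The declared dependencies of an installed build can also be inspected directly. The sketch below, offered as one possible check rather than part of this PR, reads the installed distribution's metadata with `importlib.metadata.requires` and filters the entries gated on a given extra; `requirements_for_extra` is a hypothetical helper name.

```python
import re
from importlib import metadata

# Matches an `extra == "name"` marker written with either quote style.
_EXTRA_MARKER = re.compile(r"""extra\s*==\s*['"]([^'"]+)['"]""")


def requirements_for_extra(requires, extra):
    """Return requirement strings (markers stripped) that are gated on `extra`."""
    selected = []
    for line in requires or []:
        spec, _, marker = line.partition(";")
        match = _EXTRA_MARKER.search(marker)
        if match and match.group(1) == extra:
            selected.append(spec.strip())
    return selected


if __name__ == "__main__":
    try:
        declared = metadata.requires("frictionless")
    except metadata.PackageNotFoundError:
        declared = None  # frictionless is not installed in this environment
    print(requirements_for_extra(declared, "parquet"))
```

On this branch the printed list should include a pandas requirement next to pyarrow; on main it would be expected to list pyarrow only.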

Out of scope (follow-ups for maintainers to weigh)

- Option B from #1773 ("frictionless[parquet] raises ModuleNotFoundError: pandas on first parquet read"): rewriting ParquetParser to iterate native pyarrow RecordBatches / to_pylist() without .to_pandas(). This is a larger change, but it would let the parquet and pandas extras diverge again.
- Preventing this class of packaging gap across all extras would require a CI job that installs each extra in isolation; that's not addressed here.
- Issue #438 ("Add caching mechanism and rework remote loader?"; file-stats population cleanup), referenced by the TODO at parser.py:45-46. Unrelated to this bug.
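For reference, the Option B direction could look roughly like the sketch below. It is not the ParquetParser implementation and ignores schema and type mapping; `iter_rows` and `read_parquet_rows` are hypothetical names, and the batch size is an arbitrary assumption. It relies only on `pyarrow.parquet.ParquetFile.iter_batches()` and `RecordBatch.to_pylist()`.

```python
def iter_rows(batches):
    """Yield one dict per row from an iterable of objects exposing to_pylist()."""
    for batch in batches:
        # pyarrow.RecordBatch.to_pylist() returns a list of row dicts.
        yield from batch.to_pylist()


def read_parquet_rows(path, batch_size=10_000):
    """Stream rows from a parquet file using pyarrow only (no pandas import)."""
    # Imported lazily so iter_rows stays usable without pyarrow installed.
    import pyarrow.parquet as pq

    parquet_file = pq.ParquetFile(path)
    yield from iter_rows(parquet_file.iter_batches(batch_size=batch_size))
```

Usage would be along the lines of `for row in read_parquet_rows("table.parquet"): ...`. Because rows are produced batch by batch, the whole table never needs to be converted to a DataFrame.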

The ParquetParser unconditionally calls pyarrow.Table.to_pandas() and constructs a TableResource(data=df, format="pandas"), so pandas is required at runtime for any .parquet read. The `parquet` extra only declared pyarrow, so `pip install 'frictionless[parquet]'` in isolation raised ModuleNotFoundError: pandas on first read.

Tests never caught this because the hatch default env installs frictionless[...,pandas,parquet,...] together, masking the gap. The proper regression guard is a dedicated CI job that installs only .[parquet] and runs the parquet tests in isolation; that is out of scope here.

- pyproject.toml: add pandas>=1.0 to the parquet extra
- CHANGELOG.md: note under [Unreleased]
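The isolated-install regression guard mentioned above could be planned with a small helper like the following. This is a sketch under assumptions, not a proposed CI configuration: the function name, the `.venvs` directory, and the idea that installing `pytest` alongside the extra is enough to run the tests are all hypothetical. It only builds the command lists and does not execute them.

```python
def isolated_extra_commands(extras_to_tests, venv_root=".venvs"):
    """Build, per extra, the commands to install it alone and run its tests."""
    plans = {}
    for extra, test_path in sorted(extras_to_tests.items()):
        venv = f"{venv_root}/{extra}"
        python = f"{venv}/bin/python"  # POSIX venv layout (assumption)
        plans[extra] = [
            # 1. Fresh environment, so no other extra can mask a missing dependency.
            ["python", "-m", "venv", venv],
            # 2. Install the package with exactly one extra, plus the test runner.
            [python, "-m", "pip", "install", "-e", f".[{extra}]", "pytest"],
            # 3. Run only the tests that belong to that extra.
            [python, "-m", "pytest", test_path],
        ]
    return plans


if __name__ == "__main__":
    plan = isolated_extra_commands({"parquet": "frictionless/formats/parquet/__spec__/"})
    for command in plan["parquet"]:
        print(" ".join(command))
```

Each command list can be passed to `subprocess.run(..., check=True)` or translated into CI steps. The key design point is one environment per extra, which is what the shared hatch default env lacks.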

dsmedia mentioned this pull request Apr 19, 2026

feat: add script to validate dataset files against datapackage.json vega/vega-datasets#782

Open

dsmedia wants to merge 1 commit into frictionlessdata:main from dsmedia:fix/parquet-extra-pandas-dep