[FEATURE] Optional --field-ids flag to write Iceberg-compatible Parquet field IDs

**Summary**

When tpchgen-cli output is registered into an Iceberg table via PyIceberg's add_files() (a common pattern for fast TPC-H benchmark setup against object stores), the resulting tables are unreadable by Polars' native Iceberg scanner. The only workaround on the user side is to rewrite every Parquet file after generation to inject PARQUET:field_id into the schema metadata, which negates much of the speed advantage of using tpchgen-cli in the first place.

**Background**

The Iceberg spec allows Parquet files without embedded field IDs as long as the table carries a schema.name-mapping.default property. PyIceberg's add_files() honours this, and the PyIceberg and DuckDB readers resolve columns by name when IDs are absent. Polars' native (Rust) Iceberg reader does not: it reads field IDs directly from the Parquet thrift footer and fails with SchemaFieldNotFoundError: failed to load 'PARQUET:field_id' ... metadata was None. The bug is tracked in pola-rs/polars#24915 and remains open; the documented workaround is reader_override="pyiceberg", which is significantly slower and isn't intended for production use.

For users benchmarking TPC-H on Iceberg with Polars as one of the engines, this means tpchgen-cli output currently can't be used directly — every file must be rewritten through PyArrow or DuckDB's COPY ... (FIELD_IDS {...}) before upload, which doubles generation time and IO.

**Proposed feature**

A flag, e.g. --iceberg-field-ids, that writes Iceberg-compatible field IDs into the Parquet schema metadata. The TPC-H schema is fixed, so the field-ID assignment for each table is deterministic and can be hardcoded — no user configuration needed. Field IDs should be embedded both in the Parquet thrift schema (for native readers like Polars) and ideally surfaced in the Arrow schema metadata as PARQUET:field_id (for any consumer reading via Arrow).
DuckDB's FIELD_IDS COPY option is a useful reference for the on-disk format.
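To illustrate how small the hardcoded mapping would be, here is a sketch in Python (the real implementation would live in the Rust writer). It assumes IDs are simply 1..N in TPC-H column order per table; that numbering is my assumption, not something tpchgen-cli or Iceberg mandates, and `field_ids` / `name_mapping` are hypothetical helper names. The second helper emits the equivalent JSON in the shape used for schema.name-mapping.default, so the two can be kept consistent.

```python
import json

# TPC-H column names in specification order, one tuple per table.
TPCH_COLUMNS = {
    "region": ("r_regionkey", "r_name", "r_comment"),
    "nation": ("n_nationkey", "n_name", "n_regionkey", "n_comment"),
    "supplier": ("s_suppkey", "s_name", "s_address", "s_nationkey",
                 "s_phone", "s_acctbal", "s_comment"),
    "customer": ("c_custkey", "c_name", "c_address", "c_nationkey",
                 "c_phone", "c_acctbal", "c_mktsegment", "c_comment"),
    "part": ("p_partkey", "p_name", "p_mfgr", "p_brand", "p_type",
             "p_size", "p_container", "p_retailprice", "p_comment"),
    "partsupp": ("ps_partkey", "ps_suppkey", "ps_availqty",
                 "ps_supplycost", "ps_comment"),
    "orders": ("o_orderkey", "o_custkey", "o_orderstatus", "o_totalprice",
               "o_orderdate", "o_orderpriority", "o_clerk",
               "o_shippriority", "o_comment"),
    "lineitem": ("l_orderkey", "l_partkey", "l_suppkey", "l_linenumber",
                 "l_quantity", "l_extendedprice", "l_discount", "l_tax",
                 "l_returnflag", "l_linestatus", "l_shipdate",
                 "l_commitdate", "l_receiptdate", "l_shipinstruct",
                 "l_shipmode", "l_comment"),
}


def field_ids(table: str) -> dict[str, int]:
    """Map each column of `table` to a field ID (assumed: 1..N in column order)."""
    return {name: i for i, name in enumerate(TPCH_COLUMNS[table], start=1)}


def name_mapping(table: str) -> str:
    """Render the same assignment as an Iceberg name-mapping JSON string."""
    return json.dumps(
        [{"field-id": fid, "names": [name]} for name, fid in field_ids(table).items()]
    )
```

Since the assignment is a pure function of the table name, the flag would not need any extra arguments.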

**Why this belongs in tpchgen-cli**

It does not :) But I don't think Polars will fix it.

[FEATURE] Optional --field-ids flag to write Iceberg-compatible Parquet field IDs #262
