IterableDataset does not use features information in to_pandas #7872

### Describe the bug

An `IterableDataset` created from a generator with an explicit `features=` argument seems to ignore the provided features for certain operations, e.g. `.to_pandas(...)`, when the data coming from the generator has missing values (in the example below, an empty list in a nested column).

### Steps to reproduce the bug

```python
import datasets
from datasets import features


def test_to_pandas_works_with_explicit_schema():
    common_features = features.Features(
        {
            "a": features.Value("int64"),
            "b": features.List({"c": features.Value("int64")}),
        }
    )

    def row_generator():
        data = [{"a": 1, "b": []}, {"a": 1, "b": [{"c": 1}]}]
        for row in data:
            yield row

    d = datasets.IterableDataset.from_generator(row_generator, features=common_features)

    # The error is raised inside to_pandas(), before the loop body ever runs:
    for _ in d.to_pandas():
        pass
        # _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
        # .venv/lib/python3.13/site-packages/datasets/iterable_dataset.py:3703: in to_pandas
        #     table = pa.concat_tables(list(self.with_format("arrow").iter(batch_size=1000)))
        #                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        # .venv/lib/python3.13/site-packages/datasets/iterable_dataset.py:2563: in iter
        #     for key, pa_table in iterator:
        #                          ^^^^^^^^
        # .venv/lib/python3.13/site-packages/datasets/iterable_dataset.py:2078: in _iter_arrow
        #     for key, pa_table in self.ex_iterable._iter_arrow():
        #                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        # .venv/lib/python3.13/site-packages/datasets/iterable_dataset.py:599: in _iter_arrow
        #     yield new_key, pa.Table.from_batches(chunks_buffer)
        #                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        # pyarrow/table.pxi:5039: in pyarrow.lib.Table.from_batches
        #     ???
        # pyarrow/error.pxi:155: in pyarrow.lib.pyarrow_internal_check_status
        #     ???
        # _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

        # >   ???
        # E   pyarrow.lib.ArrowInvalid: Schema at index 1 was different: 
        # E   a: int64
        # E   b: list<item: null>
        # E   vs
        # E   a: int64
        # E   b: list<item: struct<c: int64>>

        # pyarrow/error.pxi:92: ArrowInvalid
```

### Expected behavior

Arrow operations should use the schema provided through `features=`, not the one inferred from the data.

### Environment info

- datasets version: 4.4.1
- Platform: macOS-15.7.1-arm64-arm-64bit-Mach-O
- Python version: 3.13.1
- huggingface_hub version: 1.1.4
- PyArrow version: 22.0.0
- Pandas version: 2.3.3
- fsspec version: 2025.10.0
