Dataset.map crashes when first examples return None and later examples return dict: writer not initialized #7990

### Describe the bug

I found a serious bug in [`datasets/arrow_dataset.py`](https://github.com/huggingface/datasets/blob/main/src/datasets/arrow_dataset.py#L3676).

---

**Description of the bug**
`Dataset.map` crashes with `AttributeError: 'NoneType' object has no attribute 'write'` when the map function returns `None` for the first few examples and a dictionary (or `pa.Table` / DataFrame) for later examples. The internal writer is only initialized when `i == 0` (or `i[0] == 0` in batched mode), but `update_data` is only known after each call to the map function. If the first non-`None` return happens at an index greater than 0, the writer is never created, and the first write fails because `writer` is still `None`.
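
The control flow described above can be illustrated with a simplified, self-contained model. `FakeWriter` and `buggy_map` are hypothetical stand-ins, not the actual `datasets` internals; the sketch only assumes the behavior described in this report (writer created at `i == 0`, `update_data` derived from whether the function returned `None`).

```python
# Simplified, hypothetical model of the non-batched loop described above.
# FakeWriter and buggy_map are illustrative stand-ins, NOT datasets internals.


class FakeWriter:
    """Collects rows in memory; stands in for the internal Arrow writer."""

    def __init__(self):
        self.rows = []

    def write(self, row):
        self.rows.append(row)


def buggy_map(examples, fn):
    writer = None
    for i, example in enumerate(examples):
        output = fn(example, i)
        # update_data reflects whether the map function returned something
        update_data = output is not None
        if update_data:
            # Writer is only created at index 0, as reported in this issue
            if i == 0:
                writer = FakeWriter()
            # Fails when the first non-None output arrives at i > 0
            writer.write(output)
    return writer


def fn(example, idx):
    if idx < 2:
        return None
    return {"x": [example["x"] * 10]}


examples = [{"x": 1}, {"x": 2}, {"x": 3}]
try:
    buggy_map(examples, fn)
except AttributeError as err:
    print(f"AttributeError: {err}")
    # prints: AttributeError: 'NoneType' object has no attribute 'write'
```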

**Steps to reproduce**

```python
from datasets import Dataset

ds = Dataset.from_dict({"x": [1, 2, 3]})

def fn(example, idx):
    if idx < 2:
        return None
    return {"x": [example["x"] * 10]}

list(ds.map(fn, with_indices=True))
```

**Expected behavior**

* `Dataset.map` should work regardless of which example first makes `update_data` become `True`.
* The writer should be initialized the first time the map function returns a non-`None` value, not only at the first index.


**Suggested fix**
Replace `if i == 0` / `if i[0] == 0` checks with `if writer is None` when initializing the writer.
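
A minimal sketch of the suggested fix, using the same kind of simplified model (hypothetical `FakeWriter` and `fixed_map`, not the real `datasets` code). The writer is created on the first non-`None` output. The batched path stands in for the `i[0] == 0` case by passing a list of indices and a column-oriented batch, which is what the map function receives with `batched=True, with_indices=True`.

```python
# Simplified, hypothetical model of the suggested fix. FakeWriter and
# fixed_map are illustrative stand-ins, NOT datasets internals.


class FakeWriter:
    """Collects outputs in memory; stands in for the internal Arrow writer."""

    def __init__(self):
        self.items = []

    def write(self, item):
        self.items.append(item)


def iter_steps(examples, batched, batch_size):
    """Yield (index, example) pairs, or (indices, column-oriented batch) pairs."""
    if not batched:
        yield from enumerate(examples)
        return
    for start in range(0, len(examples), batch_size):
        chunk = examples[start : start + batch_size]
        indices = list(range(start, start + len(chunk)))
        batch = {key: [ex[key] for ex in chunk] for key in chunk[0]}
        yield indices, batch


def fixed_map(examples, fn, batched=False, batch_size=2):
    writer = None
    for i, inputs in iter_steps(examples, batched, batch_size):
        output = fn(inputs, i)
        update_data = output is not None
        if update_data:
            # Suggested fix: `if writer is None` instead of `if i == 0` / `if i[0] == 0`
            if writer is None:
                writer = FakeWriter()
            writer.write(output)
    return writer


def fn(example, idx):
    if idx < 2:
        return None
    return {"x": [example["x"] * 10]}


def batched_fn(batch, indices):
    if indices[0] < 2:
        return None
    return {"x": [value * 10 for value in batch["x"]]}


examples = [{"x": 1}, {"x": 2}, {"x": 3}]
print(fixed_map(examples, fn).items)  # [{'x': [30]}]
print(fixed_map(examples, batched_fn, batched=True).items)  # [{'x': [30]}]
```

In this toy model, examples for which the function returns `None` contribute nothing to the output; how the real `map` should fill those rows (for instance, by keeping the original example) would still need to be decided as part of the fix.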

---

### Steps to reproduce the bug

```python
from datasets import Dataset

# Create a minimal dataset
ds = Dataset.from_dict({"x": [1, 2, 3]})

# Define a map function that returns None for first examples, dict later
def fn(example, idx):
    if idx < 2:
        return None
    return {"x": [example["x"] * 10]}

# Apply map with indices
list(ds.map(fn, with_indices=True))
```

**Expected:** the function executes without errors.

**Observed:** crashes with `AttributeError: 'NoneType' object has no attribute 'write'` because the internal writer is not initialized when the first non-`None` return happens at an index greater than 0.

### Expected behavior

The `Dataset.map` function should handle map functions that return `None` for some examples and a dictionary (or `pa.Table` / DataFrame) for later examples. In this case, the internal writer should be initialized when the first non-`None` value is returned, so that the dataset can be updated without crashing. The code should run successfully for all examples and return the updated dataset.

---

### Environment info

- Python 3.12
- `datasets==3.6.0` (the latest version still has this problem)
- `transformers==4.55.2`