`TFIDF.transform_many()` fails on `DataFrame` input #1576

## Versions


**river version**: 0.21.2
**Python version**: 3.11.7
**Operating system**: macOS 14.4

## Describe the bug


The [`TFIDF` feature extractor](https://riverml.xyz/latest/api/feature-extraction/TFIDF/) claims to support both online and mini-batch transformations, but mini-batch mode only works when the transformer is created without the `on` parameter. In other words, `transform_many()` works for `pd.Series` input but not for `pd.DataFrame` input.

## Steps/code to reproduce


```python
import pandas as pd
import river.feature_extraction

model = river.feature_extraction.TFIDF()
X = pd.Series(["foo bar bat baz", "foo bar spam eggs"])
for rec in X:
    print(model.transform_one(rec))
# WORKS
# {'foo': 0.5, 'bar': 0.5, 'bat': 0.5, 'baz': 0.5}
# {'foo': 0.5, 'bar': 0.5, 'spam': 0.5, 'eggs': 0.5}
print(model.clone().transform_many(X))
# WORKS
#    foo  bar  bat  baz  spam  eggs
# 0    1    1    1    1     0     0
# 1    1    1    0    0     1     1

model = river.feature_extraction.TFIDF(on="text")
X = pd.DataFrame([{"text": "foo bar bat baz"}, {"text": "foo bar spam eggs"}])
for rec in X.to_dict(orient="records"):
    print(model.transform_one(rec))
# WORKS
# {'foo': 0.5, 'bar': 0.5, 'bat': 0.5, 'baz': 0.5}
# {'foo': 0.5, 'bar': 0.5, 'spam': 0.5, 'eggs': 0.5}
print(model.clone().transform_many(X))
# DOES NOT WORK
```

That last call produces the following traceback:

```python
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[95], line 1
----> 1 print(model.clone().transform_many(X))

File ~/.pyenv/versions/3.11.7/envs/ds/lib/python3.11/site-packages/river/feature_extraction/vectorize.py:349, in BagOfWords.transform_many(self, X)
    347 for d in X:
    348     t: int
--> 349     for t, f in collections.Counter(self.process_text(d)).items():
    350         indices.append(index.setdefault(t, len(index)))
    351         data.append(f)

File ~/.pyenv/versions/3.11.7/envs/ds/lib/python3.11/site-packages/river/feature_extraction/vectorize.py:220, in VectorizerMixin.process_text(self, x)
    218 def process_text(self, x):
    219     for step in self.processing_steps:
--> 220         x = step(x)
    221     return x

TypeError: string indices must be integers, not 'str'
```
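
A likely cause, judging from the traceback: `transform_many` loops with `for d in X`, and iterating a `pd.DataFrame` yields its column labels rather than its rows, so each `d` would be the string `"text"`. Assuming the step added by `on="text"` behaves like `operator.itemgetter("text")` (an assumption on my part, not verified against river's source), indexing that string with `"text"` raises the same `TypeError`. The sketch below reproduces the mechanism using only the standard library; `column_labels` and `extract_on` are hypothetical stand-ins, not river names.

```python
import operator

# Iterating a pandas DataFrame yields its column labels, not its rows.
# This list stands in for `for d in X` over a frame with a single "text" column.
column_labels = ["text"]

# Hypothetical stand-in for the processing step created by `on="text"`.
extract_on = operator.itemgetter("text")

errors = []
for d in column_labels:
    try:
        extract_on(d)  # equivalent to "text"["text"]
    except TypeError as exc:
        errors.append(str(exc))

# A row dict, like the ones passed to transform_one, works with the same step.
row = {"text": "foo bar bat baz"}
extracted = extract_on(row)

print(errors)
print(extracted)
```

If this reading is right, the step that extracts `on` only ever sees row-like mappings in `transform_one`, and `transform_many` would need to select the `on` column before iterating.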
