Versions
river version: recent main (d606d7b)
Python version: Python 3.11.8
Operating system: Fedora Linux
Describe the bug
These 4 lines effectively stop OneHotEncoder from learning when drop_zeros=True (the setting used in the reproduction below):
- https://github.com/online-ml/river/blob/main/river/preprocessing/one_hot.py#L203-L204
- https://github.com/online-ml/river/blob/main/river/preprocessing/one_hot.py#L234-L235
Steps/code to reproduce
Setup:
# Sample code to reproduce the problem
>>> from pprint import pprint
>>> import random
>>> import string
>>> random.seed(42)
>>> alphabet = list(string.ascii_lowercase)
>>> X = [
...     {
...         'c1': random.choice(alphabet),
...         'c2': random.choice(alphabet),
...     }
...     for _ in range(4)
... ]
>>> from river import preprocessing
>>> oh = preprocessing.OneHotEncoder(drop_zeros=True)
>>> for x in X:
...     oh.learn_one(x)
...     pprint(oh.transform_one(x))
...
{'c1_u': 1, 'c2_d': 1}
{'c1_a': 1, 'c2_x': 1}
{'c1_i': 1, 'c2_h': 1}
{'c1_h': 1, 'c2_e': 1}
Actual result:
>>> oh.values
defaultdict(set, {})
Expected result:
>>> oh.values
defaultdict(set, {'c1': {'a', 'h', 'i', 'u'}, 'c2': {'d', 'e', 'h', 'x'}})
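To make the expected behaviour concrete, here is a minimal standalone sketch of an encoder that keeps recording seen values even when drop_zeros=True. The class name SketchOneHotEncoder and its internals are hypothetical and are not river's implementation. The input rows are copied from the transform output shown above rather than regenerated with random.

```python
from collections import defaultdict


class SketchOneHotEncoder:
    """Hypothetical stand-in for illustration; not river's OneHotEncoder."""

    def __init__(self, drop_zeros=False):
        self.drop_zeros = drop_zeros
        self.values = defaultdict(set)

    def learn_one(self, x):
        # Record every seen value, regardless of drop_zeros.
        for feature, value in x.items():
            self.values[feature].add(value)

    def transform_one(self, x):
        encoded = {}
        if not self.drop_zeros:
            # Emit an explicit 0 for every value learned so far.
            for feature, seen in self.values.items():
                for value in seen:
                    encoded[f"{feature}_{value}"] = 0
        # Mark the values present in this sample.
        for feature, value in x.items():
            encoded[f"{feature}_{value}"] = 1
        return encoded


# Rows taken from the transform output in the report above.
X = [
    {"c1": "u", "c2": "d"},
    {"c1": "a", "c2": "x"},
    {"c1": "i", "c2": "h"},
    {"c1": "h", "c2": "e"},
]

oh = SketchOneHotEncoder(drop_zeros=True)
for x in X:
    oh.learn_one(x)

learned = dict(oh.values)
first_encoded = oh.transform_one(X[0])
```

With this sketch, learned holds the four values per feature listed under "Expected result", while the transformed output for drop_zeros=True still contains only the active columns.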