Learning when drop_zeros=False #1595

Open
@ColdTeapot273K

Versions

river version: recent main (d606d7b)
Python version: Python 3.11.8
Operating system: Fedora Linux

Describe the bug

These 4 interesting lines effectively stop OneHotEncoder from learning when drop_zeros=True (the reproduction below uses drop_zeros=True and ends with an empty oh.values).

Steps/code to reproduce

Setup:

# Sample code to reproduce the problem
>>> from pprint import pprint
>>> import random
>>> import string

>>> random.seed(42)
>>> alphabet = list(string.ascii_lowercase)
>>> X = [
...     {
...         'c1': random.choice(alphabet),
...         'c2': random.choice(alphabet),
...     }
...     for _ in range(4)
... ]

>>> from river import preprocessing
>>> oh = preprocessing.OneHotEncoder(drop_zeros=True)
>>> for x in X:
...     oh.learn_one(x)
...     pprint(oh.transform_one(x))
...
{'c1_u': 1, 'c2_d': 1}
{'c1_a': 1, 'c2_x': 1}
{'c1_i': 1, 'c2_h': 1}
{'c1_h': 1, 'c2_e': 1}

Actual result:

>>> oh.values
defaultdict(set, {})

Expected result:

>>> oh.values
defaultdict(set, {'c1': {'a', 'h', 'i', 'u'}, 'c2': {'d', 'e', 'h', 'x'}})
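
For reference, here is a rough sketch of the behaviour I would expect. It is a hypothetical stand-in (`SketchOneHotEncoder` is a made-up name, not river's implementation), and it assumes that `learn_one` should record categories regardless of `drop_zeros`, with `drop_zeros` only affecting what `transform_one` emits. The samples are the four values from the output above, hardcoded so the snippet does not depend on the random seed.

```python
from collections import defaultdict


class SketchOneHotEncoder:
    """Hypothetical stand-in for illustration; NOT river's OneHotEncoder."""

    def __init__(self, drop_zeros=False):
        self.drop_zeros = drop_zeros
        self.values = defaultdict(set)

    def learn_one(self, x):
        # Assumption: always record the categories seen, whatever drop_zeros is.
        for key, value in x.items():
            self.values[key].add(value)

    def transform_one(self, x):
        encoded = {}
        if not self.drop_zeros:
            # Emit an explicit 0 for every category learned so far.
            for key in x:
                for seen in self.values.get(key, ()):
                    encoded[f"{key}_{seen}"] = 0
        # Mark the categories present in this sample.
        for key, value in x.items():
            encoded[f"{key}_{value}"] = 1
        return encoded


# The four samples from the output above, hardcoded.
X = [
    {"c1": "u", "c2": "d"},
    {"c1": "a", "c2": "x"},
    {"c1": "i", "c2": "h"},
    {"c1": "h", "c2": "e"},
]

sketch = SketchOneHotEncoder(drop_zeros=True)
outputs = []
for x in X:
    sketch.learn_one(x)
    outputs.append(sketch.transform_one(x))
```

With this sketch, `outputs` matches the four dicts printed above, and `sketch.values` ends up holding the categories listed under "Expected result", even though `drop_zeros=True`.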
