Learning when drop_zeros=False #1595

Open
@ColdTeapot273K

Description

Versions

river version: recent main (d606d7b)
Python version: Python 3.11.8
Operating system: Fedora Linux

Describe the bug

These 4 lines effectively stop OneHotEncoder from learning when drop_zeros=True, the setting used in the reproduction below: learn_one runs on every sample, yet oh.values stays empty.

Steps/code to reproduce

Setup:

# Sample code to reproduce the problem
>>> from pprint import pprint
>>> import random
>>> import string

>>> random.seed(42)
>>> alphabet = list(string.ascii_lowercase)
>>> X = [
...     {
...         'c1': random.choice(alphabet),
...         'c2': random.choice(alphabet),
...     }
...     for _ in range(4)
... ]

>>> from river import preprocessing
>>> oh = preprocessing.OneHotEncoder(drop_zeros=True)
>>> for x in X:
...     oh.learn_one(x)
...     pprint(oh.transform_one(x))
...
{'c1_u': 1, 'c2_d': 1}
{'c1_a': 1, 'c2_x': 1}
{'c1_i': 1, 'c2_h': 1}
{'c1_h': 1, 'c2_e': 1}

Actual result:

>>> oh.values
defaultdict(set, {})

Expected result:

>>> oh.values
defaultdict(set, {'c1': {'a', 'h', 'i', 'u'}, 'c2': {'d', 'e', 'h', 'x'}})
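
To illustrate the suspected mechanism without depending on river, here is a minimal sketch. `ToyOneHotEncoder` is a hypothetical stand-in written for this report, not river's source code; it assumes the lines in question amount to an early return from `learn_one` whenever `drop_zeros` is set. The two samples are the first two rows from the reproduction above.

```python
from collections import defaultdict


class ToyOneHotEncoder:
    """Hypothetical stand-in for illustration only, NOT river's implementation."""

    def __init__(self, drop_zeros=False):
        self.drop_zeros = drop_zeros
        self.values = defaultdict(set)

    def learn_one(self, x):
        # Assumed guard: learning is skipped entirely when zeros are dropped.
        if self.drop_zeros:
            return
        for feature, value in x.items():
            self.values[feature].add(value)

    def transform_one(self, x):
        # Categories present in this sample are always encoded as 1.
        encoded = {f"{feature}_{value}": 1 for feature, value in x.items()}
        if not self.drop_zeros:
            # Emit explicit zeros for previously learned categories absent from x.
            for feature, seen in self.values.items():
                for value in seen:
                    encoded.setdefault(f"{feature}_{value}", 0)
        return encoded


X = [{"c1": "u", "c2": "d"}, {"c1": "a", "c2": "x"}]

sparse = ToyOneHotEncoder(drop_zeros=True)
dense = ToyOneHotEncoder(drop_zeros=False)
for x in X:
    sparse.learn_one(x)
    dense.learn_one(x)

print(dict(sparse.values))  # nothing learned under the assumed guard
print(dict(dense.values))   # categories recorded per feature
```

Under this assumption the transform output with `drop_zeros=True` still looks correct, because it only needs the current sample, which may be why the empty `values` goes unnoticed.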
