
[BUG] GreedyEncoder crashes or silently produces all-zeros output when words dict is empty #631

@Ishiezz

Describe the bug

GreedyEncoder in pyaptamer/trafos/encode/_greedy.py has two failure modes
when initialized with an empty words dict, depending on whether word_max_len
is provided:

Mode 1 — Confusing crash when word_max_len=None (the default):

import pandas as pd
from pyaptamer.trafos.encode import GreedyEncoder

enc = GreedyEncoder(words={})
X = pd.DataFrame({"seq": [["A", "C", "G"]]})
enc.fit_transform(X)
# ValueError: max() iterable argument is empty

The traceback points to an internal max() call rather than the invalid
input, so nothing tells the user that the root cause is words={}.
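
The message is what Python raises when max() receives an empty iterable. A minimal sketch of how that could happen, assuming the default word_max_len is inferred from the longest key in words (an assumption about the internals, not taken from the pyaptamer source):

```python
# Hypothetical reconstruction; the real code in _greedy.py may differ.
words = {}

try:
    # With no keys, the generator yields nothing and max() has no default.
    word_max_len = max(len(word) for word in words)
except ValueError as exc:
    message = str(exc)

print(message)
```

The exact wording of the message varies between Python versions, but in every case it names max() and not the empty words dict.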

Mode 2 — Silent all-zeros output when word_max_len is provided:

import pandas as pd
from pyaptamer.trafos.encode import GreedyEncoder

enc = GreedyEncoder(words={}, word_max_len=3)
X = pd.DataFrame({"seq": [["A", "C", "G"]]})
result = enc.fit_transform(X)
print(result)
#    0  1  2
# 0  0  0  0
# No error, no warning: the encoder silently returns meaningless output

Expected behavior

An empty words dict should raise a clear ValueError at initialization,
consistent with how the rest of the codebase handles invalid inputs.

GreedyEncoder(words={})
# ValueError: `words` must not be empty.

Additional context

  • Affected file: pyaptamer/trafos/encode/_greedy.py
  • Fix: add an empty dict guard in __init__ with a Raises section in the docstring
  • A test should be added to the existing encoder test suite
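
The proposed guard and a matching check could look roughly like the sketch below. GreedyEncoderSketch and check_empty_words_rejected are hypothetical stand-ins written for this issue, not the pyaptamer class or its test suite; only the words and word_max_len parameters and the error message come from the report above.

```python
class GreedyEncoderSketch:
    """Hypothetical stand-in for GreedyEncoder, showing only the guard.

    Parameters
    ----------
    words : dict
        Mapping from word to integer token.
    word_max_len : int or None, default=None
        Maximum word length to consider.

    Raises
    ------
    ValueError
        If `words` is empty.
    """

    def __init__(self, words, word_max_len=None):
        # Reject the empty dict up front, whether or not word_max_len is set.
        if not words:
            raise ValueError("`words` must not be empty.")
        self.words = words
        self.word_max_len = word_max_len


def check_empty_words_rejected():
    """Check that both failure modes now raise the same clear error."""
    for kwargs in ({}, {"word_max_len": 3}):
        try:
            GreedyEncoderSketch(words={}, **kwargs)
        except ValueError as exc:
            assert "must not be empty" in str(exc)
        else:
            raise AssertionError("empty `words` was accepted")
    return True
```

In the real test suite the same check would more likely be written with pytest.raises; plain try/except is used here only to keep the sketch free of dependencies.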
