Describe the bug
GreedyEncoder in pyaptamer/trafos/encode/_greedy.py has two failure modes
when initialized with an empty words dict, depending on whether word_max_len
is provided:
Mode 1 — Confusing crash when word_max_len=None (the default):
import pandas as pd
from pyaptamer.trafos.encode import GreedyEncoder
enc = GreedyEncoder(words={})
X = pd.DataFrame({"seq": [["A", "C", "G"]]})
enc.fit_transform(X)
# ValueError: max() iterable argument is empty
Nothing in this message tells the user that the root cause is words={}.
The error points to an internal max() call, not to the bad input.
Mode 2 — Silent all-zeros output when word_max_len is provided:
enc = GreedyEncoder(words={}, word_max_len=3)
X = pd.DataFrame({"seq": [["A", "C", "G"]]})
result = enc.fit_transform(X)
print(result)
#    0  1  2
# 0  0  0  0
# No error, no warning — silently returns garbage output
Expected behavior
An empty words dict should raise a clear ValueError at initialization,
consistent with how the rest of the codebase handles invalid inputs.
GreedyEncoder(words={})
# ValueError: `words` must not be empty.
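A minimal sketch of the kind of guard this would need. It uses a stand-in class, GreedyEncoderSketch, rather than the real GreedyEncoder: the constructor signature is assumed from the snippets above, and the real class's attributes and base class are not reproduced here.

```python
class GreedyEncoderSketch:
    """Hypothetical stand-in for GreedyEncoder, showing only the guard.

    Raises
    ------
    ValueError
        If `words` is empty.
    """

    def __init__(self, words, word_max_len=None):
        # Reject an empty mapping up front, so neither failure mode
        # (the max() crash or the all-zeros output) is reachable.
        if len(words) == 0:
            raise ValueError("`words` must not be empty.")
        self.words = words
        self.word_max_len = word_max_len


try:
    GreedyEncoderSketch(words={})
except ValueError as err:
    print(err)  # `words` must not be empty.
```

Checking before any other work means the error is the same whether or not word_max_len is passed.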
Additional context
- Affected file: pyaptamer/trafos/encode/_greedy.py
- Fix: add an empty-dict guard in __init__, with a Raises section in the docstring
- A test should be added to the existing encoder test suite
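One possible shape for that test, as a sketch. StandInEncoder is a hypothetical placeholder so the snippet runs on its own; in the real suite it would be replaced by GreedyEncoder. It uses unittest from the standard library only to avoid extra dependencies, and the project's suite may use a different runner.

```python
import unittest


class StandInEncoder:
    """Hypothetical stand-in; swap for the real GreedyEncoder in the suite."""

    def __init__(self, words, word_max_len=None):
        if not words:
            raise ValueError("`words` must not be empty.")
        self.words = words
        self.word_max_len = word_max_len


class TestEmptyWords(unittest.TestCase):
    def test_empty_words_raises(self):
        # Cover both call shapes from the report:
        # default word_max_len and an explicit one.
        for kwargs in ({}, {"word_max_len": 3}):
            with self.subTest(kwargs=kwargs):
                with self.assertRaisesRegex(ValueError, "must not be empty"):
                    StandInEncoder(words={}, **kwargs)


def run_tests():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestEmptyWords)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Testing both call shapes matters because each one currently hits a different failure mode.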