
Commit 8ec4a4e

[BUG] GreedyEncoder silently truncates sequences exceeding max_len without warning (#670)
#### Reference Issues/PRs

Fixes #669

#### What does this implement/fix? Explain your changes.

`GreedyEncoder` in `pyaptamer/trafos/encode/_greedy.py` silently truncated sequences longer than `max_len`, giving the user no indication that their input had been shortened. This is inconsistent with how `clean_protein_seq` in `_pseaac_utils.py` handles sequence modification: it issues a `UserWarning` whenever it changes a sequence.

Silent truncation is also risky from a chemistry standpoint. Aptamers fold into specific, length-dependent 3D structures such as stem-loops and G-quadruplexes, so a truncated sequence may encode a structurally different conformation without the researcher knowing.

The fix emits a `UserWarning` when truncation occurs, following the pattern already established by `clean_protein_seq`.

#### What should a reviewer concentrate their feedback on?

- The placement of `warnings.warn()` inside `_transform()`, and whether `stacklevel=2` is correct for this call depth
- Whether the warning message is clear enough for end users

#### Did you add any tests for the change?

No new tests were added. The fix is a single warning call that follows the existing pattern in the codebase.

#### Any other comments?

This was discussed with @satvshr on Discord, who confirmed the fix is welcome.

#### PR checklist

- [x] The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.
- [ ] Added/modified tests
- [x] Used pre-commit hooks when committing to ensure that code is compliant with hooks. Install hooks with `pre-commit install`. To run hooks independent of commit, execute `pre-commit run --all-files`.
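Whether `stacklevel=2` is right depends on how many frames sit between the `warnings.warn()` call and the user's code. The sketch below assumes, without confirming it against pyaptamer, that a public `transform()` on the base class calls `_transform()` directly; `ToyBaseTransform`, `ToyEncoder`, and `warning_and_call_lines` are hypothetical names. Under that assumption, `stacklevel=2` attributes the warning to the base-class `transform()` line, while `stacklevel=3` attributes it to the line where the caller invoked `transform()`.

```python
import inspect
import warnings


class ToyBaseTransform:
    """Hypothetical base class: a public method delegating to a private hook."""

    def transform(self, X):
        # Assumed call chain: caller -> transform() -> _transform() -> warn()
        return self._transform(X)


class ToyEncoder(ToyBaseTransform):
    """Hypothetical encoder that truncates and warns, mirroring the diff."""

    def __init__(self, max_len, stacklevel):
        self.max_len = max_len
        self.stacklevel = stacklevel

    def _transform(self, X):
        encoded = []
        for seq in X:
            tokens = list(seq)
            if self.max_len is not None and len(tokens) > self.max_len:
                tokens = tokens[: self.max_len]
                warnings.warn(
                    "sequence truncated to max_len",
                    UserWarning,
                    stacklevel=self.stacklevel,
                )
            encoded.append(tokens)
        return encoded


def warning_and_call_lines(stacklevel):
    """Return (line the warning is attributed to, line of the transform() call)."""
    enc = ToyEncoder(max_len=4, stacklevel=stacklevel)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, no deduplication
        _, call_line = enc.transform(["ACGTACGT"]), inspect.currentframe().f_lineno
    return caught[0].lineno, call_line


# Level 1 is _transform(), level 2 is transform(), level 3 is the caller.
for level in (2, 3):
    blamed, call_line = warning_and_call_lines(level)
    print(f"stacklevel={level}: points at caller's line? {blamed == call_line}")
```

If the real base class adds further wrapper frames (for example input validation before `_transform()`), the required level grows accordingly, so the count is worth checking against the actual call path.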
1 parent dd4accd commit 8ec4a4e

1 file changed

Lines changed: 7 additions & 0 deletions


pyaptamer/trafos/encode/_greedy.py

```diff
@@ -1,5 +1,7 @@
 """Base transformation class."""
 
+import warnings
+
 import numpy as np
 import pandas as pd
 
@@ -116,6 +118,11 @@ def _transform(self, X):
             # stop if we've reached max_len tokens
             if max_len is not None and len(tokens) >= max_len:
                 tokens = tokens[:max_len]
+                warnings.warn(
+                    "One or more sequence exceeds maximum length and was truncted ",
+                    UserWarning,
+                    stacklevel=2,
+                )
                 break
 
         encoded_seqs.append(tokens)
```
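Two details of the diff may be worth a reviewer's look. The `warnings.warn()` call sits inside the per-sequence loop, so it can fire once per truncated sequence; Python's default filter usually collapses repeats with identical text from the same line, which also hides how many sequences were affected. The `>=` condition may additionally be reached by a sequence that yields exactly `max_len` tokens, depending on where the check sits in the loop, which the hunk does not show. The sketch below is one possible alternative, not the committed code: `greedy_tokenize` and `encode_all` are hypothetical names, and the longest-match tokenizer and toy vocabulary are assumptions standing in for `GreedyEncoder`. It flags truncation only when input remains unconsumed and emits a single aggregated `UserWarning` per call.

```python
import warnings


def greedy_tokenize(seq, vocab, max_len=None):
    """Greedy longest-match tokenization (hypothetical stand-in for GreedyEncoder).

    Returns (tokens, truncated), where truncated is True only if characters
    of `seq` were left unconsumed when max_len tokens had been produced.
    """
    longest = max(len(word) for word in vocab)
    tokens, i = [], 0
    while i < len(seq):
        if max_len is not None and len(tokens) >= max_len:
            return tokens, True  # input remains, so this is a real truncation
        # try the longest vocabulary entry that matches at position i
        for size in range(min(longest, len(seq) - i), 0, -1):
            piece = seq[i : i + size]
            if piece in vocab:
                tokens.append(piece)
                i += size
                break
        else:
            raise ValueError(f"no vocabulary entry matches {seq[i:]!r}")
    return tokens, False


def encode_all(seqs, vocab, max_len=None):
    """Encode every sequence, emitting at most one aggregated UserWarning."""
    encoded, n_truncated = [], 0
    for seq in seqs:
        tokens, truncated = greedy_tokenize(seq, vocab, max_len)
        n_truncated += truncated  # bool counts as 0 or 1
        encoded.append(tokens)
    if n_truncated:
        warnings.warn(
            f"{n_truncated} sequence(s) exceeded max_len={max_len} and were truncated.",
            UserWarning,
            stacklevel=2,
        )
    return encoded


# Toy vocabulary (assumption). "ACGT" fits in exactly 2 tokens and is not
# flagged; "ACGTA" leaves a trailing "A" unconsumed and is counted.
vocab = {"A", "C", "G", "T", "AC", "GT"}
encode_all(["ACGT", "ACGTA", "A"], vocab, max_len=2)
```

Aggregating after the loop also makes the behaviour straightforward to cover with a test (for example via `pytest.warns(UserWarning)`), which the PR checklist leaves open.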
