Describe the bug
`clean_protein_seq()` replaces valid lowercase amino acid characters with `'N'` because `AMINO_ACIDS` only contains uppercase letters. Lowercase input like `"acdef"` is treated as entirely invalid and returned as `"NNNNN"`.
To Reproduce
```python
from pyaptamer.utils._pseaac_utils import clean_protein_seq

print(clean_protein_seq("acdef"))
# Output: "NNNNN" (expected: "ACDEF")

print(clean_protein_seq("AcDeF"))
# Output: "ANDNF" (expected: "ACDEF")
```
Expected behavior
Lowercase and mixed-case sequences should be normalized to uppercase before validation, preserving all valid amino acids. `clean_protein_seq("acdef")` should return `"ACDEF"`, not `"NNNNN"`.
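A minimal sketch of one possible fix: uppercase the input once, then run the existing membership check. It assumes that `AMINO_ACIDS` holds the 20 standard one-letter codes and that anything outside it is replaced with `'N'`; the actual body and constant in `pyaptamer.utils._pseaac_utils` may differ.

```python
# Assumption: the 20 standard one-letter amino acid codes. The real
# AMINO_ACIDS constant in pyaptamer.utils._pseaac_utils may differ.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def clean_protein_seq(seq: str) -> str:
    """Normalize seq to uppercase, then replace non-standard residues with 'N'."""
    # Uppercasing first means lowercase valid residues pass the check below.
    seq = seq.upper()
    return "".join(aa if aa in AMINO_ACIDS else "N" for aa in seq)


print(clean_protein_seq("acdef"))  # ACDEF
print(clean_protein_seq("AcDeF"))  # ACDEF
print(clean_protein_seq("acdxz"))  # ACDNN (X and Z are not in AMINO_ACIDS)
```

Normalizing with `.upper()` keeps `AMINO_ACIDS` uppercase-only, so no other code that relies on that constant needs to change.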
Additional context
PDB files and other bioinformatics tools sometimes output lowercase amino acid sequences.
The function should handle these gracefully rather than treating them as invalid residues.
Versions
0.1.0a1