
[BUG] Fix clean_protein_seq corrupting lowercase sequences (#348) #552

Open
Alleny244 wants to merge 1 commit into gc-os-ai:main from Alleny244:fix/lowercase-protein-seq-348

Alleny244 commented Apr 27, 2026

Reference Issues/PRs

Fixes: #550

What does this implement/fix? Explain your changes.

clean_protein_seq() in pyaptamer/utils/_pseaac_utils.py validates residues only
against the uppercase AMINO_ACIDS set, so every residue of a lowercase input is
treated as invalid: for example, "acdef" is replaced with "NNNNN". This PR adds
seq = seq.upper() at the start of the function so the input is normalized to
uppercase before validation.
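A minimal sketch of the patched function, for illustration only. It is not the actual pyaptamer source: the contents of AMINO_ACIDS (assumed here to be the 20 standard one-letter codes) and the one-for-one replacement of unrecognized residues with "N" are assumptions inferred from the "acdef" to "NNNNN" example.

```python
# Assumption: AMINO_ACIDS holds the 20 standard one-letter codes in uppercase.
# The real constant in pyaptamer/utils/_pseaac_utils.py may differ.
AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def clean_protein_seq(seq: str) -> str:
    """Sketch of the fixed function: normalize case, then validate.

    Assumes unrecognized residues are replaced one-for-one with "N",
    matching the behavior described in this PR.
    """
    # The fix: upper-case first so "a" and "A" are both recognized.
    seq = seq.upper()
    # Keep valid residues; replace anything else with "N".
    return "".join(aa if aa in AMINO_ACIDS else "N" for aa in seq)


print(clean_protein_seq("acdef"))  # ACDEF
print(clean_protein_seq("AcDxF"))  # ACDNF ("X" is not in the assumed set)
```

Without the seq.upper() line, every lowercase residue in "acdef" would fall through to the else branch and the call would return "NNNNN", which is the reported bug.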

What should a reviewer concentrate their feedback on?

  • Whether str.upper() is the right normalization strategy (vs. validating against both uppercase and lowercase letters)
  • Whether the test cases sufficiently cover edge cases
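To make the first review question concrete, here is a hedged comparison of the two strategies, again using a hypothetical 20-letter AMINO_ACIDS set and the assumed "N" replacement. The helper names clean_via_upper and clean_via_both_cases are illustrative only. The main difference: checking both cases accepts lowercase residues but also passes them through unchanged, so callers would still receive mixed-case output.

```python
# Hypothetical stand-in for the module constant (20 standard residues).
AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def clean_via_upper(seq: str) -> str:
    """Strategy in this PR: normalize once, then validate."""
    seq = seq.upper()
    return "".join(aa if aa in AMINO_ACIDS else "N" for aa in seq)


def clean_via_both_cases(seq: str) -> str:
    """Alternative: accept either case, but leave the case untouched."""
    valid = AMINO_ACIDS | {aa.lower() for aa in AMINO_ACIDS}
    return "".join(aa if aa in valid else "N" for aa in seq)


mixed = "AcDeF"
print(clean_via_upper(mixed))       # ACDEF
print(clean_via_both_cases(mixed))  # AcDeF (lowercase residues survive)
```

If downstream code expects canonical uppercase (for example, a property lookup keyed by uppercase residue codes), only the upper() strategy satisfies it. One edge case: for a few non-ASCII characters, Python's str.upper() changes the string length ("ß".upper() returns "SS"), so output length is only guaranteed to match input length for ASCII input.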

Did you add any tests for the change?

Yes. Added parametrized tests in pyaptamer/pseaac/tests/test_pseaac.py for:

  • Fully lowercase sequence input
  • Mixed-case sequence input

Both verify that the cleaned output matches seq.upper().
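The tests described here might look roughly like the following pytest sketch. It is illustrative, not the committed test code: the test name and sequences are hypothetical, and a local stand-in replaces the real function so the snippet runs on its own. In the repository the function would instead be imported from pyaptamer.utils._pseaac_utils (import path assumed from the file path mentioned in this PR).

```python
import pytest

# Stand-in so this sketch is self-contained. In the repo you would instead use
# (assumed import path): from pyaptamer.utils._pseaac_utils import clean_protein_seq
AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def clean_protein_seq(seq: str) -> str:
    seq = seq.upper()
    return "".join(aa if aa in AMINO_ACIDS else "N" for aa in seq)


@pytest.mark.parametrize(
    "seq",
    [
        "acdefghiklmnpqrstvwy",  # fully lowercase
        "AcDeFgHiKlMnPqRsTvWy",  # mixed case
    ],
)
def test_clean_protein_seq_is_case_insensitive(seq):
    # Valid residues in any case should come back as their uppercase form.
    assert clean_protein_seq(seq) == seq.upper()
```

Running pytest on this file reports each parametrized sequence as a separate test case.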

Any other comments?

None

PR checklist

  • The PR title starts with one of [ENH], [MNT], [DOC], or [BUG]: [BUG] for bugfixes, [MNT] for CI and test framework changes, [ENH] for adding or improving code, [DOC] for writing or improving documentation or docstrings.
  • Added/modified tests
  • Used pre-commit hooks when committing to ensure the code complies with them. Install the hooks with pre-commit install.
    To run the hooks independently of a commit, execute pre-commit run --all-files

Development

Successfully merging this pull request may close these issues.

[BUG] Lowercase protein sequences are corrupted by clean_protein_seq
