[BUG] Fix encode_rna docstring terminology and add return_type validation #617

Open: Ishiezz wants to merge 2 commits into gc-os-ai:main from Ishiezz:fix/encode-rna-docstring-and-return-type-validation

Ishiezz commented May 1, 2026

Reference Issues/PRs

Fixes #616

What does this implement/fix?

This PR fixes two bugs in the encode_rna function in pyaptamer/utils/_rna.py:

Bug 1 — Wrong docstring terminology (copy-paste from protein encoder)

The function handles RNA sequences, but its docstring incorrectly referred to
"protein sequences" and "amino acid patterns", and contained a typo ("trunacted"):

  • "tokenizes protein sequences" → "tokenizes RNA sequences"
  • "amino acid patterns are preferred" → "RNA patterns are preferred"
  • "trunacted or zero-padded" → "truncated or zero-padded"
  • word_max_len description: "amino acid patterns" → "RNA patterns"
  • Inline comment: "single protein input" → "single RNA sequence input"

Bug 2 — Invalid return_type silently returns tensor instead of raising ValueError

Before this fix:

encode_rna("ACG", {"A": 1}, max_len=3, return_type="invalid")
# returned a tensor silently — no error

After this fix:

encode_rna("ACG", {"A": 1}, max_len=3, return_type="invalid")
# raises: ValueError: `return_type` must be either 'tensor' or 'numpy', got 'invalid'.

This is now consistent with how rna2vec handles invalid sequence_type.
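
A minimal sketch of the kind of guard described above, assuming the check runs before any encoding work. The helper name `check_return_type` and the `VALID_RETURN_TYPES` constant are hypothetical and not taken from pyaptamer; only the error message mirrors the one quoted above.

```python
# Hypothetical constant: the two values the PR description names as valid.
VALID_RETURN_TYPES = ("tensor", "numpy")


def check_return_type(return_type):
    """Return `return_type` unchanged, or raise ValueError if it is not accepted."""
    if return_type not in VALID_RETURN_TYPES:
        # Message format copied from the "After this fix" example.
        raise ValueError(
            "`return_type` must be either 'tensor' or 'numpy', "
            f"got '{return_type}'."
        )
    return return_type
```

Failing early like this means a misspelled argument such as "nunpy" surfaces as an error at the call site instead of silently falling through to the tensor branch.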

Did you add any tests?

Yes. Added test_encode_rna_invalid_return_type to cover Bug 2.
All 20 tests in pyaptamer/utils/tests/test_rna.py pass.
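
For readers without the diff at hand, a test of this shape could look roughly like the sketch below. It is not the test added in this PR: `encode_rna_stub` is a hypothetical stand-in that reproduces only the validation step, and the test uses plain `try`/`except` so it runs without any test framework.

```python
import re


def encode_rna_stub(sequences, words, max_len, return_type="tensor"):
    """Hypothetical stand-in for encode_rna: validates return_type, encodes nothing."""
    if return_type not in ("tensor", "numpy"):
        raise ValueError(
            "`return_type` must be either 'tensor' or 'numpy', "
            f"got '{return_type}'."
        )
    return None  # the real function would return the encoded sequences


def test_invalid_return_type_raises(encode=encode_rna_stub):
    """An unknown return_type should raise ValueError mentioning the parameter."""
    try:
        encode("ACG", {"A": 1}, max_len=3, return_type="invalid")
    except ValueError as err:
        assert re.search(r"`return_type` must be either", str(err))
    else:
        raise AssertionError("expected ValueError for invalid return_type")
```

Swapping the stub for the real encode_rna (passed as the `encode` argument) would exercise the actual fix, assuming its signature matches the calls shown in this description.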

PR checklist

  • The PR title starts with [BUG]
  • Added/modified tests
  • Used pre-commit hooks when committing

Development

Successfully merging this pull request may close these issues.

[BUG] encode_rna: incorrect docstring terminology (protein/amino acid) and missing return_type validation
