
[ENH] Add validate_sequence utility for DNA/RNA/protein input validation #625

Open
Vaishnav88sk wants to merge 1 commit into gc-os-ai:main from Vaishnav88sk:feature/sequence-validation



@Vaishnav88sk

Reference Issues/PRs

Fixes #619

What does this implement/fix? Explain your changes.

Introduces a centralized validation module pyaptamer/utils/_validate.py to ensure input sequence integrity across the library.

  • validate_sequence(sequence, molecule_type): Raises ValueError listing the invalid characters and their positions.
  • is_valid_sequence(sequence, molecule_type): Returns a boolean instead of raising.

Supported alphabets: DNA (ACGT), RNA (ACGU), and protein (the 20 standard amino acids). Matching is case-insensitive.
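
For reviewers weighing the error message format, here is a minimal sketch of how the two functions described above might behave. It is not the code in this PR: the `ALPHABETS` mapping, the 0-based positions, the exact message wording, the `TypeError` for non-string input, and the treatment of an empty string as valid are all assumptions made for illustration.

```python
# Hypothetical sketch; the actual pyaptamer/utils/_validate.py may differ.
ALPHABETS = {
    "dna": frozenset("ACGT"),
    "rna": frozenset("ACGU"),
    "protein": frozenset("ACDEFGHIKLMNPQRSTVWY"),  # 20 standard amino acids
}


def validate_sequence(sequence, molecule_type):
    """Return None if `sequence` is valid, otherwise raise.

    Assumptions: positions are 0-based, non-string input raises TypeError,
    and an empty string counts as valid because it has no invalid characters.
    """
    if not isinstance(sequence, str):
        raise TypeError(f"sequence must be a str, got {type(sequence).__name__}")

    key = str(molecule_type).lower()
    if key not in ALPHABETS:
        raise ValueError(
            f"unknown molecule_type {molecule_type!r}; "
            f"expected one of {sorted(ALPHABETS)}"
        )

    # Compare in upper case so that 'acgt' and 'ACGT' are treated alike.
    alphabet = ALPHABETS[key]
    invalid = [(i, ch) for i, ch in enumerate(sequence) if ch.upper() not in alphabet]
    if invalid:
        details = ", ".join(f"{ch!r} at position {i}" for i, ch in invalid)
        raise ValueError(f"invalid {key} sequence: {details}")


def is_valid_sequence(sequence, molecule_type):
    """Boolean wrapper around validate_sequence that never raises."""
    try:
        validate_sequence(sequence, molecule_type)
    except (TypeError, ValueError):
        return False
    return True
```

With this sketch, `validate_sequence("ACXGZ", "dna")` raises a `ValueError` whose message names `'X' at position 2` and `'Z' at position 4`, while `is_valid_sequence("acgu", "rna")` returns `True`. Reporting every offending character at once, rather than only the first, lets a caller fix a sequence in a single pass.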

What should a reviewer concentrate their feedback on?

  • Alphabet completeness for standard proteins.
  • Usefulness of the error message format.

Did you add any tests for the change?

Yes, added pyaptamer/utils/tests/test_validate.py with 17 tests covering all molecule types and invalid inputs.

Any other comments?

N/A

PR checklist

  • The PR title starts with [ENH]
  • Added/modified tests
  • Used pre-commit hooks

Add centralized validation module with two functions:
- validate_sequence(): raises ValueError with clear message showing
  invalid characters and their exact positions
- is_valid_sequence(): boolean alternative (no exceptions)

Supports DNA (ACGT), RNA (ACGU), and protein (20 standard amino acids).
Case-insensitive. Includes 17 tests covering all molecule types, edge
cases, error messages, and type checking.
