[BUG] Honor MoleculeLoader ignore_duplicates when exporting sequences by officialasishkumar · Pull Request #313 · gc-os-ai/pyaptamer

officialasishkumar · 2026-04-06T18:36:23Z

Reference Issues/PRs

Fixes #312.

What does this implement/fix? Explain your changes.

This fixes MoleculeLoader.to_df_seq() so the documented ignore_duplicates flag is actually honored. The loader now builds the sequence DataFrame first, drops duplicate sequence rows while keeping the first occurrence when the flag is enabled, and then applies any optional column renaming.
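The order of operations described above (build, deduplicate, then rename) can be sketched roughly as follows. This is only an illustration under assumptions, not the code in this PR: the function name `to_df_seq_sketch`, the `col_name` parameter, and the default column name `"sequence"` are hypothetical, and the real loader derives its sequences from PDB files rather than a plain list.

```python
import pandas as pd


def to_df_seq_sketch(sequences, ignore_duplicates=True, col_name=None):
    """Hypothetical stand-in for MoleculeLoader.to_df_seq(); not the project's code."""
    # Step 1: build the sequence DataFrame first ("sequence" is an assumed column name).
    df = pd.DataFrame({"sequence": list(sequences)})
    # Step 2: drop duplicate sequence rows, keeping the first occurrence.
    if ignore_duplicates:
        df = df.drop_duplicates(subset="sequence", keep="first")
    # Step 3: apply the optional column renaming only after deduplication,
    # so the dedup step can always refer to the original column name.
    if col_name is not None:
        df = df.rename(columns={"sequence": col_name})
    return df


example = to_df_seq_sketch(["ACGU", "GGCA", "ACGU"], col_name="seq")
```

Renaming last means the deduplication step never needs to know which custom name the caller asked for.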

What should a reviewer concentrate their feedback on?

The duplicate-removal behavior in MoleculeLoader.to_df_seq() and whether preserving the first indexed occurrence matches the intended contract for sequence exports.
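To make the "first indexed occurrence" question concrete, here is a small comparison of pandas' `keep="first"` and `keep="last"` options. The file names used as index labels and the `"sequence"` column name are made up for illustration and are not taken from the repository.

```python
import pandas as pd

# Hypothetical index labels standing in for PDB inputs.
df = pd.DataFrame(
    {"sequence": ["ACGU", "GGCA", "ACGU"]},
    index=["a.pdb", "b.pdb", "a_copy.pdb"],
)

# keep="first" retains the earliest row of each duplicated sequence.
first = df.drop_duplicates(subset="sequence", keep="first")
# keep="last" retains the latest row instead, so a different index label survives.
last = df.drop_duplicates(subset="sequence", keep="last")

print(list(first.index))
print(list(last.index))
```

The surviving sequences are the same either way; only the index label attached to the duplicated sequence differs, which matters if downstream code looks rows up by their source.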

Did you add any tests for the change?

Yes. I added a regression test that reproduces the bug with duplicate PDB inputs and verifies that:

duplicate sequence rows are removed when ignore_duplicates=True
optional custom column names still work after deduplication
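A regression test covering the two checks above might look roughly like the sketch below. It is not the test added in this PR: `FakeLoader` is a hypothetical in-memory substitute for `MoleculeLoader` (which reads PDB files), and the `col_name` argument and `"sequence"` column name are assumptions.

```python
import pandas as pd


class FakeLoader:
    """Hypothetical in-memory stand-in for MoleculeLoader (assumed interface)."""

    def __init__(self, sequences, ignore_duplicates=False):
        self.sequences = list(sequences)
        self.ignore_duplicates = ignore_duplicates

    def to_df_seq(self, col_name=None):
        df = pd.DataFrame({"sequence": self.sequences})
        if self.ignore_duplicates:
            df = df.drop_duplicates(subset="sequence", keep="first")
        if col_name is not None:
            df = df.rename(columns={"sequence": col_name})
        return df


def test_duplicates_removed():
    # The same input supplied twice should yield a single row.
    loader = FakeLoader(["ACGU", "ACGU", "GGCA"], ignore_duplicates=True)
    df = loader.to_df_seq()
    assert df["sequence"].tolist() == ["ACGU", "GGCA"]


def test_custom_column_after_dedup():
    # Renaming must still work once duplicates have been dropped.
    loader = FakeLoader(["ACGU", "ACGU"], ignore_duplicates=True)
    df = loader.to_df_seq(col_name="seq")
    assert list(df.columns) == ["seq"]
    assert len(df) == 1
```

Both functions follow pytest's `test_` naming convention, so they would be collected by a command like the one listed under the validation run.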

Any other comments?

Validation run locally:

.venv/bin/pytest pyaptamer -p no:warnings
.venv/bin/pre-commit run --files pyaptamer/data/loader.py pyaptamer/data/tests/test_loader.py

PR checklist

The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.
Added/modified tests
Used pre-commit hooks when committing to ensure that code is compliant with hooks. Install hooks with pre-commit install.
To run hooks independent of commit, execute pre-commit run --all-files

[BUG] Honor MoleculeLoader ignore_duplicates when exporting sequences

80de4e0

siddharth7113 self-requested a review April 7, 2026 14:54

siddharth7113 marked this pull request as draft April 7, 2026 15:12

siddharth7113 closed this Apr 21, 2026

officialasishkumar wants to merge 1 commit into gc-os-ai:main from officialasishkumar:issue312
