[ENH] dual-encoder architecture for protein-conditioned aptamer generation#539

Open
NoorMajdoub wants to merge 1 commit into
gc-os-ai:mainfrom
NoorMajdoub:feat/aptamlm-dual-encoder

Conversation

@NoorMajdoub

Reference Issues/PRs

Addresses #131

What does this implement/fix? Explain your changes.

Implements an initial version of AptaMLM, a dual-encoder architecture for protein-conditioned aptamer generation that adapts the PepMLM approach to nucleic acids, as discussed in #131.

  • Protein encoder: frozen ESM2 (same backbone as PepMLM)
  • Nucleotide encoder: frozen NucleotideTransformer
  • Cross-attention between the two encoders, acting as a bridge for protein-conditioned encoding of the nucleotide sequence.
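
To make the bridge concrete for reviewers, here is a minimal NumPy sketch of single-head cross-attention in which nucleotide tokens (queries) attend to protein tokens (keys/values). It is an illustration only, not the code in this PR: the function name, the projection matrices `w_q`/`w_k`/`w_v`, and all dimensions are hypothetical, and random arrays stand in for the frozen encoder outputs.

```python
import numpy as np


def cross_attention(nuc_h, prot_h, w_q, w_k, w_v, prot_pad_mask=None):
    """Single-head cross-attention: nucleotide tokens attend to protein tokens.

    nuc_h:         (L_nuc, d_nuc)   nucleotide-encoder hidden states (queries)
    prot_h:        (L_prot, d_prot) protein-encoder hidden states (keys/values)
    prot_pad_mask: optional (L_prot,) bool array, True where the protein token
                   is padding (assumes at least one token is not padding)
    """
    q = nuc_h @ w_q                               # (L_nuc, d)
    k = prot_h @ w_k                              # (L_prot, d)
    v = prot_h @ w_v                              # (L_prot, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])       # (L_nuc, L_prot)
    if prot_pad_mask is not None:
        # Padding positions get -inf so they receive zero attention weight.
        scores = np.where(prot_pad_mask[None, :], -np.inf, scores)
    # Numerically stable softmax over the protein axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


# Random stand-ins for the frozen encoder outputs (all sizes are made up).
rng = np.random.default_rng(0)
d_nuc, d_prot, d = 16, 24, 8
nuc_h = rng.normal(size=(5, d_nuc))    # 5 nucleotide tokens
prot_h = rng.normal(size=(7, d_prot))  # 7 protein tokens, last one is padding
w_q = rng.normal(size=(d_nuc, d))
w_k = rng.normal(size=(d_prot, d))
w_v = rng.normal(size=(d_prot, d))
pad = np.array([False] * 6 + [True])

fused, weights = cross_attention(nuc_h, prot_h, w_q, w_k, w_v, prot_pad_mask=pad)
print(fused.shape, weights.shape)
```

Each row of `weights` is a distribution over protein positions, so `fused` gives every nucleotide token a protein-conditioned representation. Because the two encoders have different hidden sizes, the projections map both into a shared dimension `d`.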

What should a reviewer concentrate their feedback on?

  • Cross-attention bridge between the two encoders.
  • Choice of the frozen encoders.
  • Masking approach.
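
For context on the masking point, the sketch below shows one generic way to build MLM inputs and labels for the aptamer side. It is an assumption-laden illustration, not necessarily what this PR does: the token ids, `mask_id`, `special_ids`, and `mask_prob` are hypothetical, and `-100` is used for unmasked labels because it is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
import numpy as np

IGNORE_INDEX = -100  # label value skipped by PyTorch's CrossEntropyLoss by default


def mask_tokens(token_ids, mask_id, mask_prob, rng, special_ids=()):
    """Return (inputs, labels) for masked language modelling.

    Special tokens are never masked. Labels hold the original id at masked
    positions and IGNORE_INDEX elsewhere, so the loss only covers masked tokens.
    """
    token_ids = np.asarray(token_ids)
    maskable = ~np.isin(token_ids, list(special_ids))
    chosen = maskable & (rng.random(token_ids.shape) < mask_prob)
    inputs = np.where(chosen, mask_id, token_ids)
    labels = np.where(chosen, token_ids, IGNORE_INDEX)
    return inputs, labels


# Hypothetical vocabulary: 0 = start token, 2 = end token, 4 = mask token,
# 5..8 = nucleotide tokens.
rng = np.random.default_rng(0)
aptamer_ids = [0, 5, 6, 7, 8, 5, 6, 2]

# mask_prob=1.0 masks every non-special aptamer token (full-sequence masking).
full_inputs, full_labels = mask_tokens(
    aptamer_ids, mask_id=4, mask_prob=1.0, rng=rng, special_ids=(0, 2)
)
print(full_inputs)
print(full_labels)

# A smaller mask_prob gives BERT-style partial masking instead.
part_inputs, part_labels = mask_tokens(
    aptamer_ids, mask_id=4, mask_prob=0.15, rng=rng, special_ids=(0, 2)
)
```

Whether to mask the whole aptamer or only a random fraction is exactly the design choice feedback is requested on; the single `mask_prob` parameter covers both cases in this sketch.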

Did you add any tests for the change?

Forward pass and MLM loss verified to run. Formal unit tests have not been added yet. (See the notebook in the repo linked below.)

Any other comments?

This is an initial architecture implementation; the training loop is not yet finalised because I have not settled on the decoding strategy.
I tried to combine the structure from the PepMLM paper discussed in issue #131 with the BAnG paper (BAnG: Bidirectional Anchored Generation for Conditional RNA Design, https://arxiv.org/abs/2502.21274), which I believe can help this approach.
Before reviewing this code, I would recommend checking the notebook and README in this repo, as they provide a clearer implementation walkthrough with execution results: https://github.com/NoorMajdoub/pyaptamer_dual_encoder

PS: I also tried to implement a contrastive loss to address the scarcity of positive samples mentioned in the issue, and to make use of the negative samples in the dataset.
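
As a rough illustration of how labelled negatives can enter the objective, the sketch below uses the classic margin-based pairwise contrastive loss on pooled protein and aptamer embeddings. This is one common formulation chosen for illustration and may differ from the loss in the linked notebook; the function name, the `margin` value, and the toy embeddings are all hypothetical.

```python
import numpy as np


def pairwise_contrastive_loss(prot_emb, apt_emb, is_binder, margin=1.0):
    """Margin-based contrastive loss over (protein, aptamer) pairs.

    prot_emb, apt_emb: (N, d) pooled embeddings for N pairs
    is_binder:         (N,) 1 for positive (binding) pairs, 0 for negatives

    Positive pairs are pulled together; negative pairs are pushed apart until
    their distance is at least `margin`.
    """
    # L2-normalise so distances are comparable across pairs.
    prot = prot_emb / np.linalg.norm(prot_emb, axis=-1, keepdims=True)
    apt = apt_emb / np.linalg.norm(apt_emb, axis=-1, keepdims=True)
    dist = np.linalg.norm(prot - apt, axis=-1)
    y = np.asarray(is_binder, dtype=float)
    pos_term = y * dist**2
    neg_term = (1.0 - y) * np.maximum(0.0, margin - dist) ** 2
    return float(np.mean(pos_term + neg_term))


# Toy embeddings: each aptamer embedding coincides with its protein embedding.
prot_emb = np.array([[1.0, 0.0], [0.0, 1.0]])
apt_emb = np.array([[1.0, 0.0], [0.0, 1.0]])

loss_if_binders = pairwise_contrastive_loss(prot_emb, apt_emb, is_binder=[1, 1])
loss_if_negatives = pairwise_contrastive_loss(prot_emb, apt_emb, is_binder=[0, 0])
print(loss_if_binders, loss_if_negatives)
```

Coinciding embeddings cost nothing when the pair is a true binder but incur the full margin penalty when the pair is a known negative, which is how the negative samples in the dataset contribute a training signal.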

PR checklist

  • [x] The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.
  • Added/modified tests
  • Used pre-commit hooks when committing to ensure that code is compliant with hooks. Install hooks with pre-commit install.
    To run hooks independent of commit, execute pre-commit run --all-files
