[ENH] AptaNet algorithm by satvshr · Pull Request #30 · gc-os-ai/pyaptamer

satvshr · 2025-07-07T07:57:29Z

Merge after #28.
Solves #13.
Adds AptaNet, a binary classification algorithm that predicts whether an aptamer will bind to a target protein.

satvshr · 2025-07-07T07:58:14Z

To do: Add tests comparing it to the implementation in the official AptaNet repository.

fkiraly

Some questions for my understanding: what is this trained on?

Are there pre-trained weights? If so, where?
If we can also train on own data, how does that work?


satvshr · 2025-07-07T21:06:23Z

Are there pre-trained weights? If so, where?

Nope, there are no pre-trained weights.

What is this trained on?

This is answered in the "Data collection" section of the original paper (page 11)

If we can also train on our own data, how does that work?

I assume you pass the aptamer and target protein sequences as X, and y is a binary label (whether or not they bind).
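A minimal sketch of that data layout, assuming each row of X is an (aptamer sequence, protein sequence) pair and y holds 0/1 binding labels. `make_training_data` and the sequences below are hypothetical placeholders, not AptaNet data or API.

```python
import numpy as np


def make_training_data(pairs, labels):
    """Pack (aptamer, protein) pairs and binary labels into X, y.

    Hypothetical helper for illustration only.
    """
    X = np.asarray(pairs, dtype=object)  # shape (n_samples, 2): aptamer, protein
    y = np.asarray(labels, dtype=int)    # 1 = binds, 0 = does not bind
    if X.ndim != 2 or X.shape[1] != 2:
        raise ValueError("X must contain (aptamer, protein) pairs")
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    if not set(np.unique(y)) <= {0, 1}:
        raise ValueError("y must be binary (0 or 1)")
    return X, y


# Made-up placeholder sequences, not real binding data.
pairs = [
    ("ACGTTGCA", "MKTAYIAKQR"),
    ("GGCATTAC", "MVLSPADKTN"),
    ("TTAGGCAT", "MKTAYIAKQR"),
]
X, y = make_training_data(pairs, [1, 0, 0])
```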

fkiraly · 2025-07-09T19:40:30Z

What is this trained on?

This is answered in the "Data collection" section of the original paper (page 11)

Can you give a short summary in your own words, or say that you do not know?

If there are no pretrained weights, then there is nothing this was trained on.

fkiraly

Great!

Request to change the signature of the class:

move the __init__ args to the method generate_final_vector
rename the latter to transform
change the output to a 1D np.ndarray
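A minimal sketch of the requested signature, assuming the former `__init__` arguments were the aptamer and protein sequences. `PairEncoder` is a hypothetical name, and simple letter composition stands in for the real k-mer and PseAAC features, so only the shape of the API (sequences go into `transform`, one 1D array comes out) is meant to carry over.

```python
import numpy as np

NUCLEOTIDES = "ACGT"
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def _composition(seq, alphabet):
    """Relative frequency of each alphabet letter in seq."""
    counts = np.array([seq.count(ch) for ch in alphabet], dtype=float)
    total = counts.sum()
    return counts / total if total > 0 else counts


class PairEncoder:
    """Encode an (aptamer, protein) pair as a single 1D feature vector.

    Hypothetical simplified stand-in: nucleotide composition of the aptamer
    concatenated with amino-acid composition of the protein.
    """

    def transform(self, aptamer_sequence, protein_sequence):
        aptamer_vec = _composition(aptamer_sequence.upper(), NUCLEOTIDES)
        protein_vec = _composition(protein_sequence.upper(), AMINO_ACIDS)
        # Concatenating two 1D arrays keeps the output 1D.
        return np.concatenate([aptamer_vec, protein_vec])


vec = PairEncoder().transform("ACGTAC", "MKTAYIAKQR")
```

Because the encoder holds no state, the same instance can be reused for every pair.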

Also, usual quality requests:

please ensure to add docstring examples
please ensure to add module docstrings

satvshr · 2025-07-10T07:24:36Z

Can you give a short summary in your own words, or say that you do not know?

Had a look again, and in the original repository they do have a Dataset.csv file.

If there are no pretrained weights, then there is nothing this was trained on.

I don’t necessarily agree with that point. Just because the weights weren’t saved or shared doesn’t imply that the model was never trained. The original AptaNet paper includes a results section, which strongly suggests that the model was trained, even if those weights aren’t available.

NennoMP · 2025-07-10T15:58:26Z

I would like to provide some feedback about structure.
@fkiraly may have a different opinion, so take this with a grain of salt.

I quickly skimmed through the original paper and found that AptaNet is essentially a multi-layer perceptron (MLP) with some pre-processing that includes PseAAC, random forest, etc. Personally, I think it would be better to have the pre-processing logic (currently in aptanet.py) in separate classes and/or utility methods. I would also name the class containing the architecture (currently the MLP class) AptaNet.

Also, I noticed the authors mention applying a neighborhood cleaning algorithm to address class imbalance. In their code, this corresponds to the following snippet between the random forest and the actual neural network:

from imblearn.under_sampling import NeighbourhoodCleaningRule
# ...
# apply random forest
ncr = NeighbourhoodCleaningRule()
x_resampled, y_resampled = ncr.fit_resample(x, y)
# apply MLP
# ...

I may have missed it, but it doesn't appear to be included in our current implementation. Indeed, this is a step needed only when the dataset is skewed in favour of one class, so we could simply have an optional argument that applies or does not apply such step.
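A minimal sketch of the optional-argument idea, assuming a scikit-learn style wrapper: `resampler=None` skips resampling, while any object with a `fit_resample(X, y)` method (for example imblearn's `NeighbourhoodCleaningRule()`) is cloned and applied to the training data before fitting. `ResampledClassifier` and `n_train_samples_` are hypothetical names, and `LogisticRegression` stands in for the AptaNet MLP.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.linear_model import LogisticRegression


class ResampledClassifier(ClassifierMixin, BaseEstimator):
    """Fit a classifier after an optional resampling step (hypothetical sketch)."""

    def __init__(self, resampler=None):
        self.resampler = resampler  # stored as given, never modified

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        if self.resampler is not None:
            # Resampling happens at fit time only; predict never resamples.
            X, y = clone(self.resampler).fit_resample(X, y)
        self.n_train_samples_ = len(y)  # rows that survived resampling
        # LogisticRegression is a stand-in for the AptaNet MLP.
        self.estimator_ = LogisticRegression().fit(X, y)
        self.classes_ = self.estimator_.classes_
        return self

    def predict(self, X):
        return self.estimator_.predict(np.asarray(X))
```

Keeping the resampler as a constructor argument (rather than hard-coding it) lets balanced datasets skip the step entirely.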

NennoMP · 2025-07-10T15:58:39Z

As promised, some comments on the deep neural network.

For multiple layers, I prefer using nn.Sequential as a container rather than hardcoding them. This approach offers better decoupling of logic, readability, and reusability (https://github.com/FrancescoSaverioZuppichini/Pytorch-how-and-when-to-use-Module-Sequential-ModuleList-and-ModuleDict).

I would also suggest making AptaNet's architecture customizable with arguments for the number of layers, dropout, etc. Finally, I would argue that the random forest and training should live outside the class where the architecture itself is defined, for instance directly in an example/tutorial notebook. Motivations: separation of concerns and, in the context of training, aligning with PyTorch/torchvision style.

In particular, I think having feature extraction (random forest + SelectFromModel) here could be problematic. SelectFromModel reduces the features from dimension n to m, where m << n. However, m is not known a priori. This means that we would have to "delay" initialization of the fully-connected layers until m is known, by inspecting the output of SelectFromModel. Currently this is hardcoded as input_dim=639, but this won't always work.
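A sketch of the "delayed initialization" point, using scikit-learn's `RandomForestClassifier` and `SelectFromModel` on synthetic data (the 50 input features are made up, not AptaNet's real input): the selected dimensionality m is only available after fitting, and would then be passed as `input_dim` when constructing the network instead of hard-coding 639.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic data with an arbitrary number of features.
X, y = make_classification(
    n_samples=200, n_features=50, n_informative=5, random_state=0
)

# For tree ensembles the default threshold is the mean feature importance.
selector = SelectFromModel(RandomForestClassifier(n_estimators=50, random_state=0))
X_selected = selector.fit_transform(X, y)

# m is read off the fitted selector: the number of selected columns,
# equivalently the number of True entries in the support mask.
input_dim = X_selected.shape[1]
print(input_dim, int(selector.get_support().sum()))
```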

That said, below is an example of how the AptaNet deep neural network could be refactored.

import torch.nn as nn
from torch import Tensor


# Each AptaNet hidden layer has the same three components: (nn.Linear - Activation - AlphaDropout)
# Thus, we can simplify our code by having a function that returns a nn.Sequential container of
# them. This also helps in reducing code duplication btw!
def aptanet_layer(input_dim: int, output_dim: int, dropout: float) -> nn.Sequential:
    """Create a single AptaNet layer with AlphaDropout and ReLU activation."""
    return nn.Sequential(
        nn.Linear(input_dim, output_dim),
        nn.ReLU(),
        nn.AlphaDropout(dropout),
    )


class AptaNet(nn.Module):
    """AptaNet deep neural network for classification."""

    def __init__(
        self,
        n_layers: int,
        input_dim: int,
        hidden_dim: int,
        output_dim: int,
        dropout: float,
    ) -> None:
        super().__init__()
        assert n_layers > 0, "Number of hidden layers must be greater than 0."
        self.model = self._init_model(n_layers, input_dim, hidden_dim, output_dim, dropout)

    def _init_model(
        self,
        n_layers: int,
        input_dim: int,
        hidden_dim: int,
        output_dim: int,
        dropout: float,
    ) -> nn.Sequential:
        """Initialize AptaNet's deep neural network."""
        # First hidden layer maps input_dim -> hidden_dim
        model = [aptanet_layer(input_dim, hidden_dim, dropout)]
        # n_layers - 1 further hidden layers, so n_layers hidden layers in total
        for _ in range(n_layers - 1):
            model.append(aptanet_layer(hidden_dim, hidden_dim, dropout))
        model.append(nn.Linear(hidden_dim, output_dim))
        model.append(nn.Sigmoid())
        return nn.Sequential(*model)

    def forward(self, x: Tensor) -> Tensor:
        # thanks to nn.Sequential() we can now use the model directly, rather than applying each
        # architectural component manually
        return self.model(x)

agree with the principles, great ideas!

defaults should probably be what is currently hard-coded

satvshr · 2025-07-11T12:36:35Z

I think it would be better to have the pre-processing logic (currently in aptanet.py) in separate classes and/or utility methods.

Hmm, the only preprocessing happens in _generate_kmer_vecs, which on second thought should be moved to utils.py in the root, since it is a k-mer generating function that other algorithms may also use. generate_final_vector can be renamed to preprocessing, given that is what the function does: it combines the k-mer frequency vector with the vector generated by the PseAAC encoding algorithm before the result is sent through the neural net. I can then delete neural_net.py and move everything into one file called aptanet.py. Sounds good, @NennoMP @fkiraly ?
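A minimal sketch of a standalone k-mer frequency utility of the kind that could live in utils.py, assuming a DNA alphabet (ACGT), overlapping k-mers, and normalization by the number of k-mers counted. `kmer_frequencies` is a hypothetical name, and this is not necessarily how `_generate_kmer_vecs` computes its vector.

```python
from itertools import product

import numpy as np


def kmer_frequencies(sequence, k=4, alphabet="ACGT"):
    """Return the normalized frequency of every possible k-mer as a 1D array.

    The output has len(alphabet) ** k entries in lexicographic order
    (AA..A, AA..C, ...), so vectors from different sequences line up.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = np.zeros(len(kmers), dtype=float)
    sequence = sequence.upper()
    # Slide a window of width k over the sequence (overlapping k-mers).
    for start in range(len(sequence) - k + 1):
        i = index.get(sequence[start : start + k])
        if i is not None:  # skip k-mers containing letters outside the alphabet
            counts[i] += 1
    total = counts.sum()
    return counts / total if total > 0 else counts
```

Concatenating such a vector with the protein's PseAAC vector (e.g. via `np.concatenate`) would give a combined feature vector of the kind described above.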

Indeed, this is a step needed only when the dataset is skewed in favour of one class, so we could simply have an optional argument that applies or does not apply such a step.

I did not add it as it seemed optional, but I could certainly add a method giving users that option. For my 2 cents, I don't believe data preparation should be combined with the main algorithm: it is not something we need to do before sending data through AptaNet, and it is part of data preprocessing in general.

I would also suggest making AptaNet's architecture customizable with arguments for the number of layers, dropout, etc.

I strongly disagree with this: we would be changing the architecture completely, and the implementation would no longer be AptaNet but something else entirely, if that makes sense.

Other changes (especially the code block) are pretty interesting and eye-opening! I will definitely try integrating them into the PR. Thanks @NennoMP !
@fkiraly I would appreciate your take, given we do not agree on the above topics.

fkiraly · 2025-08-10T17:58:39Z

I see - pickling might be failing due to torch objects - I am not sure why it does not fail on the remote?
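One way to reproduce a pickling failure outside the test suite is an explicit pickle round trip. A minimal sketch, with scikit-learn's `LogisticRegression` as a stand-in: substituting the fitted AptaNet estimator (an assumption about how one might debug this) would show whether its torch-backed parts survive pickling in a given environment. `pickle_roundtrip` is a hypothetical helper.

```python
import pickle

import numpy as np
from sklearn.linear_model import LogisticRegression


def pickle_roundtrip(estimator):
    """Serialize and deserialize an estimator; raises if pickling fails."""
    return pickle.loads(pickle.dumps(estimator))


X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
est = LogisticRegression().fit(X, y)
restored = pickle_roundtrip(est)
# A faithful round trip should give identical predictions.
same = np.array_equal(est.predict(X), restored.predict(X))
```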

satvshr · 2025-08-10T18:17:49Z

I see - pickling might be failing due to torch objects - I am not sure why it does not fail on the remote?

So.....what to do about it?

fkiraly · 2025-08-10T18:47:37Z

Can you check why it is failing locally but not on remote? E.g., discrepancies in versions
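A minimal sketch for comparing environments using only the standard library's `importlib.metadata`: print the installed versions of the relevant packages locally and compare them with the CI log. `installed_versions` is a hypothetical helper and the package list is an assumption; running `pip freeze` in both environments is a broader alternative.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    # Sorted output so local and CI listings can be diffed line by line.
    pkgs = ["numpy", "scikit-learn", "skorch", "torch"]
    for name, ver in sorted(installed_versions(pkgs).items()):
        print(f"{name}=={ver}")
```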

satvshr · 2025-08-10T20:15:33Z

discrepancies in versions

Great guess! For future reference: in CI, the "Install packages and dependencies" step lists all packages installed in the testing environment.
I had to update my skorch version, which solved the bugs. Now only a warning is thrown locally, which I can suppress, but it does not seem to be a big deal since it is not an error:

pyaptamer/aptanet/tests/test_aptanet.py::test_sklearn_compatible_estimator[AptaNetFeaturesClassifier()-check_n_features_in_after_fitting]
 UserWarning: The least populated class in y has only 4 members, which is less than n_splits=5.
    warnings.warn(
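The message text matches the `UserWarning` that scikit-learn's `StratifiedKFold` emits when a class has fewer members than `n_splits`. A minimal sketch on synthetic labels that reproduces it and silences only this message with the standard `warnings` module; in pytest, a `@pytest.mark.filterwarnings("ignore:The least populated class:UserWarning")` marker on the test would have a similar effect. `count_splits` is a hypothetical helper.

```python
import warnings

import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.zeros((14, 1))
y = np.array([0] * 10 + [1] * 4)  # class 1 has only 4 members


def count_splits(X, y, n_splits=5):
    """Iterate the folds (the warning fires during iteration) and count them."""
    return sum(1 for _ in StratifiedKFold(n_splits=n_splits).split(X, y))


with warnings.catch_warnings():
    # Ignore only this specific UserWarning; other warnings still surface.
    warnings.filterwarnings(
        "ignore",
        message="The least populated class in y has only",
        category=UserWarning,
    )
    n_folds = count_splits(X, y)
```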

fkiraly

Great! Looks like it works now.

signature: please make all the parameters in AptaNetPipeline explicit. pairs_to_features is not a public utility.
I would make AptaNetClassifier public, and expose the classifier choice as an arg classifier in AptaNetPipeline. The default is the default of AptaNetClassifier (or, None; and make sure you clone and do not overwrite __init__ params)
docstring: please add a reference to the algorithm in the title, e.g., what algorithm is it? Reference the source prominently
docstring: please avoid double newlines
docstring: docstrings should make clear which component a parameter applies to.
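A sketch of a numpydoc-style class docstring that follows these points: the algorithm and its source named in the summary, each parameter tagged with the component it configures, single blank lines between sections, and an Examples section. `ToyAptamerPipeline` and its parameters are hypothetical placeholders, and the reference entry is deliberately left as a placeholder to be filled from the AptaNet paper itself.

```python
class ToyAptamerPipeline:
    """Aptamer-protein binding classifier based on AptaNet [1]_.

    Hypothetical docstring layout for illustration only.

    Parameters
    ----------
    k : int, default=4
        Feature extraction: length of the aptamer k-mers.
    classifier : sklearn classifier or None, default=None
        Classification step: estimator fitted on the extracted features.
        If None, a default classifier is used.

    References
    ----------
    .. [1] AptaNet paper (full citation to be copied from the original source).

    Examples
    --------
    >>> pipe = ToyAptamerPipeline(k=3)
    >>> pipe.k
    3
    """

    def __init__(self, k=4, classifier=None):
        self.k = k
        self.classifier = classifier
```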

Question: what implies python<3.13?

satvshr · 2025-08-11T19:24:01Z

expose the classifier choice as an arg classifier in AptaNetPipeline

Only the classifier? The AptaNetFeaturesClassifier (rename to AptaNetClassifier) contains the random forest classifier along with the AptaNetMLP, so do you want the random forest classifier as a "classifier choice"?
Edit: Only making the classifier as a choice, not the network.

docstring: please avoid double newlines

I thought before and after every list we should add 2 newlines? Was that not what we discussed in the PSeAAC PR?

Question: what implies python<3.13?

skorch requires Python versions <3.13.

satvshr · 2025-08-12T04:57:28Z

signature: please make all the parameters in AptaNetPipeline explicit. pairs_to_features is not a public utility.

Should I do that even for AptaNetClassifier?

fkiraly · 2025-08-12T06:43:16Z

so do you want the random forest classifier as a "classifier choice"? Edit: Only making the classifier as a choice, not the network.

Yes, but it should be a choice up to the user. Any sklearn compatible classifier should work.
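A minimal sketch of a `classifier=None` parameter following the points above: the argument is stored untouched in `__init__`, and `fit` clones either the user's estimator or a default, so the user's object is never fitted in place. `PairClassifierPipeline` and the toy featurizer are hypothetical; `RandomForestClassifier` stands in for the actual default, and any scikit-learn compatible classifier can be passed instead.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.utils.validation import check_is_fitted


def _toy_pair_features(X):
    """Hypothetical featurizer: [aptamer GC fraction, aptamer length, protein length]."""
    rows = []
    for aptamer, protein in X:
        gc = (aptamer.count("G") + aptamer.count("C")) / max(len(aptamer), 1)
        rows.append([gc, len(aptamer), len(protein)])
    return np.asarray(rows, dtype=float)


class PairClassifierPipeline(ClassifierMixin, BaseEstimator):
    def __init__(self, classifier=None):
        self.classifier = classifier  # stored as given, never modified

    def fit(self, X, y):
        base = (
            self.classifier
            if self.classifier is not None
            else RandomForestClassifier(random_state=0)
        )
        # Clone so neither the user's estimator nor the default is fitted in place.
        self.pipeline_ = make_pipeline(
            FunctionTransformer(_toy_pair_features), clone(base)
        )
        self.pipeline_.fit(X, y)
        self.classes_ = self.pipeline_.classes_
        return self

    def predict(self, X):
        check_is_fitted(self)
        return self.pipeline_.predict(X)
```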

Should I do that even for AptaNetClassifier?

You expose it as the classifier parameter. Its parameters stay explicit as long as it accepts named parameters rather than kwargs, so that will not be necessary provided AptaNetClassifier does the same.
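The reasoning can be checked directly: scikit-learn's `get_params` (and therefore `clone`) only sees parameters named explicitly in `__init__`, while a `**kwargs` catch-all is skipped. A minimal sketch with two hypothetical estimators:

```python
from sklearn.base import BaseEstimator, clone


class ExplicitParams(BaseEstimator):
    def __init__(self, k=4, n_estimators=100):
        self.k = k
        self.n_estimators = n_estimators


class KwargsParams(BaseEstimator):
    def __init__(self, k=4, **kwargs):
        self.k = k
        self.kwargs = kwargs  # invisible to get_params


explicit = ExplicitParams(k=3, n_estimators=10)
implicit = KwargsParams(k=3, n_estimators=10)

print(explicit.get_params())   # {'k': 3, 'n_estimators': 10}
print(implicit.get_params())   # {'k': 3}
print(clone(implicit).kwargs)  # {} (the extra setting is silently lost)
```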

I thought before and after every list we should add 2 newlines? #29 (comment)?

Yes, you are right - I mean there are instances of three newlines throughout your docstrings. The max should be two, and I am surprised that the linting does not catch this.
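As a hypothetical stand-alone check, not tied to any particular linter, a small `ast`-based script can flag docstrings containing two or more consecutive blank lines (three newlines in a row after indentation is cleaned). `find_triple_newlines` is an illustrative name.

```python
import ast


def find_triple_newlines(source):
    """Return (name, lineno) for docstrings with two or more consecutive blank lines."""
    tree = ast.parse(source)
    nodes = [tree] + [
        n
        for n in ast.walk(tree)
        if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]
    hits = []
    for node in nodes:
        # clean=True strips the common indentation, so blank lines become empty.
        doc = ast.get_docstring(node, clean=True)
        if doc and "\n\n\n" in doc:
            hits.append((getattr(node, "name", "<module>"), getattr(node, "lineno", 1)))
    return hits
```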

fkiraly · 2025-08-12T20:23:41Z

+        self.pipeline_.fit(X, y)
+
+    def predict(self, X):
+        if not hasattr(self, "pipeline_"):


I would use the scikit-learn idiomatic check_is_fitted here
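A minimal sketch of the suggested pattern: `check_is_fitted` replaces the hand-written `hasattr` check and raises scikit-learn's standard `NotFittedError` (a subclass of both `ValueError` and `AttributeError`) when `predict` is called before `fit`. `TinyEstimator` is a hypothetical majority-label estimator used only for illustration.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.exceptions import NotFittedError
from sklearn.utils.validation import check_is_fitted


class TinyEstimator(BaseEstimator):
    """Hypothetical estimator: predicts the majority label seen in fit."""

    def fit(self, X, y):
        values, counts = np.unique(y, return_counts=True)
        self.majority_ = values[counts.argmax()]  # trailing "_" marks fitted state
        return self

    def predict(self, X):
        # Raises NotFittedError if fit has not been called yet.
        check_is_fitted(self)
        return np.full(len(X), self.majority_)


try:
    TinyEstimator().predict([[0.0]])
except NotFittedError as err:
    print(type(err).__name__)  # NotFittedError
```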

fkiraly · 2025-08-12T20:24:20Z

@@ -0,0 +1,93 @@
+from itertools import product


if these are specific to aptanet, move them to aptanet

fkiraly · 2025-08-12T20:24:55Z

+    return kmer_freq
+
+
+def pairs_to_features(X, k=4):


this looks like it should either be in aptanet or pseaac folder

I thought we were keeping utility functions inside the utils directory? Why move it, and more importantly, where to (given it's a utility function), and how should it be placed inside pseaac or aptanet (file name, subfolder name)?

fkiraly

I left minimal comments

satvshr · 2025-08-13T08:14:01Z

As discussed in the daily, for now we will keep the AptaNet-only utilities in a private file inside the utils folder. I added check_is_fitted as requested. I had removed it because it was not needed to pass the sklearn checks and I was worried they would fail, but there are no issues and all tests pass.

fkiraly

just some docstring remarks

Requires #61, #67, and #30. closes #54

satvshr added 2 commits July 7, 2025 13:21

Added the pseaac encoding algorithm

e37135c

Added Aptanet implementation

6ea5ff7

satvshr marked this pull request as draft July 7, 2025 07:57

fkiraly assigned satvshr Jul 7, 2025

fkiraly reviewed Jul 7, 2025

Made pseaac to a class and made the functions private, still working on tests and bug fixing

a5f01e0

satvshr added 9 commits July 8, 2025 03:02

Made a few readability changes

3773a90

Edited tests

9b9a3da

Added pytest to tests

2dfe0c7

Added numpy style docstrings and ruff formatting

1e182d3

Added docstrings, made functions pvt and made code more clean

20d7e37

Removed AptaNet from root

fc2f051

Added example

62f6c42

Removed AptaNet from root

848fc9b

Made requested changes

1515efe

fkiraly requested changes Jul 9, 2025

satvshr added 2 commits July 10, 2025 12:59

Merge branch 'main' into issue28

75d4efb

Made requested changes and updated tests

733f908

NennoMP reviewed Jul 10, 2025

satvshr added 5 commits July 11, 2025 08:59

Made suggested changes

04ab599

Removed lint. from pyproject, will push it as a separate PR

dc78e44

Refactored code

c347988

Added pandas as a dependancy

d9537f4

Renamed parent folder name to put it in the same level as AptaNet

1c46c55

Spacing for lists

1988408

Update _feature_classifier.py

cc64583

fkiraly requested changes Aug 11, 2025

Merge branch 'main' into issue13

06a44a4

Made requested and architectural changes

183c48b

fkiraly reviewed Aug 12, 2025

fkiraly requested changes Aug 12, 2025

Update _pipeline.py

524f179

satvshr requested a review from fkiraly August 13, 2025 09:27

fkiraly reviewed Aug 14, 2025

Comment thread pyaptamer/aptanet/_pipeline.py Outdated

fkiraly reviewed Aug 14, 2025

Comment thread pyaptamer/aptanet/_pipeline.py Outdated

fkiraly reviewed Aug 14, 2025

Comment thread pyaptamer/aptanet/_pipeline.py Outdated

fkiraly reviewed Aug 14, 2025

Comment thread pyaptamer/aptanet/_pipeline.py Outdated

fkiraly requested changes Aug 14, 2025

Update _pipeline.py

11e2e3d

satvshr requested a review from fkiraly August 14, 2025 13:33

fkiraly approved these changes Aug 14, 2025

fkiraly merged commit f6f0e49 into main Aug 14, 2025
13 checks passed

satvshr deleted the issue13 branch August 14, 2025 20:29

fkiraly pushed a commit that referenced this pull request Sep 2, 2025

[DOC] example notebook for AptaNet (#68)

b448291

