[ENH] Changes to AptaNet for better customization #169

@NennoMP

While investigating #144 I realized that AptaNet has a few issues when you want to customize its hyperparameters: some are hardcoded, and the user cannot change them without manually modifying the classes. This is particularly relevant for the benchmarking framework.

Hyperparameters

For instance, in pyaptamer.aptanet._feature_classifier.py you cannot customize n_estimators and depth for the RandomForestClassifier, or the optimizer and device in NeuralNetBinaryClassifier. You also cannot set the weight_decay of that optimizer.

The random forest and optimizer settings are very important hyperparameters that should be selected depending on the task at hand. The device should also be a parameter, because you may want the model to run on a specific GPU in a multi-GPU cluster (currently it defaults to cuda:0). Weight decay is crucial when you need to prevent your model from overfitting.

I suggest making these customizable parameters of the class.
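To make the suggestion concrete, here is a minimal sketch of what exposing these hyperparameters could look like. The helper names `make_feature_selector` and `make_net_kwargs` are hypothetical and not part of pyaptamer, the default values are placeholders rather than the currently hardcoded ones, and `max_depth` is assumed to be what "depth" refers to. The `optimizer__weight_decay` key assumes NeuralNetBinaryClassifier is skorch's class, which routes double-underscore parameters to the optimizer.

```python
from typing import Any, Dict, Optional

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel


def make_feature_selector(
    n_estimators: int = 100,
    max_depth: Optional[int] = None,
    random_state: Optional[int] = None,
) -> SelectFromModel:
    """Hypothetical helper: random-forest selector with caller-supplied settings."""
    forest = RandomForestClassifier(
        n_estimators=n_estimators,
        max_depth=max_depth,
        random_state=random_state,
    )
    return SelectFromModel(forest)


def make_net_kwargs(
    optimizer: Any = None,
    device: str = "cpu",
    weight_decay: float = 0.0,
) -> Dict[str, Any]:
    """Hypothetical helper: keyword arguments to forward to the network wrapper.

    Assumes a skorch-style wrapper, where `optimizer__weight_decay` is passed
    on to the optimizer constructor.
    """
    kwargs: Dict[str, Any] = {
        "device": device,
        "optimizer__weight_decay": weight_decay,
    }
    # Only override the optimizer class if the caller supplied one.
    if optimizer is not None:
        kwargs["optimizer"] = optimizer
    return kwargs


# Example: a smaller forest, and a network placed on the second GPU.
selector = make_feature_selector(n_estimators=50, max_depth=5, random_state=0)
net_kwargs = make_net_kwargs(device="cuda:1", weight_decay=1e-4)
```

With something along these lines, the classifier and regressor constructors would only need to accept the same arguments and pass them through, so a benchmarking run could vary them without editing the classes.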

Sklearn pipeline

Currently, I find _pipeline.py and _feature_classifier.py a bit confusing. In particular, sklearn-compatible components are combined with Pipeline in both files. This also produces duplicate code in AptaNetRegressor and AptaNetClassifier (e.g., the random forest and SelectFromModel are common to both).

Wouldn't it be better to move these common components to the pipeline and make the AptaNetClassifier and AptaNetRegressor classes slimmer? I think it would also make it easier to understand how the model is built, because all components would be put in a single Pipeline instance in the pipeline class, rather than having some components combined in the pipeline and others in the classifier/regressor.
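As a rough illustration of the proposed layout, the sketch below puts the shared selection step and the final estimator into one Pipeline. `build_pipeline` is a hypothetical name, not pyaptamer's API, and LogisticRegression is only a lightweight stand-in for the neural network so the example runs without a deep learning dependency.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


def build_pipeline(estimator, n_estimators=100, max_depth=None, random_state=None):
    """Hypothetical builder: shared feature selection plus a swappable final step."""
    selector = SelectFromModel(
        RandomForestClassifier(
            n_estimators=n_estimators,
            max_depth=max_depth,
            random_state=random_state,
        )
    )
    # All components live in one Pipeline; only the last step varies.
    return Pipeline([("select", selector), ("estimator", estimator)])


# Synthetic data, used only to show that the assembled pipeline fits and predicts.
X, y = make_classification(
    n_samples=80, n_features=12, n_informative=4, random_state=0
)

pipe = build_pipeline(
    LogisticRegression(max_iter=1000),  # stand-in for the neural network
    n_estimators=20,
    max_depth=3,
    random_state=0,
)
pipe.fit(X, y)
preds = pipe.predict(X)
```

A side benefit of this layout is that nested hyperparameters become reachable through the standard sklearn interface, for example `pipe.set_params(select__estimator__max_depth=5)`, which also works with tools such as GridSearchCV.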

FYI @fkiraly @satvshr

Labels: enhancement (New feature or request); medium (issues which require multiple file changes, but doesn't need complete code base understanding)
