While investigating #144 I realized that AptaNet has a few issues when you want to customize its hyperparameters: some of them are hardcoded, so the user cannot change them without manually modifying the classes. This is particularly relevant for the benchmarking framework.
Hyperparameters
For instance, in pyaptamer.aptanet._feature_classifier.py you cannot customize n_estimators and depth for the RandomForestClassifier, or the optimizer and device in NeuralNetBinaryClassifier. You also cannot set the weight_decay of that optimizer.
The random forest and optimizer settings are important hyperparameters that should be chosen for the task at hand. The device should also be a parameter, because on multi-GPU clusters you may want to place the model on a specific GPU (currently it defaults to cuda:0). Weight decay matters when you need to prevent your model from overfitting.
I would suggest exposing these as customizable parameters of the class.
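To make the suggestion concrete, here is a minimal sketch of what exposing these parameters could look like. It is not the current pyaptamer code: the helper names build_feature_selector and neural_net_kwargs are hypothetical, the default values are placeholders, and max_depth is scikit-learn's name for the tree depth. The optimizer__weight_decay key assumes NeuralNetBinaryClassifier is the skorch class, which routes double-underscore parameters to the optimizer.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel


def build_feature_selector(n_estimators=100, max_depth=None, random_state=None):
    """Hypothetical helper: forest-based feature selector with exposed knobs.

    The defaults are placeholders, not the values currently hardcoded.
    """
    forest = RandomForestClassifier(
        n_estimators=n_estimators,
        max_depth=max_depth,
        random_state=random_state,
    )
    return SelectFromModel(estimator=forest)


def neural_net_kwargs(optimizer=None, weight_decay=0.0, device="cpu"):
    """Hypothetical helper: collect keyword arguments for the neural net.

    Assumes a skorch-style estimator, where ``optimizer__weight_decay`` is
    forwarded to the optimizer. This only works if the chosen optimizer
    accepts a ``weight_decay`` argument.
    """
    kwargs = {"device": device, "optimizer__weight_decay": weight_decay}
    if optimizer is not None:
        # e.g. a torch optimizer class; left out to keep the default.
        kwargs["optimizer"] = optimizer
    return kwargs


# The user picks the forest size, depth, weight decay and target GPU.
selector = build_feature_selector(n_estimators=50, max_depth=5, random_state=0)
net_kwargs = neural_net_kwargs(weight_decay=1e-4, device="cuda:1")
```

With something along these lines, the benchmarking framework could pass the values through the constructor instead of patching the classes.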
Sklearn pipeline
Currently, I find _pipeline.py and _feature_classifier.py a bit confusing. In particular, sklearn-compatible components are combined with Pipeline in both files. This also produces duplicate code in AptaNetRegressor and AptaNetClassifier (e.g., the random forest and SelectFromModel are common to both).
Wouldn't it be better to move these common components to the pipeline and make classes such as AptaNetRegressor slimmer? I think it would also make it easier to understand how the model is built, because all components would be assembled in a single Pipeline instance in the pipeline class, rather than having some components combined in the pipeline and others in the classifier/regressor.
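As a rough illustration of the layout I have in mind, the shared steps could be assembled in one place and only the final estimator swapped. This is a sketch under assumptions, not the existing code: build_pipeline and the step names "select" and "model" are hypothetical, and LogisticRegression stands in for the neural network so that the example runs with scikit-learn alone.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


def build_pipeline(final_estimator, n_estimators=100, max_depth=None,
                   random_state=None):
    """Hypothetical builder: shared feature selection plus a final model."""
    forest = RandomForestClassifier(
        n_estimators=n_estimators,
        max_depth=max_depth,
        random_state=random_state,
    )
    return Pipeline(
        [
            ("select", SelectFromModel(forest)),  # shared by all variants
            ("model", final_estimator),           # the only part that differs
        ]
    )


# Toy data, only to show that the assembled pipeline fits and predicts.
X, y = make_classification(
    n_samples=80, n_features=12, n_informative=4, random_state=0
)
pipe = build_pipeline(
    LogisticRegression(max_iter=1000),  # stand-in for the neural net
    n_estimators=25,
    max_depth=4,
    random_state=0,
)
pipe.fit(X, y)
predictions = pipe.predict(X)

# Nested parameters stay reachable for tuning via sklearn's "__" routing.
pipe.set_params(select__estimator__n_estimators=50)
```

Because every component lives in the Pipeline, tools such as GridSearchCV can address the forest and the final model through the same parameter names.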
FYI @fkiraly @satvshr