LLM generated content, by Claude Opus 4.8
Summary
AptaNetPipeline now accepts only a MoleculeLoader as input (through PairsToFeatures, which accepts nothing else), but MoleculeLoader is not sliceable, so it cannot be used with scikit-learn cross-validation or grid search. This breaks Benchmarking with AptaNet estimators.
Details
Benchmarking.run() calls sklearn.model_selection.cross_validate(estimator, X, y, cv=...), which slices X per fold via _safe_indexing. Two problems:
- Passing a list of pairs (as test_benchmarking.py does) is sliceable, but PairsToFeatures then rejects the list (TypeError: PairsToFeatures accepts only a MoleculeLoader as input, got list.).
- Passing a MoleculeLoader is accepted by PairsToFeatures, but MoleculeLoader has no __len__/__getitem__, so _safe_indexing fails ('MoleculeLoader' object is not subscriptable).
Reproduce:
```python
import numpy as np
from sklearn.utils import _safe_indexing
from pyaptamer.data import MoleculeLoader

ml = MoleculeLoader(data={"aptamer": ["ACGU"] * 40, "protein": ["MK"] * 40})
_safe_indexing(ml, np.arange(10))  # TypeError: 'MoleculeLoader' object is not subscriptable
```
Proposed fix
Make MoleculeLoader sklearn-sliceable: add __len__ (number of materialized samples) and __getitem__ (integer / array / slice → a sub-MoleculeLoader over the selected rows), so that cross_validate can slice it and each fold receives a MoleculeLoader. Then migrate test_benchmarking.py to pass a MoleculeLoader and re-enable the skipped tests.
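A minimal sketch of the proposed __len__/__getitem__, written against a hypothetical stand-in class `SliceableLoader` (not pyaptamer's MoleculeLoader, whose internals are not shown here) that stores samples as a `data` dict of equal-length columns, like the `data=` argument in the reproducer. Every key form (integer, slice, integer array, boolean mask) returns a sub-loader, as proposed. One caveat, based on scikit-learn's private `_safe_indexing` and worth re-checking against the pinned scikit-learn version: an object with only `__len__`/`__getitem__` is treated like a Python list, and an integer index array is turned into a plain list with one `X[i]` call per index, so each fold would get a list rather than a loader. An object that also has a `shape` attribute takes the array path instead, which hands the whole index array to `__getitem__` (as `X[indices]`, or `X[indices, ...]` in some versions). The `shape` property and the `(key, Ellipsis)` unwrapping below are there for that reason; the helper `_array_positions` is likewise hypothetical.

```python
from numbers import Integral
from typing import Any, Dict, List


class SliceableLoader:
    """Hypothetical stand-in for MoleculeLoader: equal-length columns, one row per sample."""

    def __init__(self, data: Dict[str, List[Any]]):
        if len({len(col) for col in data.values()}) > 1:
            raise ValueError("all columns must have the same length")
        self.data = {name: list(col) for name, col in data.items()}

    def __len__(self) -> int:
        # Number of materialized samples (rows); 0 for an empty loader.
        return len(next(iter(self.data.values()), []))

    @property
    def shape(self) -> tuple:
        # Assumption: a 1-D shape routes sklearn's private _safe_indexing to its
        # array path, which passes the whole index array to __getitem__.
        return (len(self),)

    def _array_positions(self, key) -> List[int]:
        keys = list(key)
        dtype = getattr(key, "dtype", None)  # present on NumPy arrays
        if dtype is not None:
            is_mask = dtype.kind == "b"
        else:
            is_mask = bool(keys) and all(isinstance(k, bool) for k in keys)
        if is_mask:
            if len(keys) != len(self):
                raise IndexError("boolean mask length must equal the number of rows")
            return [i for i, keep in enumerate(keys) if keep]
        # range() indexing bounds-checks and normalizes negative indices.
        return [range(len(self))[k] for k in keys]

    def __getitem__(self, key) -> "SliceableLoader":
        # Some sklearn versions index array-likes as X[indices, ...]; unwrap that form.
        if isinstance(key, tuple) and len(key) == 2 and key[1] is Ellipsis:
            key = key[0]
        if isinstance(key, Integral):
            positions = [range(len(self))[key]]
        elif isinstance(key, slice):
            positions = list(range(len(self))[key])
        else:
            positions = self._array_positions(key)
        # Every key form returns a sub-loader over the selected rows.
        return SliceableLoader(
            {name: [col[i] for i in positions] for name, col in self.data.items()}
        )
```

Returning a sub-loader even for a single integer keeps the return type uniform for the pipeline, at the cost of departing from the usual sequence convention where `x[i]` yields one item.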
Affected tests (currently skipped)
pyaptamer/benchmarking/tests/test_benchmarking.py::test_benchmarking_with_predefined_split_classification
pyaptamer/benchmarking/tests/test_benchmarking.py::test_benchmarking_with_predefined_split_regression
Skipped in the PR that lands the pipeline migration so main stays green; this issue tracks the real fix.