Describe the bug
AptaNetClassifier.fit() and AptaNetRegressor.fit() call np.random.seed(self.random_state) when random_state is set. This mutates the global NumPy random state, which is an anti-pattern that causes unintended side effects on any user code using numpy.random in the same process.
The scikit-learn convention is to never set the global seed; instead, RandomState instances should be passed to individual components. In this case, the RandomForestClassifier/RandomForestRegressor already receives random_state=self.random_state via _build_pipeline(), so the global np.random.seed() call is redundant for pipeline reproducibility.
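To illustrate the difference, here is a minimal sketch using NumPy only. The functions fit_with_global_seed and fit_with_local_rng are hypothetical stand-ins, not pyaptamer code: the first mimics the current behaviour, the second draws the same numbers from a local RandomState without touching the global one.

```python
import numpy as np


def global_state_snapshot():
    """Copy the key array and position of the global legacy NumPy RNG."""
    state = np.random.get_state()
    return state[1].copy(), state[2]


def same_state(a, b):
    """True if two snapshots describe the same RNG state."""
    return a[1] == b[1] and np.array_equal(a[0], b[0])


def fit_with_global_seed(seed):
    # Hypothetical stand-in for the current fit(): reseeds the
    # process-wide RNG as a side effect.
    np.random.seed(seed)
    return np.random.rand(3)


def fit_with_local_rng(seed):
    # Hypothetical stand-in for the proposed behaviour: a local
    # RandomState gives reproducible draws and leaves global state alone.
    rng = np.random.RandomState(seed)
    return rng.rand(3)


np.random.seed(0)  # pretend this is the user's own seeding
before = global_state_snapshot()

local_draws = fit_with_local_rng(42)
local_leaves_global_alone = same_state(before, global_state_snapshot())

global_draws = fit_with_global_seed(42)
global_seed_mutates = not same_state(before, global_state_snapshot())
```

Both functions are equally reproducible for a given seed; only the global-seed variant changes what the caller's later np.random calls return.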
To Reproduce
import numpy as np
from pyaptamer.aptanet import AptaNetClassifier
# User code has already drawn from the global NumPy RNG
np.random.rand(5)
X = np.random.rand(20, 5).astype(np.float32)
y = np.array([0]*10 + [1]*10, dtype=np.float32)
# Snapshot of the global state right before fitting
state_before = np.random.get_state()[1][:3].copy()
# Fitting the classifier resets global NumPy state!
clf = AptaNetClassifier(random_state=42, max_epochs=1, verbose=0)
clf.fit(X, y)
# Capture the state left behind by fit() before touching the RNG again
state_after = np.random.get_state()[1][:3].copy()
# Reference: the state produced by seeding the global RNG with 42
np.random.seed(42)
state_seed42 = np.random.get_state()[1][:3].copy()
# Before fix: state_after == state_seed42 (UNEXPECTED!)
print("reset to seed 42:", (state_after == state_seed42).all())
print("differs from state_before:", (state_after != state_before).any())
Expected behavior
Calling fit() should not mutate the global NumPy random state. The random_state parameter should only affect reproducibility within the estimator itself, as per scikit-learn conventions.
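One way to pin this expectation down in a regression test is sketched below. The context manager assert_global_numpy_rng_unchanged is a hypothetical helper, not part of pyaptamer, scikit-learn, or NumPy; it assumes the legacy global RNG that np.random.seed() controls.

```python
import contextlib

import numpy as np


@contextlib.contextmanager
def assert_global_numpy_rng_unchanged():
    """Fail if the wrapped block mutates the global legacy NumPy RNG state."""
    before = np.random.get_state()
    yield
    after = np.random.get_state()
    unchanged = (
        before[0] == after[0]  # bit generator name
        and np.array_equal(before[1], after[1])  # key array
        and before[2:] == after[2:]  # position and cached Gaussian
    )
    if not unchanged:
        raise AssertionError("global NumPy random state was mutated")


# Well-behaved code: draws from a local RandomState only.
with assert_global_numpy_rng_unchanged():
    np.random.RandomState(42).rand(3)

# In a real test, the body would be the call under test, e.g.
# clf.fit(X, y) with clf, X, y built as in the reproduction above.
```

Wrapping a block that calls np.random.seed() in this context manager would raise an AssertionError, which is the failure the test is meant to catch.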
Additional context
- The RandomForest estimator already receives random_state via _build_pipeline(), making the np.random.seed() call redundant for that component.
- torch.manual_seed() is kept because PyTorch has no local seed alternative; this is standard practice in PyTorch-based sklearn estimators.
- The bug exists in both AptaNetClassifier.fit() (line 122) and AptaNetRegressor.fit() (line 306).
Versions
Details