Describe the bug
Multiple calls to make_classification produce different results even with the same random_state.
Steps/Code to reproduce bug
def test_cuml_gen() -> None:
    import cupy as cp
    from cuml.datasets import make_classification

    n_samples_per_batch = 8192
    n_features = 400
    rs = n_samples_per_batch * n_features * 4

    X0, y0 = make_classification(
        n_samples_per_batch,
        n_features,
        n_redundant=0,
        n_repeated=0,
        n_informative=n_features,
        random_state=rs,
    )
    X1, y1 = make_classification(
        n_samples_per_batch,
        n_features,
        n_redundant=0,
        n_repeated=0,
        n_informative=n_features,
        random_state=rs,
    )

    cp.testing.assert_allclose(X0, X1)
    cp.testing.assert_allclose(y0, y1)
    def inner(*args, **kwds):
        with self._recreate_cm():
>           return func(*args, **kwds)
E AssertionError:
E Not equal to tolerance rtol=1e-07, atol=0
E
E Mismatched elements: 1535867 / 3276800 (46.9%)
E Max absolute difference: 2.000002
E Max relative difference: 1118481.1
E x: array([[ 5.839701, -4.92806 , -18.028366, ..., -16.778223, 3.611579,
E -6.924231],
E [ 17.475462, -13.639202, -0.215977, ..., 5.726135, -4.760585,...
E y: array([[ 7.839701, -4.92806 , -18.028366, ..., -16.778223, 3.611579,
E -6.924231],
E [ 19.475462, -13.639202, -0.215977, ..., 5.726135, -4.760585,...
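The assertion output above only shows the first few elements, where the two arrays differ by about 2.0. To see how the mismatches are distributed across rows and columns, a small helper along these lines might be useful. This is a minimal sketch, not part of cuML or CuPy: `summarize_mismatch` is a hypothetical name, and it works on NumPy arrays, so CuPy outputs would need to be copied to the host first (for example with `cp.asnumpy`).

```python
import numpy as np


def summarize_mismatch(a, b, rtol=1e-7):
    """Summarize where two equally shaped 2-D arrays disagree (hypothetical helper)."""
    a = np.asarray(a)
    b = np.asarray(b)
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    # Element-wise check |a - b| <= rtol * |b|, i.e. a relative tolerance with atol=0.
    mismatched = ~np.isclose(a, b, rtol=rtol, atol=0.0)
    diff = np.abs(a - b)
    return {
        "n_mismatched": int(mismatched.sum()),
        "fraction": float(mismatched.mean()),
        "max_abs_diff": float(diff.max()) if diff.size else 0.0,
        # Rows / columns containing at least one mismatched element.
        "rows_affected": int(mismatched.any(axis=1).sum()),
        "cols_affected": int(mismatched.any(axis=0).sum()),
    }
```

With the reproducer, this could be called as `summarize_mismatch(cp.asnumpy(X0), cp.asnumpy(X1))`, assuming both outputs are CuPy arrays.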
Expected behavior
Given the same random_state, it should produce the same result.
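The expected property can be written down as a reusable check: call a data generator several times with fixed arguments and require identical outputs. The sketch below is illustrative only. `assert_reproducible` and `seeded_blobs` are hypothetical names, and `seeded_blobs` is a NumPy stand-in for a seeded generator, not cuML's implementation.

```python
import numpy as np


def assert_reproducible(make_data, n_calls=3):
    """Call make_data() repeatedly and require identical arrays on every call."""
    reference = [np.asarray(part) for part in make_data()]
    for call in range(1, n_calls):
        current = [np.asarray(part) for part in make_data()]
        for idx, (ref, cur) in enumerate(zip(reference, current)):
            np.testing.assert_array_equal(
                cur, ref, err_msg=f"output {idx} differs on call {call}"
            )


def seeded_blobs(seed=0, n_samples=16, n_features=4):
    """Stand-in generator: a fresh RNG per call, so the seed fully fixes the output."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, n_features))
    y = rng.integers(0, 2, size=n_samples)
    return X, y


# Passes: the same seed yields the same (X, y) on every call.
assert_reproducible(lambda: seeded_blobs(seed=42))
```

To apply the same check to the reproducer, the lambda would wrap the make_classification call and copy each output to the host first (for example with `cp.asnumpy`), assuming the outputs are CuPy arrays.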
Environment details (please complete the following information):
- Environment location: Bare-metal
- Linux Distro/Architecture: [Ubuntu 22.04 amd64]
- GPU Model/Driver: NVIDIA RTX A3000 Laptop GPU and driver 561.03
- CUDA: 12.6
- Method of cuDF & cuML install: conda
>>> import cupy
>>> cupy.__version__
'13.4.1'
>>> import cuml
>>> cuml.__version__
'25.02.01'