Description
Describe the bug
In skpro.distributions.mixture.Mixture, the arguments indep_rows and indep_cols are intended to control whether rows and columns are sampled independently from mixture components (defaulting to True).
However, in the _sample method (around line 224), the conditional logic handling these flags is inverted:
```python
if indep_rows:
    rd_size[0] = 1
if indep_cols:
    rd_size[1] = 1
```
When indep_rows=True, this sets the sampling size along the row dimension to 1. This forces np.random.choice to draw a single component index, which is then broadcast across all rows, making them fully dependent.
Conversely, when indep_rows=False, the full dimension is retained, allowing independent sampling across rows when dependence was intended.
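The broadcasting effect described above can be illustrated with plain NumPy. This is a minimal sketch, not skpro's actual code: the variable names and the use of `np.random.default_rng` are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_cols, n_components = 5, 2, 2

# Size 1 along the row axis: one component index is drawn per column
# and then reused for every row, so all rows are identical (dependent).
shared = rng.choice(n_components, size=(1, n_cols))
shared_rows = np.broadcast_to(shared, (n_rows, n_cols))

# Full size along the row axis: every row gets its own draw (independent).
per_row = rng.choice(n_components, size=(n_rows, n_cols))

print(shared_rows)
print(per_row)
```

Every row of `shared_rows` is the same, which is what the current code produces when indep_rows=True.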
To Reproduce
```python
import numpy as np
import pandas as pd
from skpro.distributions.normal import Normal
from skpro.distributions.mixture import Mixture
N_ROWS = 20  # assumed value; the original snippet did not define N_ROWS
mu1 = np.full((N_ROWS, 2), 100.0)
mu2 = np.full((N_ROWS, 2), -100.0)
n1 = Normal(mu=mu1, sigma=0.01)
n2 = Normal(mu=mu2, sigma=0.01)
# TEST 1: Request Independent Rows, Dependent Cols
mixture_1 = Mixture(
distributions=[n1, n2], weights=[0.5, 0.5], indep_rows=True, indep_cols=False
)
np.random.seed(42)
sample_1 = np.round(mixture_1.sample(1).values)
unique_rows_1 = len(np.unique(sample_1, axis=0))
print("--- TEST 1: indep_rows=True, indep_cols=False ---")
print(f"EXPECTED: 2 unique rows | ACTUAL: {unique_rows_1}")
print(sample_1[:5])
# TEST 2: Request Dependent Rows, Independent Cols
mixture_2 = Mixture(
distributions=[n1, n2], weights=[0.5, 0.5], indep_rows=False, indep_cols=True
)
np.random.seed(13)
sample_2 = np.round(mixture_2.sample(1).values)
unique_rows_2 = len(np.unique(sample_2, axis=0))
print("\n--- TEST 2: indep_rows=False, indep_cols=True ---")
print(f"EXPECTED: 1 unique row | ACTUAL: {unique_rows_2}")
print(sample_2[:5])
```
Observed behavior
```
SKPRO OUTPUT (indep_rows=True):
[[ 100. -100.]
 [ 100. -100.]
 [ 100. -100.]
 [ 100. -100.]
 [ 100. -100.]]
SKPRO OUTPUT (indep_rows=False):
[[-100. -100.]
 [ 100.  100.]
 [-100. -100.]
 [-100. -100.]
 [-100. -100.]]
```
indep_rows=True → rows are identical (dependent)
indep_rows=False → rows are independent
This indicates the independence logic is reversed.
A screenshot showing the same behavior has been attached below for reference.
Expected behavior
- When indep_rows=True, rows should sample mixture components independently
- When indep_rows=False, rows should share a single component (broadcast behavior)
Environment
- OS: macOS
- Python: 3.x
- skpro: latest (main branch)
- NumPy: standard version
Additional context
Proposed solution
Invert the condition in _sample:
```python
if not indep_rows:
    rd_size[0] = 1
if not indep_cols:
    rd_size[1] = 1
```
This ensures that independence keeps the full sampling dimension, while dependence draws a single component that is broadcast along that axis.
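The proposed condition can be checked in isolation with a NumPy-only sketch. The helper `component_index` below is hypothetical and only mimics the component-selection step; it is not part of skpro, and the real `_sample` method may organise its sampling shape differently.

```python
import numpy as np


def component_index(shape, n_components, indep_rows=True, indep_cols=True, rng=None):
    """Hypothetical helper: pick a mixture component index for each cell."""
    rng = np.random.default_rng() if rng is None else rng
    rd_size = list(shape)
    if not indep_rows:
        rd_size[0] = 1  # one draw shared by all rows
    if not indep_cols:
        rd_size[1] = 1  # one draw shared by all columns
    idx = rng.choice(n_components, size=rd_size)
    # Axes of size 1 are broadcast back to the full shape.
    return np.broadcast_to(idx, shape)


rng = np.random.default_rng(42)
dep_rows = component_index((200, 2), 2, indep_rows=False, indep_cols=True, rng=rng)
dep_cols = component_index((200, 2), 2, indep_rows=True, indep_cols=False, rng=rng)

print(len(np.unique(dep_rows, axis=0)))          # 1: all rows share one draw
print((dep_cols[:, 0] == dep_cols[:, 1]).all())  # True: columns share one draw per row
```

With the condition written this way, indep_rows=False yields a single unique row, while indep_rows=True lets the component vary from row to row.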