[BUG] Mixture._sample handles indep_cols and indep_rows backward, destroying independence assumptions #974

@ANANYA542

Describe the bug
In skpro.distributions.mixture.Mixture, the arguments indep_rows and indep_cols are intended to control whether rows and columns are sampled independently from mixture components (defaulting to True).
However, in the _sample method (around line 224), the conditional logic handling these flags is inverted:

if indep_rows:
    rd_size[0] = 1
if indep_cols:
    rd_size[1] = 1

When indep_rows=True, this sets the sampling size along the row dimension to 1. np.random.choice then draws only one component index along that axis, which is broadcast across all rows, making them fully dependent.
Conversely, when indep_rows=False, the full row dimension is retained, so rows are sampled independently even though dependence was requested. The same inversion applies to indep_cols along the column axis.
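
The broadcasting effect described above can be reproduced with NumPy alone. This is a minimal sketch, assuming the component indices are drawn with a shape like rd_size and then broadcast to the full (n_rows, n_cols) shape. The variable names are illustrative and are not taken from skpro's source.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_cols, n_components = 5, 2, 2

# Size-1 row axis: one component index per column, shared by every row.
idx_shared_rows = rng.choice(n_components, size=(1, n_cols))
idx_shared_rows = np.broadcast_to(idx_shared_rows, (n_rows, n_cols))

# Full row axis, size-1 column axis: one index per row, shared by all columns.
idx_indep_rows = rng.choice(n_components, size=(n_rows, 1))
idx_indep_rows = np.broadcast_to(idx_indep_rows, (n_rows, n_cols))

# Every row of the first array equals its first row.
rows_identical = bool((idx_shared_rows == idx_shared_rows[0]).all())
# Every row of the second array is constant across its columns.
cols_identical = bool((idx_indep_rows == idx_indep_rows[:, :1]).all())

print(idx_shared_rows)
print(idx_indep_rows)
print(rows_identical, cols_identical)
```

A size-1 axis therefore always means "shared along that axis", which is the opposite of what an indep_* flag set to True should produce.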

To Reproduce

```python
import numpy as np
import pandas as pd
from skpro.distributions.normal import Normal
from skpro.distributions.mixture import Mixture

N_ROWS = 20  # not defined in the original snippet; 20 is assumed here

mu1 = np.full((N_ROWS, 2), 100.0)
mu2 = np.full((N_ROWS, 2), -100.0)

n1 = Normal(mu=mu1, sigma=0.01)
n2 = Normal(mu=mu2, sigma=0.01)

# TEST 1: Request Independent Rows, Dependent Cols
mixture_1 = Mixture(
    distributions=[n1, n2], weights=[0.5, 0.5], indep_rows=True, indep_cols=False
)
np.random.seed(42)
sample_1 = np.round(mixture_1.sample(1).values)
unique_rows_1 = len(np.unique(sample_1, axis=0))

print("--- TEST 1: indep_rows=True, indep_cols=False ---")
# After rounding, a row can only be (100, 100) or (-100, -100) when columns share a component.
print(f"EXPECTED: 2 unique rows | ACTUAL: {unique_rows_1}")
print(sample_1[:5])

# TEST 2: Request Dependent Rows, Independent Cols
mixture_2 = Mixture(
    distributions=[n1, n2], weights=[0.5, 0.5], indep_rows=False, indep_cols=True
)
np.random.seed(13)
sample_2 = np.round(mixture_2.sample(1).values)
unique_rows_2 = len(np.unique(sample_2, axis=0))

print("\n--- TEST 2: indep_rows=False, indep_cols=True ---")
print(f"EXPECTED: 1 unique row | ACTUAL: {unique_rows_2}")
print(sample_2[:5])
```

Observed behavior
SKPRO OUTPUT (indep_rows=True):
[[ 100. -100.]
[ 100. -100.]
[ 100. -100.]
[ 100. -100.]
[ 100. -100.]]

SKPRO OUTPUT (indep_rows=False):
[[-100. -100.]
[ 100. 100.]
[-100. -100.]
[-100. -100.]
[-100. -100.]]

indep_rows=True → rows are identical (dependent)
indep_rows=False → rows are independent
This indicates the independence logic is reversed.
A screenshot showing the same behavior has been attached below for reference.

Image

Expected behavior

  1. When indep_rows=True, rows should sample mixture components independently
  2. When indep_rows=False, rows should share a single component (broadcast behavior)
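
The two expectations above can be written as simple checks on a rounded sample array. This is a rough sketch: the helper names rows_share_component and cols_share_component are hypothetical, and the arrays are the five rounded rows copied from the observed output in this report.

```python
import numpy as np

def rows_share_component(sample):
    """True if every row equals the first row (rows are fully dependent)."""
    sample = np.asarray(sample)
    return bool((sample == sample[0]).all())

def cols_share_component(sample):
    """True if every column equals the first column (columns are fully dependent)."""
    sample = np.asarray(sample)
    return bool((sample == sample[:, :1]).all())

# Rounded samples copied from the observed output in this report.
observed_indep_rows_true = np.array([[100.0, -100.0]] * 5)
observed_indep_rows_false = np.array(
    [[-100.0, -100.0], [100.0, 100.0], [-100.0, -100.0],
     [-100.0, -100.0], [-100.0, -100.0]]
)

# indep_rows=True was requested, yet all rows are identical.
print(rows_share_component(observed_indep_rows_true))
# indep_rows=False was requested, yet the rows differ.
print(rows_share_component(observed_indep_rows_false))
```

Independent draws can agree by chance, so a check like this is only meaningful with enough rows and well-separated components, as in the reproduction script.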

Environment

  • OS: macOS
  • Python: 3.x
  • skpro: latest (main branch)
  • NumPy: standard version

Additional context
Proposed solution
Invert the condition in _sample:

if not indep_rows:
    rd_size[0] = 1
if not indep_cols:
    rd_size[1] = 1

This ensures that independence keeps the full sampling dimension, while dependence draws a single component index that is broadcast along that axis.
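
The proposed condition can be tried out in isolation. This is a standalone sketch, not skpro's actual _sample implementation: component_index_shape and draw_component_indices are hypothetical helpers, and only the name rd_size is borrowed from the snippet in this report.

```python
import numpy as np

def component_index_shape(n_rows, n_cols, indep_rows=True, indep_cols=True):
    """Shape of the component-index draw under the proposed conditions."""
    rd_size = [n_rows, n_cols]
    if not indep_rows:
        rd_size[0] = 1  # one draw shared by all rows
    if not indep_cols:
        rd_size[1] = 1  # one draw shared by all columns
    return tuple(rd_size)

def draw_component_indices(n_rows, n_cols, weights,
                           indep_rows=True, indep_cols=True, rng=None):
    """Draw component indices and broadcast them to (n_rows, n_cols)."""
    rng = np.random.default_rng() if rng is None else rng
    size = component_index_shape(n_rows, n_cols, indep_rows, indep_cols)
    idx = rng.choice(len(weights), size=size, p=weights)
    return np.broadcast_to(idx, (n_rows, n_cols))

# Dependent rows, independent columns: every row must carry the same indices.
idx = draw_component_indices(
    20, 2, [0.5, 0.5], indep_rows=False, indep_cols=True,
    rng=np.random.default_rng(13),
)
print(idx.shape)
print(bool((idx == idx[0]).all()))
```

With indep_rows=False the row axis collapses to size 1 before the draw, so the broadcast rows are identical by construction rather than by chance.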
