KeyError when updating a metric with CluStream (cluster module) #1182

Open
@qetdr

Versions

river version: 0.15.0
Python version: 3.10.4
Operating system: macOS Ventura 13.2

Describe the bug

I tried to run one of river's clusterers (CluStream, specifically) and update a metric on each iteration. However, the call to metric.update raised a KeyError. A fully reproducible example is below.

Steps/code to reproduce

import pandas as pd
from river.cluster import CluStream
from river import stream
from river.metrics import Silhouette

# Import the data
s1 = pd.read_table('http://cs.uef.fi/sipu/datasets/s1.txt',
                   sep=r"\s+",  # raw string avoids the invalid "\s" escape warning
                   names=['x1', 'x2']).sample(5000, random_state=42).reset_index(drop=True)

# Taking a random sample for a smaller batch of the data
n_samples = 500
df_first_batch = s1.sample(n_samples).reset_index(drop = True)

clusterer = CluStream(time_window=1,
                      max_micro_clusters=30,
                      n_macro_clusters=15,                      
                      seed=0,
                      halflife=0.4
                     )
metric = Silhouette()

for x, _ in stream.iter_pandas(df_first_batch):
    clusterer = clusterer.learn_one(x)
    y_pred = clusterer.predict_one(x)
    metric = metric.update(x = x, 
                           y_pred = y_pred, 
                           centers = clusterer.centers)

Here's the output:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[2], line 26
     24 clusterer = clusterer.learn_one(x)
     25 y_pred = clusterer.predict_one(x)
---> 26 metric = metric.update(x = x, 
     27                        y_pred = y_pred, 
     28                        centers = clusterer.centers)

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/river/metrics/silhouette.py:71, in Silhouette.update(self, x, y_pred, centers, sample_weight)
     69 def update(self, x, y_pred, centers, sample_weight=1.0):
---> 71     distance_closest_centroid = math.sqrt(utils.math.minkowski_distance(centers[y_pred], x, 2))
     72     self._sum_distance_closest_centroid += distance_closest_centroid
     74     distance_second_closest_centroid = self._find_distance_second_closest_center(centers, x)

KeyError: 0
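
The traceback shows the failure at the lookup centers[y_pred], which suggests that the predicted label 0 was not yet a key in clusterer.centers when the metric was updated. The sketch below reproduces that failure mode without river and shows one possible guard. It is only an illustration under assumptions: that centers is a dict-like mapping from cluster label to a feature dict, and that it can be empty or lack the predicted label early in the stream. The helpers euclidean and distance_to_assigned_center are hypothetical names, not part of river.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature dicts that share the same keys."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a))


def distance_to_assigned_center(x, y_pred, centers):
    """Distance from x to centers[y_pred], or None if that center does not exist yet.

    Hypothetical helper: it mirrors the lookup that fails in the traceback,
    but checks for the key first instead of indexing blindly.
    """
    if y_pred not in centers:
        return None
    return euclidean(centers[y_pred], x)


x = {"x1": 1.0, "x2": 2.0}

# Assumed early-stream state: the clusterer predicts label 0,
# but no centers have been created yet.
empty_centers = {}
try:
    empty_centers[0]
    raised = False
except KeyError:
    raised = True  # same error type as in the report

# With the guard, the missing center is skipped instead of raising.
skipped = distance_to_assigned_center(x, 0, empty_centers)

# Once a center for label 0 exists, the distance is computed normally.
ready = distance_to_assigned_center(x, 0, {0: {"x1": 4.0, "x2": 6.0}})
```

Applied to the loop in the report, the same idea would be to call metric.update only when y_pred in clusterer.centers holds. That is a possible workaround rather than a confirmed fix, and it leaves the early samples out of the metric.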
