[BUG] GroupVariableImportanceMixin should not use sorted(self.features_groups.keys()) #531

@jpaillard

Description

A bug was introduced in the class: GroupVariableImportanceMixin.
The private attribute _features_groups_ids is generated by iterating over sorted(self.features_groups.keys()). This creates a discrepancy between the order of groups in features_groups (the order the user expects) and the order in _features_groups_ids, which is used to build the $X^{-j}, X^j$ sets and then the importance array. The example below illustrates this: the user expects 'dummy group' to be the second group, but its importance ends up at the first index of the importance array, because 'dummy group' sorts alphabetically before 'group 1'.

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

from hidimstat import CFI

X = pd.DataFrame(np.random.rand(100, 3), columns=["a", "b", "c"])
y = np.random.randn(100)

features_groups = {
    "group 1": ["a", "b"],
    "dummy group": ["c"],
}

estimator = LinearRegression()
estimator.fit(X, y)
cfi = CFI(estimator=estimator, features_groups=features_groups)
cfi.fit(X, y)
print("features_groups:", cfi.features_groups)
print("_features_groups_ids", cfi._features_groups_ids)

out:

>>> features_groups: {'group 1': ['a', 'b'], 'dummy group': ['c']}
>>> _features_groups_ids [[2], [0, 1]]
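
The reordering can be reproduced without hidimstat. The following is a minimal sketch, not the library's actual code: the helper `group_ids` is hypothetical and only mimics mapping each group's column names to column indices. It compares iterating over the sorted keys with iterating over the dict in insertion order (guaranteed since Python 3.7).

```python
# Hypothetical stand-in for the group-to-column-index mapping.
# This is NOT hidimstat's implementation, only an illustration of the ordering issue.
columns = ["a", "b", "c"]
features_groups = {
    "group 1": ["a", "b"],
    "dummy group": ["c"],
}


def group_ids(group_names):
    """Return, for each group name in the given order, the indices of its columns."""
    return [
        [columns.index(name) for name in features_groups[group]]
        for group in group_names
    ]


# Iterating over sorted keys puts "dummy group" first ("d" sorts before "g").
sorted_ids = group_ids(sorted(features_groups.keys()))

# Iterating over the dict itself keeps the order the user wrote.
ordered_ids = group_ids(features_groups.keys())

print(sorted_ids)   # [[2], [0, 1]]
print(ordered_ids)  # [[0, 1], [2]]
```

Under this assumption, dropping the call to sorted() would be enough to keep the importance array aligned with the order of features_groups.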
