Fix permutation indexing 1092 #1142
Open
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Hi @rasbt,
This PR addresses issue #1092 regarding incorrect indexing when using feature_groups in feature_importance_permutation.
Changes made:
Shuffling Logic: Instead of rng.shuffle, which failed on multi-column slices, I implemented rng.permutation(X.shape[0]) to ensure consistent row-wise shuffling across all columns within a group.
Module Export: Updated mlxtend/evaluate/init.py to correctly export feature_importance_permutation from the feature_importance.py module.
Verification (Jupyter Notebook Test):
To verify the fix, I tested the function with a group of non-informative features vs. one highly informative feature. The results confirmed that the shuffling is now working correctly for grouped indices.
Test Case:
Feature 1 is highly correlated with target. Features 2 & 3 are noise.
X = np.column_stack([feature_1, feature_2, feature_3])
feature_groups_idx = [0, [1, 2]]
mean_importance_vals, _ = feature_importance_permutation(
predict_method=model.predict, X=X, y=y, metric='r2',
num_rounds=10, feature_groups=feature_groups_idx, seed=42
)
Results:
Importance of Feature_1 (Group 0): 1.9084
Importance of Features 2&3 (Group 1): 0.0247
Status: SUCCESS (The model correctly identified the first group as the primary driver).
Note on Local Testing Issues:
During development, I encountered an issue where pytest could not locate the test files or the module in some Windows environments.
Error: file or directory not found: mlxtend/evaluate/tests/test_feature_importance.py
Reason: This appears to be a path resolution/PYTHONPATH issue on Windows and a naming inconsistency between the module (feature_importance.py) and the expected test imports.
Suggestions for Future Contributors:
Standardize Module Naming: Consider renaming feature_importance.py to feature_importance_permutation.py to match the User Guide and prevent ImportError.
Environment Setup: Always run tests using $env:PYTHONPATH = "." on Windows.
Expand Test Suite: Add a specific test case in test_feature_importance.py that handles 2D array slices to ensure no regressions in group shuffling.