Add CTEImputer #480
Pull request overview
This PR introduces the CTEImputer (Compress Then Explain imputer) as a new imputation strategy for the shapiq package, following the methodology from Baniecki et al. (2025). The imputer uses distribution compression (Compress++ with Kernel Thinning) to subsample background data before imputing missing features, providing accurate and stable explanations while being computationally efficient.
Key Changes:
- Adds a new `CTEImputer` class implementing the compress-then-explain methodology with background data compression
- Makes `CTEImputer` the new default imputer in `TabularExplainer`, replacing `MarginalImputer`
- Adds the `goodpoints` library as a new dependency for distribution compression functionality
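The compress-then-explain flow described in the overview can be sketched roughly as follows. This is a dependency-free illustration, not the PR's implementation: `compress_stand_in` is a hypothetical helper that picks about sqrt(n) rows uniformly at random, standing in for Compress++ with Kernel Thinning, and `impute_value` is likewise a name introduced only for this example.

```python
import math
import random


def compress_stand_in(data, seed=None):
    """Return indices of ~sqrt(n) rows chosen uniformly at random.

    Assumption: this is only a placeholder for a real distribution
    compression routine; it is NOT kernel thinning.
    """
    rng = random.Random(seed)
    m = max(1, math.isqrt(len(data)))
    return sorted(rng.sample(range(len(data)), m))


def impute_value(model, x, coalition, background):
    """Average the model output over background rows, replacing the
    features that are absent from the coalition with the row's values."""
    total = 0.0
    for row in background:
        point = [xi if present else bi for xi, present, bi in zip(x, coalition, row)]
        total += model(point)
    return total / len(background)


model = sum  # toy model: sum of the feature values
data = [[float(i), float(i % 3), 1.0] for i in range(16)]

# Step 1: compress the background data once.
background = [data[i] for i in compress_stand_in(data, seed=0)]

# Step 2: impute against the compressed background only.
x = data[0]
value_full = impute_value(model, x, [True, True, True], background)
value_empty = impute_value(model, x, [False, False, False], background)
```

The point of the design is that the per-coalition cost depends on the size of the compressed background rather than on the full dataset.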
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| src/shapiq/imputer/cte_imputer.py | New implementation of CTEImputer class with distribution compression for efficient feature imputation |
| src/shapiq/imputer/__init__.py | Exports CTEImputer to make it available in the imputer module |
| src/shapiq/explainer/tabular.py | Updates TabularExplainer to use "cte" as the default imputer instead of "marginal" |
| pyproject.toml | Adds goodpoints dependency and shapiq keyword to project metadata |
| CHANGELOG.md | Documents the addition of CTEImputer feature for the development version |
| id_compressed = compress.compresspp_kt(data, kernel_type=b"gaussian", k_params=np.array([sigma**2]), g=4, seed=self.random_state) | ||
| self._replacement_data = data[id_compressed] | ||
| self.calc_empty_prediction() # reset the empty prediction to the new background data | ||
| return self |
Copilot
AI
Dec 28, 2025
There's a trailing space at the end of this line that should be removed to maintain code cleanliness.
| return self | |
| return self |
| """ | ||
| d = data.shape[1] | ||
| sigma = np.sqrt(2 * d) | ||
| id_compressed = compress.compresspp_kt(data, kernel_type=b"gaussian", k_params=np.array([sigma**2]), g=4, seed=self.random_state) |
Copilot
AI
Dec 28, 2025
The seed parameter passed to compresspp_kt may not properly handle the case when self.random_state is None. The function expects an integer seed, but random_state can be None according to the __init__ signature. This could lead to unexpected behavior or errors when random_state=None.
| id_compressed = compress.compresspp_kt(data, kernel_type=b"gaussian", k_params=np.array([sigma**2]), g=4, seed=self.random_state) | |
| if self.random_state is None: | |
| id_compressed = compress.compresspp_kt( | |
| data, | |
| kernel_type=b"gaussian", | |
| k_params=np.array([sigma**2]), | |
| g=4, | |
| ) | |
| else: | |
| id_compressed = compress.compresspp_kt( | |
| data, | |
| kernel_type=b"gaussian", | |
| k_params=np.array([sigma**2]), | |
| g=4, | |
| seed=self.random_state, | |
| ) |
| ``["cte", "marginal", "baseline", "conditional"]``. Defaults to ``"cte"``, which | ||
| initializes the default | ||
| :class:`~shapiq.games.imputer.marginal_imputer.MarginalImputer` with its default | ||
| :class:`~shapiq.games.imputer.marginal_imputer.CTEImputer` with its default |
Copilot
AI
Dec 28, 2025
The documentation references the wrong module path. `CTEImputer` is defined in the `cte_imputer` module, not in `marginal_imputer`, so the reference should point to `cte_imputer.CTEImputer` instead of `marginal_imputer.CTEImputer`.
| :class:`~shapiq.games.imputer.marginal_imputer.CTEImputer` with its default | |
| :class:`~shapiq.games.imputer.cte_imputer.CTEImputer` with its default |
| imputed_data = np.tile(self.x, (n_coalitions, 1)) | ||
| for i in range(sample_size): |
Copilot
AI
Dec 28, 2025
The imputed_data array is initialized outside the loop but modified inside the loop, which means that after the first iteration, it will contain modifications from the previous iteration. This should be moved inside the loop to ensure each iteration starts with a fresh copy of self.x tiled for all coalitions, otherwise the results will be incorrect for iterations after the first one.
| imputed_data = np.tile(self.x, (n_coalitions, 1)) | |
| for i in range(sample_size): | |
| for i in range(sample_size): | |
| imputed_data = np.tile(self.x, (n_coalitions, 1)) |
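The loop structure in the suggestion can be illustrated with a small dependency-free sketch, in which plain lists stand in for NumPy arrays. `impute_and_average` is a hypothetical name, and the loop body is an assumption about what the imputer does (overwrite the absent features with one background row, then evaluate the model). It is not copied from the PR.

```python
def impute_and_average(model, x, coalitions, background):
    """Average model outputs over background rows, one fresh copy of x per row."""
    n_coalitions = len(coalitions)
    totals = [0.0] * n_coalitions
    for row in background:
        # Fresh copy on every iteration: the list analogue of
        # np.tile(self.x, (n_coalitions, 1)) placed inside the loop.
        imputed = [list(x) for _ in range(n_coalitions)]
        for c, coalition in enumerate(coalitions):
            for j, present in enumerate(coalition):
                if not present:
                    imputed[c][j] = row[j]  # replace the absent feature
        for c, point in enumerate(imputed):
            totals[c] += model(point)
    return [t / len(background) for t in totals]


values = impute_and_average(
    model=sum,
    x=[1.0, 2.0, 3.0],
    coalitions=[[True, True, True], [True, False, True], [False, False, False]],
    background=[[0.0, 0.0, 0.0], [10.0, 10.0, 10.0]],
)
# values == [6.0, 9.0, 15.0]
```

Rebuilding the copy per iteration costs one extra allocation per background row, which stays small once the background has been compressed.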
| @@ -0,0 +1,164 @@ | |||
| """Implementation of the marginal imputer.""" | |||
Copilot
AI
Dec 28, 2025
The module docstring incorrectly states "Implementation of the marginal imputer" when it should say "Implementation of the CTE imputer" or "Implementation of the compress then explain imputer".
| normalize: bool = True, | ||
| random_state: int | None = None, | ||
| ) -> None: | ||
| """Initializes the marginal imputer. |
Copilot
AI
Dec 28, 2025
The docstring incorrectly states "Initializes the marginal imputer" when it should say "Initializes the CTE imputer" or "Initializes the compress then explain imputer".
| Examples: | ||
| >>> model = lambda x: np.sum(x, axis=1) | ||
| >>> data = np.random.rand(10, 3) | ||
| >>> imputer = MarginalImputer(model=model, data=data, x=data[0]) |
Copilot
AI
Dec 28, 2025
The example in the docstring incorrectly uses "MarginalImputer" instead of "CTEImputer". This should be updated to match the class being documented.
Introducing CTEImputer
closes #225
Adds the `CTEImputer` following the compress then explain (CTE) methodology. It replaces missing features of the explanation point with values sampled from the background data, which is first subsampled using a distribution compression algorithm, specifically Compress++ with Kernel Thinning. CTE has been shown to provide accurate and stable estimates of explanations while being computationally efficient. It is the new default imputer in `TabularExplainer`, removing the need to set `sample_size`.
TODO
C++packages