Add CTEImputer #480

hbaniecki · 2025-11-26T13:02:05Z

Introducing CTEImputer

closes #225

Adds the CTEImputer, which follows the compress then explain (CTE) methodology. It replaces missing features of the explanation point with values sampled from the background data, which is first subsampled using a distribution compression algorithm, specifically Compress++ with Kernel Thinning. CTE has been shown to provide accurate and stable estimates of explanations while being computationally efficient. CTEImputer becomes the new default imputer in TabularExplainer, which removes the need to set sample_size.
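
A minimal sketch of the idea described above, not the code in this PR. The helper name `impute_with_background` is hypothetical, and the compression step is replaced by a plain uniform subsample as a placeholder; the PR uses Compress++ with Kernel Thinning for that step. The subset size of 8 is chosen only for illustration.

```python
import numpy as np


def impute_with_background(model, x, coalitions, background):
    """Average model output when absent features are taken from background rows."""
    n_coalitions = coalitions.shape[0]
    outputs = np.zeros((background.shape[0], n_coalitions))
    for i, row in enumerate(background):
        # Present features (True) keep the value from x,
        # absent features (False) take the value from the background row.
        imputed = np.where(coalitions, x, row)
        outputs[i] = model(imputed)
    return outputs.mean(axis=0)


rng = np.random.default_rng(0)
data = rng.random((64, 3))
model = lambda X: X.sum(axis=1)

# Placeholder for the compression step (assumption): a uniform subsample.
# The PR selects these indices with Compress++ and Kernel Thinning instead.
idx = rng.choice(len(data), size=8, replace=False)
compressed = data[idx]

coalitions = np.array(
    [[True, True, True], [True, False, False], [False, False, False]]
)
values = impute_with_background(model, data[0], coalitions, compressed)
```

Because the model is only evaluated on the small compressed set, no separate sample_size has to be chosen in this sketch; the size of the compressed background plays that role.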

TODO

add tests
fix Windows CI to include some missing C++ packages

Copilot

Pull request overview

This PR introduces the CTEImputer (Compress Then Explain imputer) as a new imputation strategy for the shapiq package, following the methodology from Baniecki et al. (2025). The imputer uses distribution compression (Compress++ with Kernel Thinning) to subsample background data before imputing missing features, providing accurate and stable explanations while being computationally efficient.

Key Changes:

Adds new CTEImputer class implementing the compress-then-explain methodology with background data compression
Makes CTEImputer the new default imputer in TabularExplainer, replacing MarginalImputer
Adds goodpoints library as a new dependency for distribution compression functionality

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.

File	Description
src/shapiq/imputer/cte_imputer.py	New implementation of CTEImputer class with distribution compression for efficient feature imputation
src/shapiq/imputer/__init__.py	Exports CTEImputer to make it available in the imputer module
src/shapiq/explainer/tabular.py	Updates TabularExplainer to use "cte" as the default imputer instead of "marginal"
pyproject.toml	Adds goodpoints dependency and shapiq keyword to project metadata
CHANGELOG.md	Documents the addition of CTEImputer feature for the development version

Copilot · 2025-12-28T15:33:25Z

src/shapiq/imputer/cte_imputer.py

+        id_compressed = compress.compresspp_kt(data, kernel_type=b"gaussian", k_params=np.array([sigma**2]), g=4, seed=self.random_state)
+        self._replacement_data = data[id_compressed]
+        self.calc_empty_prediction()  # reset the empty prediction to the new background data
+        return self 


There's a trailing space at the end of this line that should be removed to maintain code cleanliness.

Suggested change

-        return self 
+        return self

Copilot · 2025-12-28T15:33:25Z

src/shapiq/imputer/cte_imputer.py

+        """
+        d = data.shape[1]
+        sigma = np.sqrt(2 * d)
+        id_compressed = compress.compresspp_kt(data, kernel_type=b"gaussian", k_params=np.array([sigma**2]), g=4, seed=self.random_state)


The seed parameter passed to compresspp_kt may not properly handle the case when self.random_state is None. The function expects an integer seed, but random_state can be None according to the __init__ signature. This could lead to unexpected behavior or errors when random_state=None.

Suggested change

-        id_compressed = compress.compresspp_kt(data, kernel_type=b"gaussian", k_params=np.array([sigma**2]), g=4, seed=self.random_state)
+        if self.random_state is None:
+            id_compressed = compress.compresspp_kt(
+                data,
+                kernel_type=b"gaussian",
+                k_params=np.array([sigma**2]),
+                g=4,
+            )
+        else:
+            id_compressed = compress.compresspp_kt(
+                data,
+                kernel_type=b"gaussian",
+                k_params=np.array([sigma**2]),
+                g=4,
+                seed=self.random_state,
+            )
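
One way to avoid duplicating the call is to build the keyword arguments once and add `seed` only when a value is given. This is a sketch, not part of the PR: the helper name `compress_kwargs` is hypothetical, the argument names are copied from the call shown in the diff, and it relies on the same assumption as the suggestion, namely that `compresspp_kt` can be called without `seed`.

```python
import numpy as np


def compress_kwargs(n_features, random_state=None):
    """Keyword arguments for the compression call; `seed` is set only when given."""
    # Bandwidth as in the diff: sigma = sqrt(2 * d), passed as sigma ** 2.
    sigma = np.sqrt(2 * n_features)
    kwargs = {
        "kernel_type": b"gaussian",
        "k_params": np.array([sigma**2]),
        "g": 4,
    }
    if random_state is not None:
        kwargs["seed"] = random_state
    return kwargs


# Intended use inside the imputer (assumes `compress` from goodpoints, as in the PR):
# id_compressed = compress.compresspp_kt(
#     data, **compress_kwargs(data.shape[1], self.random_state)
# )
```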

Copilot · 2025-12-28T15:33:26Z

src/shapiq/explainer/tabular.py

+                ``["cte", "marginal", "baseline", "conditional"]``. Defaults to ``"cte"``, which
                initializes the default
-                :class:`~shapiq.games.imputer.marginal_imputer.MarginalImputer` with its default
+                :class:`~shapiq.games.imputer.marginal_imputer.CTEImputer` with its default


The documentation references the wrong module path. It should say "cte_imputer.CTEImputer" instead of "marginal_imputer.CTEImputer", because CTEImputer is defined in the cte_imputer module, not the marginal_imputer module.

Suggested change

-                :class:`~shapiq.games.imputer.marginal_imputer.CTEImputer` with its default
+                :class:`~shapiq.games.imputer.cte_imputer.CTEImputer` with its default

Copilot · 2025-12-28T15:33:26Z

src/shapiq/imputer/cte_imputer.py

+        imputed_data = np.tile(self.x, (n_coalitions, 1))
+        for i in range(sample_size):


The imputed_data array is initialized outside the loop but modified inside the loop, which means that after the first iteration, it will contain modifications from the previous iteration. This should be moved inside the loop to ensure each iteration starts with a fresh copy of self.x tiled for all coalitions, otherwise the results will be incorrect for iterations after the first one.

Suggested change

-        imputed_data = np.tile(self.x, (n_coalitions, 1))
-        for i in range(sample_size):
+        for i in range(sample_size):
+            imputed_data = np.tile(self.x, (n_coalitions, 1))
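
A minimal sketch of the suggested placement, to make the pattern concrete. The loop body is hypothetical, since the PR's loop body is not shown in this comment: here each iteration overwrites the absent features of every coalition with the i-th background row. The function name `impute_outputs` and the toy inputs are illustrative only.

```python
import numpy as np


def impute_outputs(model, x, coalitions, replacement_data):
    n_coalitions = coalitions.shape[0]
    sample_size = replacement_data.shape[0]
    outputs = np.zeros((sample_size, n_coalitions))
    for i in range(sample_size):
        # Fresh copy of x for every coalition, as in the suggested change.
        imputed_data = np.tile(x, (n_coalitions, 1))
        # Hypothetical loop body: absent features take the i-th background row.
        replacements = np.tile(replacement_data[i], (n_coalitions, 1))
        imputed_data[~coalitions] = replacements[~coalitions]
        outputs[i] = model(imputed_data)
    return outputs


x = np.array([1.0, 2.0, 3.0])
coalitions = np.array([[True, False, True], [False, False, False]])
replacement_data = np.array([[10.0, 20.0, 30.0], [0.0, 0.0, 0.0]])
outputs = impute_outputs(lambda X: X.sum(axis=1), x, coalitions, replacement_data)
```

In this particular sketch the same positions are overwritten in every iteration, so a buffer created once before the loop would give the same outputs. The per-iteration copy matters when the overwritten positions change between iterations, which depends on the actual loop body.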

Copilot · 2025-12-28T15:33:26Z

src/shapiq/imputer/cte_imputer.py

@@ -0,0 +1,164 @@
+"""Implementation of the marginal imputer."""


The module docstring incorrectly states "Implementation of the marginal imputer" when it should say "Implementation of the CTE imputer" or "Implementation of the compress then explain imputer".

Copilot · 2025-12-28T15:33:27Z

src/shapiq/imputer/cte_imputer.py

+        normalize: bool = True,
+        random_state: int | None = None,
+    ) -> None:
+        """Initializes the marginal imputer.


The docstring incorrectly states "Initializes the marginal imputer" when it should say "Initializes the CTE imputer" or "Initializes the compress then explain imputer".

Copilot · 2025-12-28T15:33:27Z

src/shapiq/imputer/cte_imputer.py

+        Examples:
+            >>> model = lambda x: np.sum(x, axis=1)
+            >>> data = np.random.rand(10, 3)
+            >>> imputer = MarginalImputer(model=model, data=data, x=data[0])


The example in the docstring incorrectly uses "MarginalImputer" instead of "CTEImputer". This should be updated to match the class being documented.

add cte imputer

182da49

github-project-automation bot added this to shapiq development Nov 26, 2025

fix missing imports

ed6acb3

hbaniecki requested a review from mmschlk November 26, 2025 13:17

mmschlk requested review from Copilot and removed request for mmschlk December 28, 2025 15:28

Copilot started reviewing on behalf of mmschlk December 28, 2025 15:29 View session
