Adding Principal Covariates Classification (PCovC) Code #248

Merged · 19 commits · Jun 4, 2025
4 changes: 4 additions & 0 deletions CHANGELOG
@@ -13,6 +13,10 @@ The rules for CHANGELOG file:

0.3.0 (XXXX/XX/XX)
------------------
- Add ``_BasePCov`` class (#248)
- Add ``PCovC`` class that inherits shared functionality from ``_BasePCov`` (#248)
- Add ``PCovC`` testing suite and examples (#248)
- Modify ``PCovR`` to inherit shared functionality from ``_BasePCov`` (#248; the shared hierarchy is sketched after this diff)
- Update to sklearn >= 1.6.0 and scipy >= 1.15.0 (#239)
- Fix import of a function moved within scipy and bump the scipy dependency to 1.15.0 (#236)
- Fix rendering issues for `SparseKDE` and `QuickShift` (#236)
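A minimal sketch of the refactor these changelog entries describe — the class names come from the entries above, but the attributes and placeholder bodies are illustrative, not the real implementation:

    class _BasePCov:
        # Shared decomposition machinery for both variants (hypothetical outline).
        def __init__(self, mixing=0.5, n_components=None):
            self.mixing = mixing
            self.n_components = n_components


    class PCovR(_BasePCov):
        # Regression variant: pairs the shared decomposition with a regressor.
        pass


    class PCovC(_BasePCov):
        # Classification variant: pairs the shared decomposition with a classifier.
        pass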
6 changes: 6 additions & 0 deletions docs/src/bibliography.rst
@@ -45,3 +45,9 @@ References
    Michele Ceriotti, "Improving Sample and Feature Selection with Principal Covariates
    Regression" 2021 Mach. Learn.: Sci. Technol. 2 035038.
    https://iopscience.iop.org/article/10.1088/2632-2153/abfe7c.

.. [Jorgensen2025]
    Christian Jorgensen, Arthur Y. Lin, Rhushil Vasavada, and Rose K. Cersonsky,
    "Interpretable Visualizations of Data Spaces for Classification Problems"
    2025 arXiv. 2503.05861.
    https://doi.org/10.48550/arXiv.2503.05861.
9 changes: 8 additions & 1 deletion docs/src/conf.py
@@ -54,7 +54,14 @@
"sphinx_toggleprompt",
]

example_subdirs = ["pcovr", "selection", "regression", "reconstruction", "neighbors"]
example_subdirs = [
    "pcovr",
    "pcovc",
    "selection",
    "regression",
    "reconstruction",
    "neighbors",
]
sphinx_gallery_conf = {
    "filename_pattern": "/*",
    "examples_dirs": [f"../../examples/{p}" for p in example_subdirs],
8 changes: 5 additions & 3 deletions docs/src/getting-started.rst
@@ -37,10 +37,10 @@ Notebook Examples
.. include:: examples/reconstruction/index.rst
    :start-line: 4

.. _getting_started-pcovr:
.. _getting_started-hybrid:

Principal Covariates Regression
-------------------------------
Hybrid Mapping Techniques
-------------------------

.. automodule:: skmatter.decomposition
    :noindex:
@@ -50,3 +50,5 @@

.. include:: examples/pcovr/index.rst
    :start-line: 4

.. include:: examples/pcovc/index.rst
    :start-line: 4
4 changes: 2 additions & 2 deletions docs/src/index.rst
@@ -33,15 +33,15 @@

.. only:: html

    :ref:`getting_started-pcovr`
    :ref:`getting_started-hybrid`

.. image:: /examples/pcovr/images/thumb/sphx_glr_PCovR_thumb.png
    :alt:

.. raw:: html

    </h5>
    <p class="card-text">Utilises a combination between a PCA-like and a LR-like loss
    <p class="card-text">PCovR and PCovC utilize a combination of a PCA-like and an LR-like loss
    to determine the decomposition matrix that projects features into the latent space</p>
    </div>
    </div>
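For reference, the combined objective this card text describes is, schematically (the standard PCovR-style form from the literature, up to normalization — not taken verbatim from this PR):

:math:`\ell = \alpha \lVert X - T P_X \rVert^2 + (1 - \alpha) \lVert Y - T P_Y \rVert^2`

where :math:`T` is the latent projection, :math:`P_X` and :math:`P_Y` are reconstruction weights, and :math:`\alpha` is the mixing parameter swept in the examples below.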
24 changes: 22 additions & 2 deletions docs/src/references/decomposition.rst
@@ -1,5 +1,5 @@
Principal Covariates Regression (PCovR)
=======================================
Hybrid Mapping Techniques
=========================

.. _PCovR-api:

@@ -20,6 +20,26 @@ PCovR
    .. automethod:: inverse_transform
    .. automethod:: score

.. _PCovC-api:

PCovC
-----

.. autoclass:: skmatter.decomposition.PCovC
    :show-inheritance:
    :special-members:

    .. automethod:: fit

    .. automethod:: _fit_feature_space
    .. automethod:: _fit_sample_space

    .. automethod:: transform
    .. automethod:: predict
    .. automethod:: inverse_transform
    .. automethod:: decision_function
    .. automethod:: score
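A minimal usage sketch of the ``PCovC`` API listed above; the constructor arguments mirror the examples shipped with this PR, and exact signatures should be checked against the rendered documentation rather than taken from this illustration::

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegressionCV
    from sklearn.preprocessing import StandardScaler

    from skmatter.decomposition import PCovC

    X, y = load_iris(return_X_y=True)
    X_scaled = StandardScaler().fit_transform(X)

    pcovc = PCovC(mixing=0.5, n_components=2, classifier=LogisticRegressionCV())
    pcovc.fit(X_scaled, y)               # fit the hybrid decomposition and classifier
    T = pcovc.transform(X_scaled)        # project into the latent space
    y_pred = pcovc.predict(X_scaled)     # classify via the documented predict
    X_back = pcovc.inverse_transform(T)  # approximate reconstruction of X
    accuracy = pcovc.score(X_scaled, y)  # documented score method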

.. _KPCovR-api:

Kernel PCovR
1 change: 1 addition & 0 deletions docs/src/tutorials.rst
@@ -3,6 +3,7 @@
.. toctree::

    examples/pcovr/index
    examples/pcovc/index
    examples/selection/index
    examples/regression/index
    examples/reconstruction/index
110 changes: 110 additions & 0 deletions examples/pcovc/PCovC_Comparison.py
@@ -0,0 +1,110 @@
#!/usr/bin/env python
# coding: utf-8

"""
Comparing PCovC with PCA and LDA
================================
"""
# %%
#

import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegressionCV
from sklearn.preprocessing import StandardScaler

from skmatter.decomposition import PCovC


plt.rcParams["image.cmap"] = "tab10"
plt.rcParams["scatter.edgecolors"] = "k"

random_state = 0

# %%
#
# For this comparison, we will use the :func:`sklearn.datasets.load_breast_cancer`
# dataset from ``sklearn``.

X, y = load_breast_cancer(return_X_y=True)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# %%
#
# PCA
# ---
#

pca = PCA(n_components=2)

pca.fit(X_scaled)  # PCA is unsupervised; the class labels are not used in the fit
T_pca = pca.transform(X_scaled)

fig, ax = plt.subplots()
scatter = ax.scatter(T_pca[:, 0], T_pca[:, 1], c=y)
ax.set(xlabel="PC$_1$", ylabel="PC$_2$")
ax.legend(
    scatter.legend_elements()[0][::-1],
    load_breast_cancer().target_names[::-1],
    loc="upper right",
    title="Classes",
)

# %%
#
# LDA
# ---
#

lda = LinearDiscriminantAnalysis(n_components=1)
lda.fit(X_scaled, y)

T_lda = lda.transform(X_scaled)

fig, ax = plt.subplots()
# With two classes, LDA yields a single component, so we plot it along one axis.
ax.scatter(T_lda, np.zeros(len(T_lda)), c=y)
ax.set(xlabel="LDA$_1$")

# %%
#
# PCovC
# -----
#
# Below, we see the map produced
# by a PCovC model with :math:`\alpha` = 0.5 and a logistic
# regression classifier.

mixing = 0.5

pcovc = PCovC(
    mixing=mixing,
    n_components=2,
    random_state=random_state,
    classifier=LogisticRegressionCV(),
)
pcovc.fit(X_scaled, y)

T_pcovc = pcovc.transform(X_scaled)

fig, ax = plt.subplots()
ax.scatter(T_pcovc[:, 0], T_pcovc[:, 1], c=y)
ax.set(xlabel="PCov$_1$", ylabel="PCov$_2$")

# %%
#
# A side-by-side comparison of the
# three maps (PCA, LDA, and PCovC):

fig, axs = plt.subplots(1, 3, figsize=(18, 5))
axs[0].scatter(T_pca[:, 0], T_pca[:, 1], c=y)
axs[0].set_title("PCA")
axs[1].scatter(T_lda, np.zeros(len(T_lda)), c=y)
axs[1].set_title("LDA")
axs[2].scatter(T_pcovc[:, 0], T_pcovc[:, 1], c=y)
axs[2].set_title("PCovC")
plt.show()
158 changes: 158 additions & 0 deletions examples/pcovc/PCovC_Hyperparameters.py
@@ -0,0 +1,158 @@
#!/usr/bin/env python
# coding: utf-8

"""
PCovC Hyperparameter Tuning
===========================
"""
# %%
#

import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.inspection import DecisionBoundaryDisplay
from sklearn.linear_model import LogisticRegressionCV, Perceptron, RidgeClassifierCV
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

from skmatter.decomposition import PCovC


plt.rcParams["image.cmap"] = "tab10"
plt.rcParams["scatter.edgecolors"] = "k"

random_state = 10
n_components = 2

# %%
#
# For this tuning demonstration, we will use the :func:`sklearn.datasets.load_iris`
# dataset from ``sklearn``.

X, y = load_iris(return_X_y=True)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# %%
#
# PCA
# ---
#

pca = PCA(n_components=n_components)

pca.fit(X_scaled)  # PCA is unsupervised; the class labels are not used in the fit
T_pca = pca.transform(X_scaled)

fig, axis = plt.subplots()
scatter = axis.scatter(T_pca[:, 0], T_pca[:, 1], c=y)
axis.set(xlabel="PC$_1$", ylabel="PC$_2$")
axis.legend(
    scatter.legend_elements()[0],
    load_iris().target_names,
    loc="lower right",
    title="Classes",
)

# %%
#
# Effect of Mixing Parameter :math:`\alpha` on PCovC Map
# ------------------------------------------------------
#
# Below, we see how different :math:`\alpha` values for our PCovC model
# result in varying class distinctions between setosa, versicolor,
# and virginica on the PCovC map.

mixing_params = [0, 0.25, 0.50, 0.75, 1]
n_mixing = len(mixing_params)

fig, axs = plt.subplots(1, n_mixing, figsize=(4 * n_mixing, 4), sharey="row")

for i, mixing in enumerate(mixing_params):
    pcovc = PCovC(
        mixing=mixing,
        n_components=n_components,
        random_state=random_state,
        classifier=LogisticRegressionCV(),
    )

    pcovc.fit(X_scaled, y)
    T = pcovc.transform(X_scaled)

    axs[i].set_xticks([])
    axs[i].set_yticks([])

    axs[i].set_title(r"$\alpha=$" + str(mixing))
    axs[i].set_xlabel("PCov$_1$")
    axs[i].scatter(T[:, 0], T[:, 1], c=y)

axs[0].set_ylabel("PCov$_2$")

fig.subplots_adjust(wspace=0)

# %%
#
# Effect of PCovC Classifier on PCovC Map and Decision Boundaries
# ---------------------------------------------------------------
#
# Here, we see how a PCovC model (:math:`\alpha` = 0.5) fitted with
# different classifiers produces varying PCovC maps. In addition,
# we see the varying decision boundaries produced by the
# respective PCovC classifiers.

mixing = 0.5
fig, axs = plt.subplots(1, 4, figsize=(16, 4))

models = {
    RidgeClassifierCV(): "Ridge Classification",
    LogisticRegressionCV(random_state=random_state): "Logistic Regression",
    LinearSVC(random_state=random_state): "Support Vector Classification",
    Perceptron(random_state=random_state): "Single-Layer Perceptron",
}

for i, (model, name) in enumerate(models.items()):
    pcovc = PCovC(
        mixing=mixing,
        n_components=n_components,
        random_state=random_state,
        classifier=model,
    )

    pcovc.fit(X_scaled, y)
    T = pcovc.transform(X_scaled)

    graph = axs[i]
    graph.set_title(name)

    DecisionBoundaryDisplay.from_estimator(
        estimator=pcovc.classifier_,
        X=T,
        ax=graph,
        response_method="predict",
        grid_resolution=1000,
    )

    scatter = graph.scatter(T[:, 0], T[:, 1], c=y)

    graph.set_xlabel("PCov$_1$")
    graph.set_xticks([])
    graph.set_yticks([])

axs[0].set_ylabel("PCov$_2$")
axs[0].legend(
    scatter.legend_elements()[0],
    load_iris().target_names,
    loc="lower right",
    title="Classes",
    fontsize=8,
)

fig.subplots_adjust(wspace=0.04)
plt.show()
2 changes: 2 additions & 0 deletions examples/pcovc/README.rst
@@ -0,0 +1,2 @@
PCovC
=====