
[Bug]: Minor bugs regarding IncrementalPCA() #2006

@LScheib

Description


1. Printing an IncrementalPCA() object raises an AttributeError

For PCA(), printing the object gives an overview of its current parameters:

>>> import heat as ht
>>> from heat.decomposition import pca
>>> pca = ht.decomposition.PCA()
>>> pca
PCA({
    "n_components": null,
    "copy": true,
    "whiten": false,
    "svd_solver": "hierarchical",
    "tol": null,
    "iterated_power": 0,
    "n_oversamples": 10,
    "power_iteration_normalizer": "qr",
    "random_state": null
})

Trying the same for IncrementalPCA() results in:

>>> import heat as ht
>>> from heat.decomposition import pca
>>> ipca = ht.decomposition.IncrementalPCA()
>>> ipca
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../heat/heat/core/base.py", line 66, in __repr__
    return f"{self.__class__.__name__}({json.dumps(self.get_params(), indent=4)})"
  File ".../heat/heat/core/base.py", line 49, in get_params
    value = getattr(self, key)
AttributeError: 'IncrementalPCA' object has no attribute 'copy'
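A minimal, self-contained sketch of the likely failure mechanism, assuming that get_params() in heat/core/base.py collects parameter names from the `__init__` signature (as scikit-learn does) and then reads each one with `getattr(self, key)`, which is the line shown in the traceback. Under that assumption, any constructor argument that `__init__` accepts but does not store as an attribute of the same name (here `copy`) breaks `repr()`. The classes `BaseEstimatorSketch`, `BrokenIPCA` and `FixedIPCA` and their parameter lists are hypothetical and do not reproduce heat's actual IncrementalPCA signature.

```python
import inspect
import json


class BaseEstimatorSketch:
    """Hypothetical stand-in for heat's BaseEstimator (behavior assumed, not copied)."""

    @classmethod
    def _parameter_names(cls):
        # Assumption: parameter names come from the __init__ signature,
        # skipping `self` and any *args / **kwargs.
        sig = inspect.signature(cls.__init__)
        return [
            p.name
            for p in sig.parameters.values()
            if p.name != "self" and p.kind not in (p.VAR_POSITIONAL, p.VAR_KEYWORD)
        ]

    def get_params(self):
        # Mirrors the failing line in the traceback: getattr(self, key)
        return {key: getattr(self, key) for key in self._parameter_names()}

    def __repr__(self):
        return f"{self.__class__.__name__}({json.dumps(self.get_params(), indent=4)})"


class BrokenIPCA(BaseEstimatorSketch):
    # Hypothetical parameter list, not heat's actual IncrementalPCA signature.
    def __init__(self, n_components=None, copy=True, whiten=False, batch_size=None):
        self.n_components = n_components
        # `copy` is accepted but never stored, so getattr(self, "copy") fails.
        self.whiten = whiten
        self.batch_size = batch_size


class FixedIPCA(BaseEstimatorSketch):
    def __init__(self, n_components=None, copy=True, whiten=False, batch_size=None):
        self.n_components = n_components
        self.copy = copy  # store every constructor argument under its own name
        self.whiten = whiten
        self.batch_size = batch_size
```

Here `repr(FixedIPCA())` prints the parameters as JSON, while `repr(BrokenIPCA())` raises `AttributeError: 'BrokenIPCA' object has no attribute 'copy'`. If this reading is right, the fix in heat would be to store `self.copy = copy` in IncrementalPCA's `__init__` (or to drop the unused parameter from the signature).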

2. Possible error in the unit test test_incrementalpca_truncation_happens_split1 or in the underlying functionality

The unit test fails with 16 processes (but passes with 2, 4 and 8):

HEAT_TEST_USE_DEVICE=cpu mpirun -n 16 python -m unittest -vf heat/decomposition/tests/test_pca.py

======================================================================
ERROR: test_incrementalpca_truncation_happens_split1 (heat.decomposition.tests.test_pca.TestIncrementalPCA)
----------------------------------------------------------------------
Traceback (most recent call last):
  File ".../heat/heat/decomposition/tests/test_pca.py", line 315, in test_incrementalpca_truncation_happens_split1
    pca.partial_fit(data1)
  File ".../heat/heat/decomposition/pca.py", line 453, in partial_fit
    U, S, mean = _isvd(
  File ".../heat/heat/core/linalg/svdtools.py", line 658, in _isvd
    new_data = new_data.resplit_(U_old.split) - U_old @ UtC
  File ".../heat/heat/core/linalg/basics.py", line 1291, in _matmul
    return matmul(self, other)
  File ".../heat/heat/core/linalg/basics.py", line 844, in matmul
    if b.lshape[-1] % nB != 0 or (kB == 1 and b.lshape[-1] != 1):
ZeroDivisionError: integer division or modulo by zero
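The failing line computes `b.lshape[-1] % nB`, so `nB` must be 0 on at least one process. One plausible explanation, not verified against heat's matmul source: if `nB` is the smallest local column count over all processes, it becomes 0 as soon as the matrix has fewer columns than there are processes, because some ranks then hold no columns at all. The pure-Python sketch below models this with an assumed balanced block distribution; `local_column_counts` and `min_block_width` are hypothetical helpers, not heat functions, and the 10-column size is made up for illustration.

```python
from typing import List


def local_column_counts(n_cols: int, n_procs: int) -> List[int]:
    """Columns owned by each rank under an assumed balanced block distribution:
    every rank gets n_cols // n_procs columns and the first n_cols % n_procs
    ranks get one extra."""
    base, rem = divmod(n_cols, n_procs)
    return [base + (1 if rank < rem else 0) for rank in range(n_procs)]


def min_block_width(n_cols: int, n_procs: int) -> int:
    """Hypothetical analogue of `nB`: the smallest local column count over all ranks."""
    return min(local_column_counts(n_cols, n_procs))


for n_procs in (2, 4, 8, 16):
    nB = min_block_width(10, n_procs)
    try:
        10 % nB  # stands in for `b.lshape[-1] % nB`
        status = "ok"
    except ZeroDivisionError:
        status = "ZeroDivisionError"
    print(n_procs, nB, status)
```

With 10 columns, `min_block_width(10, 8)` is 1 but `min_block_width(10, 16)` is 0, and the modulo then raises ZeroDivisionError. This would fit a test whose data has at least 8 but fewer than 16 columns, although the actual shape used in the test is not stated here.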

In my implementation of the fit() method for IncrementalPCA() (see PR #2005), I handle similar cases with this check:

if shape[1] < ht.MPI_WORLD.size:
    raise ValueError(
        f"The number of columns ({shape[1]}) must be at least equal to the number of processes ({ht.MPI_WORLD.size})."
    )
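For reference, a self-contained version of that check as a standalone helper. `check_columns_vs_processes` is a hypothetical name, and the process count is passed in explicitly (in heat it would be `ht.MPI_WORLD.size`) so the sketch runs without MPI. Calling the same check at the start of partial_fit() would presumably turn the ZeroDivisionError above into a clear ValueError, assuming partial_fit() should accept the same inputs as fit().

```python
def check_columns_vs_processes(n_cols: int, n_procs: int) -> None:
    """Raise if the column count is smaller than the number of MPI processes.

    Hypothetical helper mirroring the check quoted above; `n_procs` would be
    ht.MPI_WORLD.size in heat but is passed explicitly here so the sketch
    runs without MPI.
    """
    if n_cols < n_procs:
        raise ValueError(
            f"The number of columns ({n_cols}) must be at least equal to "
            f"the number of processes ({n_procs})."
        )


check_columns_vs_processes(16, 16)  # passes: one column per process
```

Failing early with a descriptive message keeps the error at the API boundary instead of surfacing deep inside the distributed matmul.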

Version

main (development branch)

Python version

None

PyTorch version

None

MPI version

Metadata


Assignees

No one assigned

Labels

bug: Something isn't working

Type

No type

Projects

Status

Todo

Milestone

Relationships

None yet

Development

No branches or pull requests
