Question about PCA #1623

@SerbulovArtem

Description

Hi, developers! Thank you for working on this project. It has helped me tremendously with my work.

I have a quick question.

I used the word-embeddings visualization via PCA for a text classification model and found some outliers that were far from the other examples.

Then I checked the PCA code here and found the following lines:

self._mean = np.mean(x_train, 0)
x_train = x_train - self._mean

As far as I understood from PR #559, the code above is a reimplementation of Scikit-learn's PCA in NumPy.
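For context, here is my own NumPy-only sketch (not the project's actual code) of how mean-centering-only PCA continues into a projection, using a small random matrix as stand-in data; the singular values of the centered data match the eigenvalues of its covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = rng.normal(size=(100, 5))  # stand-in for the embedding matrix

# Mean-center only, as in the snippet above.
mean = np.mean(x_train, 0)
x_centered = x_train - mean

# Principal axes via SVD of the centered data.
_, s, vt = np.linalg.svd(x_centered, full_matrices=False)
projected = x_centered @ vt[:2].T  # first two components, for visualization

# Equivalently: eigen-decomposition of the covariance matrix.
cov = np.cov(x_centered, rowvar=False)
eigvals = np.linalg.eigvalsh(cov)[::-1]  # descending order
# s**2 / (n - 1) equals eigvals, so both routes find the same axes.
```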

But here's my question: why do you only apply mean-centering?

Why not add standardization to make the code look like this?

self._mean = np.mean(x_train, 0)
self._std = np.std(x_train, 0)
x_train = (x_train - self._mean) / self._std

After making that change, my visualizations started to look more 'ordered'.
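For what it's worth, my understanding (my own sketch, not from this project's code) is that dividing by the per-feature standard deviation turns PCA on the covariance matrix into PCA on the correlation matrix, which changes which directions dominate when features have very different scales; one caveat is that a zero-variance feature makes the division blow up:

```python
import numpy as np

rng = np.random.default_rng(1)
# Three features with wildly different scales.
x = rng.normal(size=(200, 3)) * np.array([1.0, 10.0, 100.0])

# Covariance-based PCA (mean-centering only): the large-scale
# feature dominates the leading eigenvalue.
xc = x - x.mean(0)
cov_evals = np.linalg.eigvalsh(np.cov(xc, rowvar=False))[::-1]

# Correlation-based PCA (mean-centering + standardization).
# ddof=1 so the standardized features have exactly unit sample variance;
# note np.std defaults to ddof=0, as in the snippet above.
std = x.std(0, ddof=1)
xs = xc / std
corr_evals = np.linalg.eigvalsh(np.cov(xs, rowvar=False))[::-1]

# After standardization every feature has unit variance,
# so the eigenvalues sum to the number of features (3 here),
# and no single scale can dominate the picture.
```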

Thank you in advance!
