Question about PCA #1623

@SerbulovArtem

Description

Hi, developers! Thank you for working on this project. It has helped me tremendously with my work.

I have a quick question.

I used the word-embeddings visualization via PCA for a text classification model and found some outliers that were far from the other examples.

Then I checked the PCA code here and found the following lines:

self._mean = np.mean(x_train, 0)
x_train = x_train - self._mean

As far as I understood from PR #559, the code above is a reimplementation of Scikit-learn's PCA in NumPy.
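For context, here is my own NumPy-only sketch (not the project's actual code) of how mean-centering-only PCA continues into a projection, using a small random matrix as stand-in data; the singular values of the centered data match the eigenvalues of its covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = rng.normal(size=(100, 5))  # stand-in for the embedding matrix

# Mean-center only, as in the snippet above.
mean = np.mean(x_train, 0)
x_centered = x_train - mean

# Principal axes via SVD of the centered data.
_, s, vt = np.linalg.svd(x_centered, full_matrices=False)
projected = x_centered @ vt[:2].T  # first two components, for visualization

# Equivalently: eigen-decomposition of the covariance matrix.
cov = np.cov(x_centered, rowvar=False)
eigvals = np.linalg.eigvalsh(cov)[::-1]  # descending order
# s**2 / (n - 1) equals eigvals, so both routes find the same axes.
```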

But here's my question: why do you only apply mean-centering?

Why not add standardization to make the code look like this?

self._mean = np.mean(x_train, 0)
self._std = np.std(x_train, 0)
x_train = (x_train - self._mean) / self._std

After making that change, my visualizations started to look more 'ordered'.
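For what it's worth, my understanding (my own sketch, not from this project's code) is that dividing by the per-feature standard deviation turns PCA on the covariance matrix into PCA on the correlation matrix, which changes which directions dominate when features have very different scales; one caveat is that a zero-variance feature makes the division blow up:

```python
import numpy as np

rng = np.random.default_rng(1)
# Three features with wildly different scales.
x = rng.normal(size=(200, 3)) * np.array([1.0, 10.0, 100.0])

# Covariance-based PCA (mean-centering only): the large-scale
# feature dominates the leading eigenvalue.
xc = x - x.mean(0)
cov_evals = np.linalg.eigvalsh(np.cov(xc, rowvar=False))[::-1]

# Correlation-based PCA (mean-centering + standardization).
# ddof=1 so the standardized features have exactly unit sample variance;
# note np.std defaults to ddof=0, as in the snippet above.
std = x.std(0, ddof=1)
xs = xc / std
corr_evals = np.linalg.eigvalsh(np.cov(xs, rowvar=False))[::-1]

# After standardization every feature has unit variance,
# so the eigenvalues sum to the number of features (3 here),
# and no single scale can dominate the picture.
```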

Thank you in advance!
