Hi, developers! Thank you for working on this project. It has helped me tremendously with my work.
I have a quick question.
I used Word Embeddings visualization via PCA for a text classification model and found some outliers that were far from the other examples.
Then I checked the PCA code here and found the following lines:
self._mean = np.mean(x_train, 0)
x_train = x_train - self._mean
As far as I understood from PR #559, the code above is a reimplementation of Scikit-learn's PCA in NumPy.
But here's the question: why do you apply only mean-centering?
Why not add standardization to make the code look like this?
self._mean = np.mean(x_train, 0)
self._std = np.std(x_train, 0)
x_train = (x_train - self._mean) / self._std
After I changed the code to this, my visualizations started to look more 'ordered'.
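For context, the difference between the two variants amounts to PCA on the covariance matrix (mean-centering only) versus PCA on the correlation matrix (mean-centering plus division by the per-feature standard deviation). A minimal NumPy sketch of both, with made-up toy data and my own function names, illustrating why a large-scale feature can dominate the components when you only mean-center:

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
# Second feature is correlated with the first but on a ~100x larger scale.
x2 = 100.0 * (x1 + 0.1 * rng.normal(size=200))
x = np.column_stack([x1, x2])

def pca_components(x, standardize=False):
    """Principal directions via SVD, with optional standardization."""
    x = x - np.mean(x, 0)            # mean-centering only -> covariance PCA
    if standardize:
        std = np.std(x, 0)
        std[std == 0] = 1.0          # guard against constant features
        x = x / std                  # z-scoring -> correlation PCA
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return vt

v_cov = pca_components(x)                     # dominated by the large-scale feature
v_corr = pca_components(x, standardize=True)  # both features weighted equally
```

With centering only, the first principal direction points almost entirely along the large-scale feature; after standardization, both features contribute with equal magnitude. Note the `std == 0` guard: dividing by the raw standard deviation, as in the snippet above, would produce NaNs for any constant embedding dimension.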
Thank you in advance!