remove_first_principal_component for Smooth Inverse Frequency in Simple Sentence Similarity.ipynb

This concerns the Smooth Inverse Frequency section of Simple Sentence Similarity.ipynb.

In your code, the embeddings of sentences1 and sentences2 are merged and passed through remove_first_principal_component **together**:

```
        embeddings.append(embedding1)
        embeddings.append(embedding2)
embeddings = remove_first_principal_component(np.array(embeddings))
```

However, in the original code for the paper _"A Simple but Tough-to-Beat Baseline for Sentence Embeddings"_ ([https://github.com/PrincetonML/SIF/blob/master/src/sim_algo.py](https://github.com/PrincetonML/SIF/blob/master/src/sim_algo.py)), the authors calculate embedding1 and embedding2 (including the remove_first_principal_component step) **separately**:

```
emb1 = SIF_embedding.SIF_embedding(We, x1, w1, params)
emb2 = SIF_embedding.SIF_embedding(We, x2, w2, params)
```

I wonder whether this difference influences the results considerably.

I am working on a query task, so there is only one sentence in sentences1. Should I (1) merge the query and the answers and apply remove_first_principal_component to them **together**, (2) calculate embedding1 for the query and embedding2 for the answers **separately**, or (3) save the SVD of the answers (sentences2) and then remove the first principal component of sentences2 from the weighted embedding of the query (sentences1)?
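
To make the three options concrete, here is a minimal sketch of what I mean. It is not the notebook's or the paper's implementation: the helpers `first_principal_component` and `remove_component` are hypothetical names, the embeddings are random stand-ins, and I am assuming the component is taken from a plain SVD without mean-centering. One thing the sketch suggests is that option (2) looks degenerate with a single query: the first principal component of a one-row matrix is the row itself, so removing it leaves a zero vector.

```
import numpy as np


def first_principal_component(X):
    """Return the first right singular vector of X (no mean-centering)."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[0]


def remove_component(X, pc):
    """Subtract from every row of X its projection onto the unit vector pc."""
    return X - np.outer(X @ pc, pc)


# Random stand-ins for the weighted-average sentence embeddings (assumption).
rng = np.random.default_rng(0)
query = rng.normal(size=(1, 5))    # the single sentence in sentences1
answers = rng.normal(size=(8, 5))  # the sentences in sentences2

# Option (1): stack query and answers, remove one shared component.
both = np.vstack([query, answers])
pc_joint = first_principal_component(both)
both_1 = remove_component(both, pc_joint)
query_1, answers_1 = both_1[:1], both_1[1:]

# Option (2): each side uses its own component.
# With a single query, the component is the query direction itself,
# so the query embedding collapses to (numerically) zero.
query_2 = remove_component(query, first_principal_component(query))
answers_2 = remove_component(answers, first_principal_component(answers))

# Option (3): fit the component on the answers only, then reuse it for the query.
pc_answers = first_principal_component(answers)
answers_3 = remove_component(answers, pc_answers)
query_3 = remove_component(query, pc_answers)
```

With option (3), `pc_answers` could be computed once and stored, so each new query only needs a single projection.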
