remove_first_principal_component for Smooth Inverse Frequency in Simple Sentence Similarity.ipynb #10

@ziweiji

In the Smooth Inverse Frequency section of Simple Sentence Similarity.ipynb, your code merges the embeddings of sentences1 and sentences2 and applies remove_first_principal_component to all of them together:

        embeddings.append(embedding1)
        embeddings.append(embedding2)
embeddings = remove_first_principal_component(np.array(embeddings))

However, in the original code for the paper "A Simple but Tough-to-Beat Baseline for Sentence Embeddings" (https://github.com/PrincetonML/SIF/blob/master/src/sim_algo.py), the authors compute embedding1 and embedding2 separately, and each call includes its own remove_first_principal_component step:

emb1 = SIF_embedding.SIF_embedding(We, x1, w1, params)
emb2 = SIF_embedding.SIF_embedding(We, x2, w2, params)

I wonder whether this difference influences the results considerably.

I am working on a query task, so sentences1 contains only one sentence. Should I (1) merge the query and the answers and apply remove_first_principal_component to them together, (2) compute embedding1 for the query and embedding2 for the answers separately, or (3) save the SVD of the answers (sentences2) and then remove the first principal component of sentences2 from the query embedding (sentences1)?
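
To make the three options concrete, here is a minimal NumPy sketch. It is not the notebook's or the paper's code: the helpers first_component and remove_component are hypothetical names, the embeddings are random stand-ins for the weighted-average sentence vectors, and the component is taken as the first right singular vector without mean-centering, which I assume matches the TruncatedSVD-based approach.

```python
import numpy as np

def first_component(X):
    """First right singular vector of X (no mean-centering), shape (dim,)."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[0]

def remove_component(X, pc):
    """Subtract from each row of X its projection onto the unit vector pc."""
    return X - np.outer(X @ pc, pc)

# Hypothetical data: random stand-ins for weighted-average embeddings.
rng = np.random.default_rng(0)
answers = rng.normal(size=(50, 8)) + 2.0  # sentences2
query = rng.normal(size=(1, 8)) + 2.0     # sentences1 (a single query)

# Option 1: stack query and answers, fit and remove one shared component.
joint = np.vstack([query, answers])
joint_clean = remove_component(joint, first_component(joint))
q1, a1 = joint_clean[:1], joint_clean[1:]

# Option 2: fit a component on the query alone. With a single row, the
# first singular vector is the query's own direction, so removing it
# leaves a zero vector.
q2 = remove_component(query, first_component(query))

# Option 3: fit the component on the answers only, then reuse it for the query.
pc_answers = first_component(answers)
a3 = remove_component(answers, pc_answers)
q3 = remove_component(query, pc_answers)

# Cosine similarity of the query against every answer under option 3.
scores3 = (a3 @ q3[0]) / (np.linalg.norm(a3, axis=1) * np.linalg.norm(q3[0]))
```

In this sketch, option 2 degenerates when sentences1 holds a single sentence, because the only direction available to remove is the query itself. Options 1 and 3 differ only in whether the query contributes to the estimated component. With many answers and one query, I would expect the two components to be close, but that is an assumption worth checking on real data.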
