This question is about the Smooth Inverse Frequency section of Simple Sentence Similarity.ipynb.
Your code merges the embeddings of sentences1 and sentences2 and applies remove_first_principal_component to them together:
embeddings.append(embedding1)
embeddings.append(embedding2)
embeddings = remove_first_principal_component(np.array(embeddings))
However, in the original code for the paper "A Simple but Tough-to-Beat Baseline for Sentence Embeddings" (https://github.com/PrincetonML/SIF/blob/master/src/sim_algo.py), the authors compute embedding1 and embedding2 separately, including the remove_first_principal_component step:
emb1 = SIF_embedding.SIF_embedding(We, x1, w1, params)
emb2 = SIF_embedding.SIF_embedding(We, x2, w2, params)
I wonder whether this difference affects the results considerably.
I am working on a query task, so sentences1 contains only one sentence (the query). Should I (1) merge the query and the answers and apply remove_first_principal_component to them together, (2) compute embedding1 for the query and embedding2 for the answers separately, or (3) save the SVD of the answers (sentences2) and then remove the first principal component of sentences2 from the weighted embedding of the query (sentences1)?
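To make the three options concrete, here is a minimal sketch of what I mean. It is not the notebook's code or the paper's code: the helper names `first_principal_component` and `remove_component` are hypothetical, the embeddings are random synthetic data, and `np.linalg.svd` stands in for whatever SVD routine the real implementation uses. As in the snippets above, no mean-centering is applied before the SVD.

```python
import numpy as np

def first_principal_component(X):
    """Return the first right singular vector of X (unit length, no centering).

    Hypothetical helper: a stand-in for the SVD step of the real code.
    """
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[0]

def remove_component(X, pc):
    """Subtract from every row of X its projection onto the unit vector pc."""
    return X - np.outer(X @ pc, pc)

# Synthetic stand-ins for weighted-average sentence embeddings (assumption).
rng = np.random.default_rng(0)
dim = 50
common = rng.normal(size=dim)                    # direction shared by all sentences
query = rng.normal(size=(1, dim)) + common       # sentences1: a single query
answers = rng.normal(size=(20, dim)) + common    # sentences2: candidate answers

# Option 1: estimate the component on query + answers together.
merged = np.vstack([query, answers])
pc_merged = first_principal_component(merged)
q1 = remove_component(query, pc_merged)
a1 = remove_component(answers, pc_merged)

# Option 2: estimate the component separately for each side.
pc_query = first_principal_component(query)
q2 = remove_component(query, pc_query)
a2 = remove_component(answers, first_principal_component(answers))

# Option 3: estimate the component on the answers only, reuse it for the query.
pc_answers = first_principal_component(answers)
q3 = remove_component(query, pc_answers)
a3 = remove_component(answers, pc_answers)

print("norm of query after option 1:", np.linalg.norm(q1))
print("norm of query after option 2:", np.linalg.norm(q2))
print("norm of query after option 3:", np.linalg.norm(q3))
```

One thing this sketch makes visible: with a single query row, the first singular vector of the query matrix is the query itself (normalized), so option 2 projects the query down to approximately the zero vector. If that reasoning holds for the real code as well, option 2 seems unusable when sentences1 has only one sentence, which leaves options 1 and 3.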