Returning an empty array from get_embedding #1

@tobrun

When get_embedding is called with a word that is not part of the original GloVe dataset, we currently return an empty array:

def get_embedding( word ):
    try:
        return vectors[ indexes[ word ] ]
    except KeyError:
        return []

Instead, it should return an array of zeros whose length matches the embedding size of the dataset:

import numpy as np

def get_embedding(word):
    try:
        return vectors[indexes[word]]
    except KeyError:
        # Unknown word: return a zero vector with the embedding size (50 here)
        return np.zeros(50)

In this case I'm using numpy because I'm running this in a real Python environment for preprocessing, but it could just as well be a float array of size 50 filled with zeros.
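For an environment without numpy, a minimal sketch of the same fallback could look like the following. The names EMBEDDING_DIM and the toy indexes/vectors tables are assumptions for illustration only; they are not taken from the project.

```python
# Assumed embedding width; 50 matches the size used in this issue.
EMBEDDING_DIM = 50

# Hypothetical stand-ins for the real lookup tables.
indexes = {"cat": 0}
vectors = [[0.1] * EMBEDDING_DIM]

def get_embedding(word):
    try:
        return vectors[indexes[word]]
    except KeyError:
        # Build a fresh list on every call so callers cannot
        # accidentally mutate a shared fallback vector.
        return [0.0] * EMBEDDING_DIM
```

Returning a new list per call is a small design choice: a single shared zero vector would save allocations, but any in-place modification by a caller would then leak into later lookups.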

This solves issues like:

ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 0 and the array at index 2 has size 50
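As a rough sketch of why the zero vector helps, the snippet below stacks the embeddings of a short sentence that contains an unknown word. The toy indexes/vectors data and the sample sentence are made up for illustration, and reading the width from vectors.shape[1] instead of hard-coding 50 is a suggestion, not something the current code does.

```python
import numpy as np

# Hypothetical toy data standing in for the loaded GloVe table.
indexes = {"the": 0, "cat": 1}
vectors = np.random.default_rng(0).normal(size=(2, 50))

def get_embedding(word):
    try:
        return vectors[indexes[word]]
    except KeyError:
        # Derive the width from the table so other GloVe sizes also work.
        return np.zeros(vectors.shape[1], dtype=vectors.dtype)

sentence = ["the", "unknownword", "cat"]

# Every row now has the same length, so stacking succeeds.
matrix = np.vstack([get_embedding(w) for w in sentence])
print(matrix.shape)
```

With the old behaviour the middle row would have length 0, and stacking it with the 50-wide rows would raise a ValueError about mismatched dimensions.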
