do_bert.sh error in bert_embed.py #4

@shamgane

I have been trying to run the mBERT extraction script on the ca/head_first dataset with bert-base-multilingual-cased, and I get the following error trace:

Using bert-base-multilingual-cased for data/sent_graphs/ca/head_first/train.conllu
Saving to data/sent_graphs/ca/head_first/train_bert.hdf5
Embedding...
0%| | 0/1173 [00:00<?, ?it/s]
Traceback (most recent call last):
File "bert_embed.py", line 135, in <module>
reps, sids = ee(model, args.indata)
File "bert_embed.py", line 108, in ee
assert len(sent.split()) == len(ave_reps)
AssertionError
Using bert-base-multilingual-cased for data/sent_graphs/ca/head_first/dev.conllu
Saving to data/sent_graphs/ca/head_first/dev_bert.hdf5
Embedding...
0%| | 0/168 [00:00<?, ?it/s]
Traceback (most recent call last):
File "bert_embed.py", line 135, in <module>
reps, sids = ee(model, args.indata)
File "bert_embed.py", line 108, in ee
assert len(sent.split()) == len(ave_reps)
AssertionError
Using bert-base-multilingual-cased for data/sent_graphs/ca/head_first/test.conllu
Saving to data/sent_graphs/ca/head_first/test_bert.hdf5
Embedding...
0%| | 0/336 [00:00<?, ?it/s]
Traceback (most recent call last):
File "bert_embed.py", line 135, in <module>
reps, sids = ee(model, args.indata)
File "bert_embed.py", line 108, in ee
assert len(sent.split()) == len(ave_reps)
AssertionError

It seems that the average_reps function in bert_embed.py returns an empty list [] for the sentence 'Bona ubicació .'. When execution reaches the assert statement, the length of this output (0) does not match the number of whitespace-separated tokens in the sentence (3), so the assertion fails. This is just one example to illustrate the problem. I would really appreciate any guidance on how to fix this.
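
To make the mismatch easier to debug, here is a rough sketch of the check I would expect to pass. It is not the code from bert_embed.py: the function names `average_subword_reps` and `check_alignment`, the word-piece split of 'ubicació', and the 2-dimensional vectors are all made up for illustration. It only shows the idea that averaging sub-word vectors should yield one vector per whitespace token, and it reports both lengths instead of failing on a bare assert.

```python
def average_subword_reps(word_pieces, piece_vectors):
    """Average sub-word vectors into one vector per whitespace token.

    word_pieces:   list of lists, the sub-word pieces of each token
                   (hypothetical input, special tokens already removed).
    piece_vectors: flat list of vectors, one per sub-word piece, in order.
    """
    n_pieces = sum(len(pieces) for pieces in word_pieces)
    if n_pieces != len(piece_vectors):
        raise ValueError(
            f"{n_pieces} sub-word pieces but {len(piece_vectors)} vectors"
        )

    averaged = []
    start = 0
    for pieces in word_pieces:
        span = piece_vectors[start:start + len(pieces)]
        if not span:
            raise ValueError("a token produced no sub-word pieces")
        dim = len(span[0])
        # Component-wise mean over the pieces that belong to this token.
        averaged.append([sum(vec[i] for vec in span) / len(span) for i in range(dim)])
        start += len(pieces)
    return averaged


def check_alignment(sent, ave_reps):
    """Same condition as the failing assert, but with a readable message."""
    n_tokens = len(sent.split())
    if n_tokens != len(ave_reps):
        raise ValueError(
            f"{n_tokens} tokens but {len(ave_reps)} averaged vectors for: {sent!r}"
        )


sent = "Bona ubicació ."
# Assumed split, for illustration only; the real tokenizer may split differently.
word_pieces = [["Bona"], ["ubi", "##cació"], ["."]]
piece_vectors = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

reps = average_subword_reps(word_pieces, piece_vectors)
check_alignment(sent, reps)   # passes: 3 tokens, 3 vectors
print(reps)                   # [[1.0, 2.0], [4.0, 5.0], [7.0, 8.0]]
```

Calling `check_alignment(sent, [])` reproduces my situation and raises an error naming the sentence, which is how I found that the averaged output is empty for this input.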
