One problem with your train.json and dev.json

  File "run_contract_qa.py", line 365, in convert_examples_to_features
    input_ids = tokenizer.convert_tokens_to_ids(tokens)
  File "/home/py/Contract_Elements_Extraction/bert/tokenization.py", line 128, in convert_tokens_to_ids
    return convert_by_vocab(self.vocab, tokens)
  File "/home/py/Contract_Elements_Extraction/bert/tokenization.py", line 89, in convert_by_vocab
    output.append(vocab[item])
KeyError: 'ど'

The Japanese character 'ど' does not seem to be in the vocab.txt file, which is quite strange. I verified this with:
cd multilingual_L-12_H-768_A-12
grep ど vocab.txt
and got no result.
Could you help me get past this error? Unfortunately, 'ど' appears very frequently in the JSON files.
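
One possible workaround, as a minimal sketch rather than the repository's actual fix: the traceback shows convert_by_vocab indexing the vocabulary directly with vocab[item], so any token missing from vocab.txt raises KeyError. The hypothetical helper below, convert_by_vocab_with_unk (not part of the repository), falls back to the id of "[UNK]" instead. It assumes vocab.txt contains a "[UNK]" entry, as standard BERT vocabularies do; the toy vocabulary and its ids are made up for illustration.

```python
import collections


def convert_by_vocab_with_unk(vocab, items, unk_token="[UNK]"):
    """Map tokens to ids, using the id of unk_token for tokens missing from vocab.

    Hypothetical replacement for convert_by_vocab; assumes unk_token is in vocab.
    """
    unk_id = vocab[unk_token]
    return [vocab.get(item, unk_id) for item in items]


# Toy vocabulary standing in for vocab.txt (ids are illustrative only).
toy_vocab = collections.OrderedDict(
    [("[PAD]", 0), ("[UNK]", 1), ("[CLS]", 2), ("[SEP]", 3), ("contract", 4)]
)

# 'ど' is not in the toy vocabulary, so it maps to the id of "[UNK]".
ids = convert_by_vocab_with_unk(toy_vocab, ["[CLS]", "contract", "ど", "[SEP]"])
print(ids)  # [2, 4, 1, 3]
```

Mapping a frequent character to "[UNK]" discards its information, so this only gets training running; whether that is acceptable depends on how much 'ど' matters in the data.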


