  File "run_contract_qa.py", line 365, in convert_examples_to_features
    input_ids = tokenizer.convert_tokens_to_ids(tokens)
  File "/home/py/Contract_Elements_Extraction/bert/tokenization.py", line 128, in convert_tokens_to_ids
    return convert_by_vocab(self.vocab, tokens)
  File "/home/py/Contract_Elements_Extraction/bert/tokenization.py", line 89, in convert_by_vocab
    output.append(vocab[item])
KeyError: 'ど'
The Japanese character 'ど' does not seem to be in the vocab.txt file, which is quite strange. I verified this with:
cd multilingual_L-12_H-768_A-12
grep ど vocab.txt
and got no result.
Could you help me get past this error? Unfortunately, 'ど' appears very frequently in the JSON files.
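One possible workaround, sketched below, is to map any token missing from the vocabulary to the `[UNK]` id instead of raising `KeyError`. This is only a sketch and not the repository's own fix: the function name `convert_tokens_with_unk` and the toy vocabulary are hypothetical, and it assumes vocab.txt contains an `[UNK]` entry, as BERT vocabulary files normally do.

```python
def convert_tokens_with_unk(vocab, tokens, unk_token="[UNK]"):
    """Map tokens to ids, falling back to the unknown-token id.

    Hypothetical helper: it mirrors the lookup done by convert_by_vocab
    in the traceback, but uses dict.get so a missing token does not
    raise KeyError.
    """
    unk_id = vocab[unk_token]  # assumes the vocabulary has an [UNK] entry
    return [vocab.get(token, unk_id) for token in tokens]


# Toy vocabulary for illustration only; the real one is loaded from vocab.txt.
toy_vocab = {"[PAD]": 0, "[UNK]": 1, "契": 2, "約": 3}

ids = convert_tokens_with_unk(toy_vocab, ["契", "ど", "約"])
print(ids)  # 'ど' is not in toy_vocab, so it maps to the [UNK] id
```

The trade-off is that every occurrence of 'ど' becomes `[UNK]`, so the model loses whatever information that character carried. Whether that is acceptable depends on how much the contract fields to be extracted rely on it.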