Fine-tuned BERT LM #2

Hi, 
I use `run_lm_finetuning.py` from `pytorch_pretrained_BERT/examples/` to fine-tune the model on a monolingual set of sentences. I use the BERT multilingual cased model (`bert-base-multilingual-cased`).

Once the model is fine-tuned, I get the loss for given sentences with the following code:
```
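import math
import torch
from pytorch_pretrained_bert import BertTokenizer, BertForMaskedLM

def get_score(sentence, model):
    # Tokenize and map to vocabulary ids (a batch of one sentence)
    tokenize_input = tokenizer.tokenize(sentence)
    tensor_input = torch.tensor([tokenizer.convert_tokens_to_ids(tokenize_input)])
    model.eval()
    # Per-token vocabulary logits for the unmasked input
    predictions = model(tensor_input)
    loss_fct = torch.nn.CrossEntropyLoss()
    # Mean cross-entropy between the logits and the input ids themselves
    loss = loss_fct(predictions.squeeze(), tensor_input.squeeze()).data
    return math.exp(loss)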
```
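
To make explicit what `get_score` returns, here is a minimal pure-Python sketch of the same quantity: the exponential of the mean cross-entropy between per-position logits and the target ids. The function name `exp_mean_cross_entropy` and the toy numbers are illustrative assumptions, not part of the library. Note that in `get_score` the input is not masked, so the model sees every token it is asked to predict.

```python
import math

def exp_mean_cross_entropy(logits, target_ids):
    """exp of the mean cross-entropy between per-position logits and targets.

    logits: list of per-position score lists (one score per vocabulary id)
    target_ids: list of gold token ids, one per position
    """
    total = 0.0
    for scores, target in zip(logits, target_ids):
        # Numerically stable log of the softmax normalizer
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        # Negative log-likelihood of the target at this position
        total += log_z - scores[target]
    return math.exp(total / len(target_ids))

# Toy example with a 4-word vocabulary (made-up numbers):
# uniform predictions give a score of about the vocabulary size.
uniform = [[0.0, 0.0, 0.0, 0.0]] * 3
print(exp_mean_cross_entropy(uniform, [1, 2, 3]))
```

A score near 1 means the logits put almost all probability on the target ids; larger values mean the targets were assigned less probability.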

```
sentence = "ﺶﻋﺮﺴﺗﺎﻧ؛ ﺩ پښﺕﻭ ﺶﻋﺭپﻮﻬﻧې ﻥﻭی پړﺍﻭ - ﺕﺎﻧﺩ"
tokenizer = BertTokenizer.from_pretrained('bert-base-multilingual-cased')
# Load the fine-tuned weights
stats = torch.load('pytorch_model.bin')
bertMaskedLM = BertForMaskedLM.from_pretrained('bert-base-multilingual-cased', state_dict=stats)

print(get_score(sentence, bertMaskedLM))
```
 78637.05198167797
```
bertMaskedLM_orig = BertForMaskedLM.from_pretrained('bert-base-multilingual-cased')
print(get_score(sentence, bertMaskedLM_orig))
```
7.919475431571431

The strange thing is that the fine-tuned model returns a much higher score (the exponentiated loss) than the original model, even when the evaluated sentence appeared in the monolingual training data.

Am I doing something wrong? I just want to check how well a given sentence fits the LM.
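
For comparison, one common way to measure how well a sentence fits a masked LM is a pseudo-perplexity: mask one token at a time, take the model's log-probability of the original token at the masked position, and exponentiate the negative mean. The sketch below is only an illustration under stated assumptions: `pseudo_perplexity`, `log_prob_fn`, and the uniform stand-in model are hypothetical names, and no BERT model is loaded.

```python
import math

def pseudo_perplexity(token_ids, mask_id, log_prob_fn):
    """Mask one position at a time and score the original token there.

    log_prob_fn(masked_ids, position, target_id) is a hypothetical callable
    returning the model's log-probability of target_id at position.
    """
    if not token_ids:
        raise ValueError("need at least one token")
    total = 0.0
    for i, target in enumerate(token_ids):
        masked = list(token_ids)
        masked[i] = mask_id  # hide the token being scored
        total += log_prob_fn(masked, i, target)
    return math.exp(-total / len(token_ids))

# Stand-in model (assumption): uniform over a 5-token vocabulary.
def uniform_log_prob(masked_ids, position, target_id):
    return -math.log(5)

print(pseudo_perplexity([7, 8, 9], mask_id=0, log_prob_fn=uniform_log_prob))
```

With the models above, `log_prob_fn` would wrap a forward pass through `BertForMaskedLM` followed by a log-softmax at `position`, and `mask_id` would be the id of the `[MASK]` token; that wrapper is not shown here.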

Regards and thanks in advance
