README.md (+28 -15)
@@ -14,7 +14,7 @@ This implementation is provided with [Google's pre-trained models](https://githu
 |[Doc](#doc)| Detailed documentation |
 |[Examples](#examples)| Detailed examples on how to fine-tune Bert |
 |[Notebooks](#notebooks)| Introduction on the provided Jupyter Notebooks |
-|[TPU](#tup)| Notes on TPU support and pretraining scripts |
+|[TPU](#tpu)| Notes on TPU support and pretraining scripts |
 |[Command-line interface](#Command-line-interface)| Convert a TensorFlow checkpoint in a PyTorch dump |
 
 ## Installation
@@ -46,13 +46,14 @@ python -m pytest -sv tests/
 
 This package comprises the following classes that can be imported in Python and are detailed in the [Doc](#doc) section of this readme:
 
-- Six PyTorch models (`torch.nn.Module`) for Bert with pre-trained weights (in the [`modeling.py`](./pytorch_pretrained_bert/modeling.py) file):
-  - [`BertModel`](./pytorch_pretrained_bert/modeling.py#L535) - raw BERT Transformer model (**fully pre-trained**),
-  - [`BertForMaskedLM`](./pytorch_pretrained_bert/modeling.py#L689) - BERT Transformer with the pre-trained masked language modeling head on top (**fully pre-trained**),
-  - [`BertForNextSentencePrediction`](./pytorch_pretrained_bert/modeling.py#L750) - BERT Transformer with the pre-trained next sentence prediction classifier on top (**fully pre-trained**),
-  - [`BertForPreTraining`](./pytorch_pretrained_bert/modeling.py#L618) - BERT Transformer with masked language modeling head and next sentence prediction classifier on top (**fully pre-trained**),
-  - [`BertForSequenceClassification`](./pytorch_pretrained_bert/modeling.py#L812) - BERT Transformer with a sequence classification head on top (BERT Transformer is **pre-trained**, the sequence classification head **is only initialized and has to be trained**),
-  - [`BertForQuestionAnswering`](./pytorch_pretrained_bert/modeling.py#L877) - BERT Transformer with a token classification head on top (BERT Transformer is **pre-trained**, the token classification head **is only initialized and has to be trained**).
+- Seven PyTorch models (`torch.nn.Module`) for Bert with pre-trained weights (in the [`modeling.py`](./pytorch_pretrained_bert/modeling.py) file):
+  - [`BertModel`](./pytorch_pretrained_bert/modeling.py#L537) - raw BERT Transformer model (**fully pre-trained**),
+  - [`BertForMaskedLM`](./pytorch_pretrained_bert/modeling.py#L691) - BERT Transformer with the pre-trained masked language modeling head on top (**fully pre-trained**),
+  - [`BertForNextSentencePrediction`](./pytorch_pretrained_bert/modeling.py#L752) - BERT Transformer with the pre-trained next sentence prediction classifier on top (**fully pre-trained**),
+  - [`BertForPreTraining`](./pytorch_pretrained_bert/modeling.py#L620) - BERT Transformer with masked language modeling head and next sentence prediction classifier on top (**fully pre-trained**),
+  - [`BertForSequenceClassification`](./pytorch_pretrained_bert/modeling.py#L814) - BERT Transformer with a sequence classification head on top (BERT Transformer is **pre-trained**, the sequence classification head **is only initialized and has to be trained**),
+  - [`BertForTokenClassification`](./pytorch_pretrained_bert/modeling.py#L880) - BERT Transformer with a token classification head on top (BERT Transformer is **pre-trained**, the token classification head **is only initialized and has to be trained**),
+  - [`BertForQuestionAnswering`](./pytorch_pretrained_bert/modeling.py#L946) - BERT Transformer with a token classification head on top (BERT Transformer is **pre-trained**, the token classification head **is only initialized and has to be trained**).
 
 - Three tokenizers (in the [`tokenization.py`](./pytorch_pretrained_bert/tokenization.py) file):
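As a quick orientation to the model classes listed in the hunk above, here is a minimal sketch (not part of the diff) of loading and running the simplest one, `BertModel`. It assumes the package is installed and that the `bert-base-uncased` weights can be downloaded; the sentence and the printed shapes are illustrative only:

```python
import torch
from pytorch_pretrained_bert import BertTokenizer, BertModel

# Load the vocabulary and the raw, fully pre-trained BERT encoder.
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
model.eval()

# Tokenize a sentence and map the tokens to vocabulary indices.
tokens = tokenizer.tokenize("Hello, how are you?")
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

# The raw model returns the hidden states of every layer plus a pooled output for the sequence.
with torch.no_grad():
    encoded_layers, pooled_output = model(input_ids)
print(len(encoded_layers), pooled_output.shape)
```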
@@ -153,7 +154,7 @@ Here is a detailed documentation of the classes in the package and how to use th
 | Sub-section | Description |
 |-|-|
 |[Loading Google AI's pre-trained weigths](#Loading-Google-AIs-pre-trained-weigths-and-PyTorch-dump)| How to load Google AI's pre-trained weight or a PyTorch saved instance |
-|[PyTorch models](#PyTorch-models)| API of the six PyTorch model classes: `BertModel`, `BertForMaskedLM`, `BertForNextSentencePrediction`, `BertForPreTraining`, `BertForSequenceClassification` or `BertForQuestionAnswering`|
+|[PyTorch models](#PyTorch-models)| API of the seven PyTorch model classes: `BertModel`, `BertForMaskedLM`, `BertForNextSentencePrediction`, `BertForPreTraining`, `BertForSequenceClassification`, `BertForTokenClassification` or `BertForQuestionAnswering`|
 |[Tokenizer: `BertTokenizer`](#Tokenizer-BertTokenizer)| API of the `BertTokenizer` class|
 |[Optimizer: `BertAdam`](#Optimizer-BertAdam)| API of the `BertAdam` class |
@@ -167,25 +168,31 @@ model = BERT_CLASS.from_pretrained(PRE_TRAINED_MODEL_NAME_OR_PATH, cache_dir=None)
 
 where
 
-- `BERT_CLASS` is either the `BertTokenizer` class (to load the vocabulary) or one of the six PyTorch model classes (to load the pre-trained weights): `BertModel`, `BertForMaskedLM`, `BertForNextSentencePrediction`, `BertForPreTraining`, `BertForSequenceClassification` or `BertForQuestionAnswering`, and
+- `BERT_CLASS` is either the `BertTokenizer` class (to load the vocabulary) or one of the seven PyTorch model classes (to load the pre-trained weights): `BertModel`, `BertForMaskedLM`, `BertForNextSentencePrediction`, `BertForPreTraining`, `BertForSequenceClassification`, `BertForTokenClassification` or `BertForQuestionAnswering`, and
 - `PRE_TRAINED_MODEL_NAME_OR_PATH` is either:
 
   - the shortcut name of a Google AI's pre-trained model selected in the list:
     - `bert-base-chinese`: Chinese Simplified and Traditional, 12-layer, 768-hidden, 12-heads, 110M parameters
 
   - a path or url to a pretrained model archive containing:
-
-    - `bert_config.json` a configuration file for the model, and
-    - `pytorch_model.bin` a PyTorch dump of a pre-trained instance `BertForPreTraining` (saved with the usual `torch.save()`)
+
+    - `bert_config.json` a configuration file for the model, and
+    - `pytorch_model.bin` a PyTorch dump of a pre-trained instance `BertForPreTraining` (saved with the usual `torch.save()`)
 
 If `PRE_TRAINED_MODEL_NAME_OR_PATH` is a shortcut name, the pre-trained weights will be downloaded from AWS S3 (see the links [here](pytorch_pretrained_bert/modeling.py)) and stored in a cache folder to avoid future download (the cache folder can be found at `~/.pytorch_pretrained_bert/`).
 - `cache_dir` can be an optional path to a specific directory to download and cache the pre-trained model weights. This option is useful in particular when you are using distributed training: to avoid concurrent access to the same weights you can set for example `cache_dir='./pretrained_model_{}'.format(args.local_rank)` (see the section on distributed training for more information)
 
+`Uncased` means that the text has been lowercased before WordPiece tokenization, e.g., `John Smith` becomes `john smith`. The Uncased model also strips out any accent markers. `Cased` means that the true case and accent markers are preserved. Typically, the Uncased model is better unless you know that case information is important for your task (e.g., Named Entity Recognition or Part-of-Speech tagging). For information about the Multilingual and Chinese model, see the [Multilingual README](https://github.com/google-research/bert/blob/master/multilingual.md) or the original TensorFlow repository.
+
+**When using an `uncased model`, make sure to pass `--do_lower_case` to the training scripts. (Or pass `do_lower_case=True` directly to FullTokenizer if you're using your own script.)**
+
 Example:
 ```python
 model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
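A small sketch (not taken from the README) tying together the `cache_dir` option described in the hunk above and the `do_lower_case` note; the `local_rank` value is a hypothetical stand-in for `args.local_rank`, and `do_lower_case` is assumed to be forwarded to the tokenizer constructor by `BertTokenizer.from_pretrained`:

```python
from pytorch_pretrained_bert import BertTokenizer, BertForSequenceClassification

local_rank = 0  # hypothetical: normally taken from args.local_rank under distributed training

# A per-process cache directory avoids concurrent access to the same weight files
# when several distributed workers call from_pretrained at the same time.
cache_dir = './pretrained_model_{}'.format(local_rank)

# bert-base-uncased is an uncased checkpoint, so lower-case the text at tokenization time.
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True)

model = BertForSequenceClassification.from_pretrained('bert-base-uncased', cache_dir=cache_dir)
```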
@@ -271,7 +278,13 @@ The sequence-level classifier is a linear layer that takes as input the last hid
 
 An example on how to use this class is given in the `run_classifier.py` script which can be used to fine-tune a single sequence (or pair of sequence) classifier using BERT, for example for the MRPC task.
 
-#### 6. `BertForQuestionAnswering`
+#### 6. `BertForTokenClassification`
+
+`BertForTokenClassification` is a fine-tuning model that includes `BertModel` and a token-level classifier on top of the `BertModel`.
+
+The token-level classifier is a linear layer that takes as input the last hidden state of the sequence.
+
+#### 7. `BertForQuestionAnswering`
 
 `BertForQuestionAnswering` is a fine-tuning model that includes `BertModel` with token-level classifiers on top of the full sequence of last hidden states.
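To contrast the token-level classifier of `BertForTokenClassification` with the span classifiers of `BertForQuestionAnswering` described above, a hedged sketch follows. It assumes the usual `pytorch_pretrained_bert` forward signatures (no labels or span positions passed, so raw logits are returned) and that `num_labels` is forwarded to the classifier constructor by `from_pretrained`; the token ids and label count are hypothetical:

```python
import torch
from pytorch_pretrained_bert import BertForTokenClassification, BertForQuestionAnswering

input_ids = torch.tensor([[101, 7592, 1010, 2129, 2024, 2017, 102]])  # hypothetical token ids

# Token-level classification: one logit vector per input token.
tagger = BertForTokenClassification.from_pretrained('bert-base-uncased', num_labels=3)
tagger.eval()
with torch.no_grad():
    token_logits = tagger(input_ids)  # shape: [batch, seq_len, num_labels]

# Question answering: two scores per token, for answer-span start and end positions.
qa_model = BertForQuestionAnswering.from_pretrained('bert-base-uncased')
qa_model.eval()
with torch.no_grad():
    start_logits, end_logits = qa_model(input_ids)  # each: [batch, seq_len]
```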