Description
`AnchorText` offers support for three masked language models: `DistilbertBaseUncased`, `BertBaseUncased`, and `RobertaBase`. All of these classes inherit from the `LanguageModel` class and override two methods.
For example, `DistilbertBaseUncased`:
```python
class DistilbertBaseUncased(LanguageModel):
    SUBWORD_PREFIX = '##'

    def __init__(self, preloading: bool = True):
        """
        Initialize DistilbertBaseUncased.

        Parameters
        ----------
        preloading
            See `LanguageModel` constructor.
        """
        super().__init__("distilbert-base-uncased", preloading)

    @property
    def mask(self) -> str:
        return self.tokenizer.mask_token

    def is_subword_prefix(self, token: str) -> bool:
        return token.startswith(DistilbertBaseUncased.SUBWORD_PREFIX)
```
Other language models can be included in a similar fashion.
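As a rough sketch of what "a similar fashion" means, here is how a RoBERTa-style wrapper could look. The `LanguageModel` base below is a minimal stand-in (the real one also loads a Hugging Face model/tokenizer, so `mask` is hard-coded here instead of read from `self.tokenizer`); the key difference illustrated is that RoBERTa's byte-level BPE marks the *start* of a word with `'Ġ'`, so subword detection is inverted relative to BERT's `'##'` convention:

```python
from abc import ABC, abstractmethod


class LanguageModel(ABC):
    """Minimal stand-in for the real base class, shown only to make
    the sketch self-contained and runnable."""

    @property
    @abstractmethod
    def mask(self) -> str:
        ...

    @abstractmethod
    def is_subword_prefix(self, token: str) -> bool:
        ...


class RobertaBase(LanguageModel):
    # RoBERTa's tokenizer prefixes word-initial tokens with 'Ġ',
    # so a token is a subword continuation when the marker is ABSENT —
    # the inverse of BERT's '##' convention.
    SUBWORD_PREFIX = 'Ġ'

    @property
    def mask(self) -> str:
        # In the real wrapper this would come from the tokenizer;
        # '<mask>' is RoBERTa's mask-token literal.
        return '<mask>'

    def is_subword_prefix(self, token: str) -> bool:
        return not token.startswith(RobertaBase.SUBWORD_PREFIX)
```

Only the two overridden members differ per model, which is what makes a tutorial (or a thin registration mechanism) feasible.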
How should we manage this extension?
Should we write a tutorial for wrapping any transformer with `LanguageModel`?
Or can we do something more out of the box?