
AnchorText - extension for other language models. #438

Open

Description

@RobertSamoilescu

AnchorText offers support for three masked language models: DistilbertBaseUncased, BertBaseUncased, and RobertaBase. All of the previously enumerated classes inherit from the LanguageModel class and override two methods.
For example, DistilbertBaseUncased:

class DistilbertBaseUncased(LanguageModel):
    SUBWORD_PREFIX = '##'

    def __init__(self, preloading: bool = True):
        """
        Initialize DistilbertBaseUncased.

        Parameters
        ----------
        preloading
            See `LanguageModel` constructor.
        """
        super().__init__("distilbert-base-uncased", preloading)

    @property
    def mask(self) -> str:
        # Mask token as defined by the underlying HuggingFace tokenizer (i.e. '[MASK]').
        return self.tokenizer.mask_token

    def is_subword_prefix(self, token: str) -> bool:
        # A token is a subword continuation iff it starts with the WordPiece prefix '##'.
        return token.startswith(DistilbertBaseUncased.SUBWORD_PREFIX)

Other language models can be included in a similar fashion.
How should we manage this extension?
Should we write a tutorial on wrapping any transformer with LanguageModel, or can we do something more out of the box?
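For illustration, wrapping another HuggingFace checkpoint would follow the same pattern as DistilbertBaseUncased above. The sketch below is a hypothetical wrapper for bert-base-cased (not one of the three models currently shipped); it assumes the LanguageModel API shown above, i.e. a constructor taking a HuggingFace model name and a preloading flag, a tokenizer attribute exposing the mask token, and the two methods to override. The class name and the import path are assumptions and may differ depending on where LanguageModel ends up living.

# A minimal sketch, assuming the LanguageModel base class shown above.
# The import path below is an assumption and may differ across alibi versions.
from alibi.utils.lang_model import LanguageModel


class BertBaseCased(LanguageModel):
    # bert-base-cased uses the same WordPiece convention as the uncased models:
    # subword continuation tokens are marked with the '##' prefix.
    SUBWORD_PREFIX = '##'

    def __init__(self, preloading: bool = True):
        """
        Initialize BertBaseCased.

        Parameters
        ----------
        preloading
            See `LanguageModel` constructor.
        """
        super().__init__("bert-base-cased", preloading)

    @property
    def mask(self) -> str:
        # '[MASK]' for BERT-style tokenizers.
        return self.tokenizer.mask_token

    def is_subword_prefix(self, token: str) -> bool:
        # A token is a subword continuation iff it carries the WordPiece prefix.
        return token.startswith(BertBaseCased.SUBWORD_PREFIX)

A tutorial could essentially walk through these three pieces: the checkpoint name, the mask token, and the subword convention. Models whose tokenizers mark subwords differently (e.g. SentencePiece, which marks word beginnings rather than continuations) would mainly need a different is_subword_prefix.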
