Thank you for this project! It is very helpful for me in understanding how GPT-2 synthesizes text.
I also noticed that GPT2/encoder.py does not implement the capability of recognizing special tokens the way the HuggingFace tokenizer does.
The relevant part of the source code in HuggingFace's repo is at https://github.com/huggingface/transformers/blob/c836f77266be9ace47bff472f63caf71c0d11333/src/transformers/tokenization_utils.py#L516-L520
I understand that it is not critical, because there is only one special token, <|endoftext|>, in use (see wangkuiyi/huggingface-tokenizer-in-cxx#11).
So, just saying.
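
Just to illustrate the idea, here is a minimal sketch of that behavior, assuming an `encode` callable like the one in GPT2/encoder.py. The `encode_with_specials` helper and the split-on-specials approach are only an illustration of what the linked HuggingFace code does, not a proposal for an exact API:

```python
import re

# GPT-2 reserves a single special token; 50256 is its vocabulary id.
SPECIAL_TOKENS = {"<|endoftext|>": 50256}

def encode_with_specials(text, bpe_encode):
    """Split `text` on special tokens before running BPE.

    `bpe_encode` is assumed to be an Encoder.encode-style callable
    (as in GPT2/encoder.py) that maps a plain string to token ids.
    """
    pattern = "(" + "|".join(re.escape(t) for t in SPECIAL_TOKENS) + ")"
    ids = []
    # re.split with a capturing group keeps the matched special tokens
    # in the output, so they can be mapped directly to their ids.
    for chunk in re.split(pattern, text):
        if chunk in SPECIAL_TOKENS:
            ids.append(SPECIAL_TOKENS[chunk])
        elif chunk:
            ids.extend(bpe_encode(chunk))
    return ids
```

Without something like this, the literal string `<|endoftext|>` gets byte-pair encoded into several ordinary tokens instead of the single reserved id.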