Cannot recognize <|endoftext|>

Thank you for this project! It is very helpful for me to understand how GPT2 synthesize text.

I also noticed that the [`GPT2/encoder.py`](https://github.com/graykode/gpt-2-Pytorch/blob/master/GPT2/encoder.py) does not implement the capability of recognizing special tokens as the HuggingFace tokenzier could.

The part of source code in HuggingFace's repo is at https://github.com/huggingface/transformers/blob/c836f77266be9ace47bff472f63caf71c0d11333/src/transformers/tokenization_utils.py#L516-L520

I understand that it is not critical, because there is only one special token `<|endoftext|>` in use https://github.com/wangkuiyi/huggingface-tokenizer-in-cxx/issues/11

So, just saying.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cannot recognize <|endoftext|> #24

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Cannot recognize <|endoftext|> #24

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions