max_len check gives poor warning message #40

@prhbrt


Change this:

msg = f'Sentence #{item} length {len(tokens)} exceeds max_len {self.max_len} and has been truncated'

to

msg = f'Sentence #{item} length {len(tokens)} exceeds max_len {self.max_len} - 2 and has been truncated, note that two tokens are used to surround the sentence with the [CLS] and [SEP] tokens'

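The current warning, e.g. "Sentence #4 length 511 exceeds max_len 512 and has been truncated", doesn't make sense: 511 does not exceed 512. The real limit is max_len - 2, because two positions are taken by the [CLS] and [SEP] tokens.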
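
To illustrate the idea, here is a minimal sketch of a check that reports the effective limit instead of the raw max_len. The function name truncate_with_warning, its parameters, and the assumption that truncation happens before [CLS] and [SEP] are added are all hypothetical and not taken from the library's code.

```python
import warnings

SPECIAL_TOKENS = 2  # one [CLS] and one [SEP] surround every sentence


def truncate_with_warning(tokens, max_len, item):
    """Hypothetical helper: make [CLS] + tokens + [SEP] fit within max_len."""
    # Effective room left for the sentence's own tokens.
    limit = max_len - SPECIAL_TOKENS
    if len(tokens) > limit:
        msg = (
            f'Sentence #{item} length {len(tokens)} exceeds the limit of {limit} '
            f'(max_len {max_len} minus {SPECIAL_TOKENS} for the [CLS] and [SEP] tokens) '
            'and has been truncated'
        )
        warnings.warn(msg)
        tokens = tokens[:limit]
    return ['[CLS]'] + tokens + ['[SEP]']
```

With max_len 512, a sentence of 511 tokens would then be reported as exceeding the limit of 510, which matches what actually gets truncated.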
