Subword tokenization #162

@bonham79

Description

What are people's thoughts on adding preprocessing scripts to allow BPE-like tokenization of characters? Technically we already support this: just tokenize your input yourself and use the delineation function. But I wonder whether it would be worthwhile to also write up the scripts, so that tokenization can be managed by the repo as well.
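
To make the proposal concrete, here is a minimal sketch of what such a preprocessing script might look like. It is not code from the repo: the function names (`learn_bpe`, `apply_merge`, `tokenize`) are hypothetical, and it assumes subword units are written out joined by a separator (a space here) so that the existing delimiter handling can split them downstream.

```python
from collections import Counter


def learn_bpe(words, num_merges):
    """Learns up to num_merges BPE merge rules from an iterable of words."""
    # Each word starts out as a tuple of single characters.
    vocab = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Counts adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # Every word is already a single symbol.
        # Most frequent pair wins; ties are broken by the pair itself
        # so that the result is deterministic.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[tuple(apply_merge(symbols, best))] += freq
        vocab = new_vocab
    return merges


def apply_merge(symbols, pair):
    """Replaces each left-to-right occurrence of pair with one symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def tokenize(word, merges, sep=" "):
    """Applies learned merges in order and joins subwords with sep.

    The separator is an assumption; it should match whatever delimiter
    the downstream tool is configured to split on.
    """
    symbols = list(word)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return sep.join(symbols)
```

For example, with merges learned from `["low", "low", "lower", "lowest"]` and `num_merges=2`, `tokenize("lowest", merges)` gives `"low e s t"`. A real script would more likely wrap an established tokenizer library than reimplement the merge loop; the sketch only illustrates the input/output contract a managed script would need.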

Labels: enhancement (New feature or request)
