Adding more characters to the dictionary in the typo function #38

@abheesht17

In decepticonlp/transforms/perturb.py, the typo function uses a Python dictionary whose keys are characters and whose values are the characters close to each key on the QWERTY keyboard. That dictionary does not take digits (0-9) into account, and we might have missed a few alphabetic neighbours as well.
For example:
1. Our implementation: "e": ["w", "s", "d", "r"]
   Their implementation: "e": ["2", "@", "3", "#", "4", "$", "w", "r", "s", "d", "f"]
2. Our implementation: "h": ["g", "y", "u", "j", "n", "b"]
   Their implementation: "h": ["t", "y", "u", "g", "j", "b", "n", "m"]

For details, have a look at this (under the section QWERTY):
https://towardsdatascience.com/data-augmentation-library-for-text-9661736b13ff

They use a "One Keyword Distance Error" (presumably meaning one keyboard distance, i.e. keys within one key of the target) to decide which characters count as being in proximity on the QWERTY keyboard.
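To make the idea of keyboard proximity concrete, here is a minimal sketch that derives neighbours from key positions instead of listing them by hand. The row strings, the stagger offsets, and the function names are assumptions for illustration. They are not taken from perturb.py or from the linked article, and a different offset or distance threshold would give a wider set such as the one in "their" implementation.

```python
# Rows of a QWERTY keyboard (unshifted characters only) and an assumed
# horizontal stagger for each row, measured in key widths.
ROWS = ["1234567890", "qwertyuiop", "asdfghjkl", "zxcvbnm"]
OFFSETS = [0.0, 0.5, 0.75, 1.25]  # assumption: approximate physical stagger


def key_positions():
    """Map every key to a (row, x) position on the assumed grid."""
    return {
        key: (row, OFFSETS[row] + col)
        for row, keys in enumerate(ROWS)
        for col, key in enumerate(keys)
    }


def neighbours(char, max_dx=1.0):
    """Return keys at most one row away and at most max_dx key widths apart."""
    positions = key_positions()
    row, x = positions[char]
    return [
        other
        for other, (other_row, other_x) in positions.items()
        if other != char
        and abs(other_row - row) <= 1
        and abs(other_x - x) <= max_dx
    ]


print(sorted(neighbours("e")))
print(sorted(neighbours("h")))
```

With these assumed offsets, the letter neighbours agree with the current dictionary for "e" and "h", and "e" additionally picks up the digits above it.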

I am less sure about special characters, since users tend to remove them during text pre-processing, so I leave that to your discretion.

Even if we ignore the extra alphabetic characters, I think numeric characters must be added.
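As a rough sketch of what adding numeric characters could look like, the helper below extends an existing neighbour dictionary with the digit row. The name `add_digits`, the constants, and the assumption that each digit sits above the two top-row letters at columns i-1 and i are all hypothetical. The actual dictionary in the typo function may be named and structured differently.

```python
DIGIT_ROW = "1234567890"
TOP_ROW = "qwertyuiop"


def add_digits(neighbours):
    """Return a copy of `neighbours` extended with digit keys (hypothetical helper)."""
    # Copy the lists so the caller's dictionary is left untouched.
    extended = {key: list(values) for key, values in neighbours.items()}
    for i, digit in enumerate(DIGIT_ROW):
        # Digits directly to the left and right on the digit row.
        adjacent = [DIGIT_ROW[j] for j in (i - 1, i + 1) if 0 <= j < len(DIGIT_ROW)]
        # Assumed layout: digit at column i sits above letters at columns i-1 and i.
        adjacent += [TOP_ROW[j] for j in (i - 1, i) if 0 <= j < len(TOP_ROW)]
        extended[digit] = adjacent
        # Make the relation symmetric: letters also list the digits above them.
        for char in adjacent:
            if char in TOP_ROW:
                bucket = extended.setdefault(char, [])
                if digit not in bucket:
                    bucket.append(digit)
    return extended


ours = {"e": ["w", "s", "d", "r"]}
extended = add_digits(ours)
print(extended["e"])
print(extended["3"])
```

Returning a new dictionary rather than mutating the input keeps the existing letter-only behaviour available for comparison.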
