unicode normalization #3

I wonder whether we should normalize Unicode as part of our Atlas data prep. I was looking online for how to do it and found this code from some guy named Tauber ...
@jtauber @lcerrato @AlisonBabeu 

from unicodedata import normalize
# m[1] is the word being processed; m comes from surrounding code not shown here
curword = normalize("NFC", m[1])
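To make the motivation concrete, here is a small sketch of what that call does for accented Greek. The sample characters and variable names are my own illustration, not taken from any of our data:

```python
from unicodedata import normalize

# The same letter, alpha with an acute accent, encoded three different ways.
composed = "\u03ac"          # GREEK SMALL LETTER ALPHA WITH TONOS (one code point)
decomposed = "\u03b1\u0301"  # alpha followed by COMBINING ACUTE ACCENT
oxia = "\u1f71"              # GREEK SMALL LETTER ALPHA WITH OXIA (Greek Extended block)

# They render alike but do not compare equal as raw strings.
print(composed == decomposed)  # False
print(composed == oxia)        # False

# After NFC all three collapse to the same single code point.
print(normalize("NFC", decomposed) == composed)  # True
print(normalize("NFC", oxia) == composed)        # True
```

Without normalization, string comparison, lookups, and counts can treat these as different words even though they look identical on screen.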

My thinking: 

1. Anything in our repos should probably be normalized (e.g., the Greek from the Greco-Arabic corpus).
2. Anything we import into Atlas should be normalized. That would imply adding some code to the Atlas data prep pipeline (I think).
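A rough sketch of what both points might look like in practice, assuming plain UTF-8 text files. The function names (`to_nfc`, `is_nfc`, `normalize_file`) are hypothetical, and none of this is existing Atlas code:

```python
from pathlib import Path
from unicodedata import normalize


def to_nfc(text: str) -> str:
    """Return text in NFC; this is the step an import pipeline could apply."""
    return normalize("NFC", text)


def is_nfc(text: str) -> bool:
    """True if text is already in NFC (useful as a check on repo contents)."""
    return to_nfc(text) == text


def normalize_file(path: Path) -> bool:
    """Rewrite a UTF-8 file in NFC. Return True if the file was changed.

    Bytes are read and written directly so that line endings are preserved.
    """
    original = path.read_bytes().decode("utf-8")
    fixed = to_nfc(original)
    if fixed == original:
        return False
    path.write_bytes(fixed.encode("utf-8"))
    return True
```

The idea would be to run something like `normalize_file` once over the repos, and call something like `to_nfc` on every string on its way into Atlas. Whether NFC is the right form for all of our sources is an open question.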

Thoughts?
