Provide special treatment for human-written text

In the JSON Schema specification, the `maxLength` keyword specifies the number of characters in a field, not bytes. However, in Redshift (as in most databases, I believe) the `VARCHAR` length is specified in bytes, which may introduce a mismatch for text fields that are usually written by humans, not computers.

This is generally not a problem, as analytical data typically contains ASCII text (written by computers), where the number of bytes exactly matches the number of characters.

But at the same time, I can imagine an issue:
1. A user is supposed to enter their city name in their native language (likely with non-ASCII characters)
2. A web developer constrains the input field to 32 characters
3. An analyst wrongly assumes that `maxLength: 32` is the correct constraint
4. Redshift truncates non-ASCII city names to roughly 16 characters (at two bytes per character in UTF-8)
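
The scenario above can be reproduced outside the database. This is a minimal sketch, assuming the column is sized as `VARCHAR(32)` and that the load truncates values to fit; `utf8_truncate` is a hypothetical helper that only imitates byte-based truncation, not Redshift's actual implementation.

```python
def utf8_truncate(text: str, max_bytes: int) -> str:
    """Cut text to at most max_bytes of UTF-8 without splitting a character."""
    encoded = text.encode("utf-8")[:max_bytes]
    # errors="ignore" drops a trailing partial multi-byte sequence, if any
    return encoded.decode("utf-8", errors="ignore")


ascii_city = "San Francisco"
cyrillic_city = "Петропавловск-Камчатский"

for city in (ascii_city, cyrillic_city):
    stored = utf8_truncate(city, 32)  # 32 is the assumed VARCHAR size in bytes
    print(len(city), len(city.encode("utf-8")), stored)
```

The ASCII name has the same length in characters and bytes and is stored whole. The Cyrillic name fits within `maxLength: 32` as characters but needs more than 32 bytes, so only its first part survives.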

This could be done as part of https://github.com/snowplow/iglu/issues/170 (`format: "unicode"`, which specifies that the string has absolutely no structure) or a similar custom JSON Schema extension.
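
One way such an extension might affect column sizing is sketched below. Everything here is an assumption for illustration: `varchar_bytes` is a hypothetical function, `format: "unicode"` is only the value proposed in the linked issue, and falling back to the maximum size when `maxLength` is absent is just one possible default. The sketch relies on two facts: a UTF-8 character takes at most 4 bytes, and a Redshift `VARCHAR` holds at most 65535 bytes.

```python
REDSHIFT_VARCHAR_MAX_BYTES = 65535
UTF8_MAX_BYTES_PER_CHAR = 4


def varchar_bytes(prop: dict) -> int:
    """Pick a VARCHAR size in bytes for a JSON Schema string property."""
    max_length = prop.get("maxLength")
    if max_length is None:
        # Assumption: no declared limit means the widest possible column
        return REDSHIFT_VARCHAR_MAX_BYTES
    if prop.get("format") == "unicode":
        # Human-written text: reserve the worst case of 4 bytes per character
        return min(max_length * UTF8_MAX_BYTES_PER_CHAR, REDSHIFT_VARCHAR_MAX_BYTES)
    # Machine-written text: assume ASCII, one byte per character
    return min(max_length, REDSHIFT_VARCHAR_MAX_BYTES)


city = {"type": "string", "maxLength": 32, "format": "unicode"}
print(f"VARCHAR({varchar_bytes(city)})")
```

Reserving the worst case trades some declared column width for the guarantee that any string passing `maxLength` validation also fits in the column.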

Provide special treatment for human-written text #261

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Provide special treatment for human-written text #261

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions