Provide special treatment for human-written text #261

Open
@chuwy

Description

In the JSON Schema specification, the maxLength keyword specifies the number of characters in a field, not bytes. However, in Redshift (as in most databases, I believe) a VARCHAR length is specified in bytes, which may introduce a mismatch for text fields that are usually written by humans rather than computers.

This is generally not a problem, as analytical data typically contains ASCII text (written by computers), where the number of bytes exactly matches the number of characters.
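
To make the characters-versus-bytes distinction concrete, here is a minimal Python sketch. The helper name `utf8_byte_length` and the sample values are made up for illustration, and UTF-8 is assumed as the storage encoding.

```python
def utf8_byte_length(text: str) -> int:
    """Return the number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


# Machine-written ASCII text: one byte per character.
event_name = "page_view"
print(len(event_name), utf8_byte_length(event_name))  # 9 9

# Human-written Cyrillic text: two bytes per character in UTF-8.
city = "Красноярск"
print(len(city), utf8_byte_length(city))  # 10 20
```

maxLength counts what `len()` counts here, while a byte-based VARCHAR limit applies to the second number.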

But at the same time, I can imagine an issue:

  1. A user is supposed to enter their city name in their native language (likely with non-ASCII characters)
  2. A web developer constrains the input field to 32 characters
  3. An analyst wrongly assumes that maxLength: 32 is the correct constraint
  4. Redshift truncates non-ASCII city names to about 16 characters (assuming 2 bytes per character in UTF-8)
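
The scenario above can be sketched in Python. This is only a simplified model of a byte-limited column, not Redshift's actual implementation; `truncate_to_bytes` is a hypothetical helper, and the input is a synthetic 32-character string of 2-byte Cyrillic letters.

```python
def truncate_to_bytes(text: str, max_bytes: int) -> str:
    """Cut text so that its UTF-8 encoding fits into max_bytes.

    A character that would be split at the byte boundary is dropped
    (errors="ignore" discards the incomplete trailing sequence).
    """
    encoded = text.encode("utf-8")[:max_bytes]
    return encoded.decode("utf-8", errors="ignore")


# Synthetic input: exactly 32 characters, so it passes maxLength: 32.
city = "Ж" * 32
assert len(city) == 32

# A column sized to 32 bytes keeps only half of the characters.
stored = truncate_to_bytes(city, 32)
print(len(stored))  # 16
```

Characters that need 3 or 4 bytes in UTF-8 would leave even fewer characters within the same 32-byte limit.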

This could be done as part of #170 (format: "unicode", which specifies that a string has absolutely no structure) or as a similar custom JSON Schema extension.
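
One possible shape of such special treatment, as a rough sketch only: the function `varchar_size` and the exact handling of format: "unicode" are assumptions for illustration, not an existing API. It reserves 4 bytes per character (the UTF-8 maximum) for human-written fields and caps the result at Redshift's 65535-byte VARCHAR limit.

```python
MAX_UTF8_BYTES_PER_CHAR = 4   # worst case for a single UTF-8 character
REDSHIFT_VARCHAR_MAX = 65535  # maximum VARCHAR size in bytes


def varchar_size(schema: dict) -> int:
    """Derive a VARCHAR size in bytes from a string property's JSON Schema.

    Assumption: format "unicode" marks human-written text, so maxLength
    (characters) is multiplied by the worst-case bytes per character.
    """
    max_length = schema.get("maxLength")
    if max_length is None:
        return REDSHIFT_VARCHAR_MAX
    factor = MAX_UTF8_BYTES_PER_CHAR if schema.get("format") == "unicode" else 1
    return min(max_length * factor, REDSHIFT_VARCHAR_MAX)


print(varchar_size({"type": "string", "maxLength": 32}))                       # 32
print(varchar_size({"type": "string", "maxLength": 32, "format": "unicode"}))  # 128
```

Multiplying by the worst case trades some declared column width for the guarantee that any string valid under maxLength fits.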
