
Don't escape unicode fields in YAML Documents #2420

Open
@fearful-symmetry


This is a continuation of an issue in endpoint. Right now, the default behavior of yaml.dump() in scripts/generators/ecs_helpers.py is to escape Unicode characters, but this behavior appears to be platform dependent, meaning that the registered trademark character (®) will get escaped in one of two ways:

example: "Microsoft\xAE Windows\xAE Operating System" # this is the Unicode code point (U+00AE)

or:

example: "Microsoft\xC2\xAE Windows\xC2\xAE Operating System" # this is the UTF-8 byte sequence
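
To make the difference between the two escapes concrete, here is a small standard-library sketch. It does not reproduce what yaml.dump() does internally; it only shows that `\xAE` corresponds to the code point of ® and `\xC2\xAE` to its UTF-8 encoding. The variable names are illustrative only.

```python
# The registered trademark sign, U+00AE.
ch = "\u00ae"

# Its Unicode code point is a single number: 0xAE.
code_point = ord(ch)

# Its UTF-8 encoding is two bytes: 0xC2 0xAE.
utf8_bytes = ch.encode("utf-8")

# Render both in the \xNN style seen in the generated YAML.
code_point_escape = "\\x{:02X}".format(code_point)
utf8_escape = "".join("\\x{:02X}".format(b) for b in utf8_bytes)

print(code_point_escape)  # \xAE
print(utf8_escape)        # \xC2\xAE
```

Both escapes describe the same character, which is why the two outputs look different in a diff even though the underlying data has not changed.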

This results in massive, needless diffs after running the update scripts, depending on who runs them.
I'm not sure if there's a reason why we're escaping Unicode here; I assume it's just because it's the default behavior.

If we change it to this:

        yaml.dump(data, outfile, default_flow_style=False, allow_unicode=True)

We'll get this:

      example: Microsoft® Windows® Operating System

Which solves the problem.
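
As a rough check of the proposed change, the sketch below dumps the same value with and without allow_unicode=True, assuming PyYAML on Python 3. The `data` dict is a stand-in for illustration, not the real structure built in ecs_helpers.py.

```python
import yaml  # PyYAML

# Stand-in for a field definition; the real data comes from the ECS schemas.
data = {"example": "Microsoft® Windows® Operating System"}

# Default behavior: non-ASCII characters are escaped in a quoted scalar.
escaped = yaml.dump(data, default_flow_style=False)

# Proposed behavior: non-ASCII characters are written as-is.
unescaped = yaml.dump(data, default_flow_style=False, allow_unicode=True)

print(escaped, end="")
print(unescaped, end="")

# Both forms parse back to the same value, so only the file text changes.
assert yaml.safe_load(escaped) == yaml.safe_load(unescaped) == data
```

One thing to watch when writing to a file rather than a string: with allow_unicode=True the characters are written through the file object, so opening the output with an explicit encoding="utf-8" avoids depending on the platform's default encoding.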

Labels: bug
