
Don't escape unicode fields in YAML Documents #2420

Closed
@fearful-symmetry

Description


This is a continuation of an issue in endpoint. Right now, scripts/generators/ecs_helpers.py calls yaml.dump() with its default settings, which escape non-ASCII characters. That escaping appears to be platform dependent, so the registered trademark sign (®) gets escaped in one of two ways:

      example: "Microsoft\xAE Windows\xAE Operating System" # the Unicode code point (U+00AE)

or:

      example: "Microsoft\xC2\xAE Windows\xC2\xAE Operating System" # the two-byte UTF-8 encoding
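A quick way to see where the two escape forms come from, using only the Python standard library. The cp1252 step is an assumption about one plausible cause of the platform dependence (UTF-8 bytes being decoded with a Windows default code page), not something confirmed in this issue:

```python
# Where "\xAE" and "\xC2\xAE" come from for the registered trademark sign.
mark = "\u00ae"  # "®", code point U+00AE

# Unicode code point: one character, 0xAE.
codepoint_escape = "".join(f"\\x{ord(ch):02X}" for ch in mark)

# UTF-8 encoding: two bytes, 0xC2 0xAE.
utf8_escape = "".join(f"\\x{b:02X}" for b in mark.encode("utf-8"))

# Assumption: if UTF-8 bytes are decoded with a legacy code page such as
# cp1252, each byte becomes its own character ("Â" and "®"), and an emitter
# that escapes non-ASCII characters then escapes both of them.
misdecoded = mark.encode("utf-8").decode("cp1252")
misdecoded_escape = "".join(f"\\x{ord(ch):02X}" for ch in misdecoded)

print(codepoint_escape)   # \xAE
print(utf8_escape)        # \xC2\xAE
print(misdecoded_escape)  # \xC2\xAE
```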

This results in large, needless diffs after running the update scripts, depending on which platform the person running them is on.
I'm not sure whether there's a reason we're escaping Unicode here; I assume it's just the default behavior.

If we change it to this:

        yaml.dump(data, outfile, default_flow_style=False, allow_unicode=True)

We'll get this:

      example: Microsoft® Windows® Operating System

Which solves the problem.
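A minimal before/after sketch, assuming PyYAML is installed. The data dict is a hypothetical stand-in for one field definition, not taken from the actual ECS schema files:

```python
import yaml  # PyYAML

# Hypothetical field entry containing the registered trademark sign.
data = {"example": "Microsoft\u00ae Windows\u00ae Operating System"}

# Default (allow_unicode=False): non-ASCII characters are escaped, which
# also forces the scalar into double-quoted style.
escaped = yaml.dump(data, default_flow_style=False)

# allow_unicode=True: the characters are emitted as-is, as a plain scalar.
unescaped = yaml.dump(data, default_flow_style=False, allow_unicode=True)

print(escaped, end="")    # example: "Microsoft\xAE Windows\xAE Operating System"
print(unescaped, end="")  # example: Microsoft® Windows® Operating System
```

yaml.dump writes text to the stream, so the bytes that end up on disk still depend on how outfile was opened. Whether ecs_helpers.py passes an explicit encoding to open() is worth checking; encoding="utf-8" would keep the file identical across platforms.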

Labels: bug