license file gathering for non-utf8 files #868

Open
@jkowalleck

problem

gathering license files (PEP 639) may pick up files that are not UTF-8 encoded.
Python commonly decodes text as UTF-8 by default (bytes.decode() always does; open() depends on the locale and UTF-8 mode). when decoding fails, a UnicodeDecodeError is raised.

we need resilience when collecting license texts.

REMARK FROM MAINTAINERS:

@jkowalleck: PEP 639 clearly states that license files are expected to be UTF-8 text only. This means any text that is not proper UTF-8 can be skipped, as it does not adhere to the spec.
The only exception is Windows systems, where the encoding might be off: the default encoding of the OS/FS is not UTF-8, so files may be "migrated" when written to disk.
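
A minimal sketch of the "skip what is not proper UTF-8" idea from the remark above. The helper name read_license_text is hypothetical and not taken from the project; it reads raw bytes and decodes them strictly, so the result does not depend on the platform's default encoding.

```python
from pathlib import Path
from typing import Optional


def read_license_text(path: Path) -> Optional[str]:
    """Return the file's text if it is valid UTF-8, else None (hypothetical helper)."""
    # Read bytes so that no implicit, platform-dependent decoding happens.
    raw = path.read_bytes()
    try:
        # Strict decoding: raises UnicodeDecodeError on any invalid byte sequence.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so the file does not adhere to the spec: skip it.
        return None
```

A caller would treat None as "no license text collected for this file" and continue with the next candidate. Whether skipping is logged or reported is left open here, as the actual solution is still being discussed.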

goal

don't crash when a license file is not UTF8.

expected outcome

  • encoding issues are not merely swallowed (except UnicodeDecodeError) but do not occur in the first place
  • have tests in place (marking the test files as binary via a gitattributes file might be required to keep the intended non-UTF-8 encoding)
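
One possible shape for such a test, as a sketch only. It generates the non-UTF-8 bytes inside the test instead of committing a fixture file, which sidesteps the gitattributes question for this case. decode_or_none is a hypothetical stand-in for the code under test, not the project's actual function.

```python
import tempfile
import unittest
from pathlib import Path

# Valid Latin-1 but invalid UTF-8: byte 0xFC can never appear in UTF-8.
NON_UTF8_LICENSE = "Lizenz f\u00fcr alle".encode("latin-1")


def decode_or_none(raw: bytes):
    """Stand-in for the code under test (hypothetical): strict UTF-8 or None."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


class NonUtf8LicenseTest(unittest.TestCase):
    def test_fixture_is_really_not_utf8(self):
        # Guards against the fixture silently becoming valid UTF-8.
        with self.assertRaises(UnicodeDecodeError):
            NON_UTF8_LICENSE.decode("utf-8")

    def test_non_utf8_file_is_skipped_without_crash(self):
        with tempfile.TemporaryDirectory() as tmp:
            path = Path(tmp) / "LICENSE"
            path.write_bytes(NON_UTF8_LICENSE)
            self.assertIsNone(decode_or_none(path.read_bytes()))
```

If committed fixture files are preferred instead, the first test is still useful: it fails as soon as a checkout or editor has re-encoded the file.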

solution

being discussed in the comments

Labels: bug, help wanted