Open
Description
problem
gathering license files (PEP638) may detect files that are not UTF8.
python's default character encoding is UTF8. it causes an exception, when decoding fails.
we need resilience when collecting license texts.
REMARK FROM MAINTAINERS:
@jkowalleck: PEP638 clearly defines to expect UTF8 text only. This means, all text that is not proper UTF8 can be skipped, as it does not adhere to the spec.
The only exception are Windows systems, where the encoding might be off, since the default encoding of the OS/FS is non-UTF8 so that files are "migrated" when put to disc.
goal
don't crash when a license file is not UTF8.
expected outcome
- encoding issues are not ignored (
except UnicodeDecodeError
) but dont happen in the first place - have tests in place. (set test files to
binary
via gitattributes-file might be required, to keep the intended non-UTF8 encoding)
solution
being discussed in the comments