
Non latin-1 filenames are not supported #13

Open
@spock

Thank you for a very well-thought-out tool! I'm currently evaluating it for keeping my 400+ GB, 50k-file archive safe(r).

While doing so, I came across this exception:

Traceback (most recent call last):
  File "/home/user/.local/bin/pff", line 8, in <module>
    sys.exit(main())
  File "/home/user/.local/lib/python3.10/site-packages/pyFileFixity/pff.py", line 108, in main
    return saecc_main(argv=subargs, command=fullcommand)
  File "/home/user/.local/lib/python3.10/site-packages/pyFileFixity/structural_adaptive_ecc.py", line 574, in main
    relfilepath_ecc = compute_ecc_hash_from_string(relfilepath, ecc_manager_intra, hasher_intra, max_block_size, resilience_rate_intra)
  File "/home/user/.local/lib/python3.10/site-packages/pyFileFixity/structural_adaptive_ecc.py", line 203, in compute_ecc_hash_from_string
    fpfile = BytesIO(b(string))
  File "/home/user/.local/lib/python3.10/site-packages/pyFileFixity/lib/_compat.py", line 36, in b
    return codecs.latin_1_encode(x)[0]
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 16-26: ordinal not in range(256)

Looking at the code, it seems that the b() helper in lib/_compat.py uses latin-1 as its internal encoding. latin-1 can only represent code points U+0000 to U+00FF, so any string containing a character above that range fails to encode:

if sys.version_info < (3,):
    def b(x):
        return x
else:
    import codecs
    def b(x):
        if isinstance(x, _str):
            return codecs.latin_1_encode(x)[0]  # <-- here
        else:
            return x
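
For discussion, here is a minimal sketch of one possible alternative, not the project's actual fix. The name b_utf8 is hypothetical, and b_latin1 is only my standalone mirror of the Python 3 branch quoted above (with _str assumed to be str). It encodes text as UTF-8, using the surrogateescape error handler so that filenames containing undecodable bytes still round-trip:

```python
import codecs


def b_latin1(x):
    """Standalone mirror of the Python 3 branch quoted above (_str assumed to be str)."""
    if isinstance(x, str):
        return codecs.latin_1_encode(x)[0]
    return x


def b_utf8(x):
    """Hypothetical alternative: encode str as UTF-8 and pass bytes through unchanged."""
    if isinstance(x, str):
        # surrogateescape maps lone surrogates (as produced by os.fsdecode for
        # undecodable filename bytes) back to their original bytes.
        return x.encode("utf-8", errors="surrogateescape")
    return x


if __name__ == "__main__":
    print(b_utf8("зображення"))
```

One caveat I can see: for ASCII-only names both helpers produce identical bytes, but for characters in the U+0080 to U+00FF range the UTF-8 bytes differ from the latin-1 bytes, so switching encodings could change the ECC computed over such filenames compared with files generated earlier.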

The problematic filename contained Ukrainian (Cyrillic) characters. These are outside latin-1: the Cyrillic block starts at U+0400, while latin-1 only covers U+0000 to U+00FF.
Example string: зображення.
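
A small reproduction that does not depend on pyFileFixity, using only the example string above and the same codecs call that appears in the traceback. The variable names are mine:

```python
import codecs

name = "зображення"  # example string from this report

try:
    codecs.latin_1_encode(name)
except UnicodeEncodeError as err:
    failure = err  # keep the exception so its attributes can be inspected
else:
    failure = None

# Characters whose code point is above U+00FF cannot be represented in latin-1.
out_of_range = [ch for ch in name if ord(ch) > 0xFF]

if __name__ == "__main__":
    print(repr(failure))
    print([hex(ord(ch)) for ch in out_of_range])
```

Every character of the example string falls outside the latin-1 range, so the encode call fails on the very first one.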

pyFileFixity version 3.1.4 installed with pip. I'm on Python 3.10.12.
