Description
Generating a hash for an empty file always returns None. This behavior is undocumented, differs from the usual hashing algorithms, and contradicts the SPDX standard.
Example:
from commoncode.hash import sha1
from hashlib import sha1 as sha1_hashlib
from tempfile import NamedTemporaryFile

with NamedTemporaryFile() as temporary_file:
    temporary_file.write(b'')
    temporary_file.seek(0)
    # commoncode: returns None for the empty file
    print(sha1(location=temporary_file.name))
    # hashlib: returns the SHA-1 digest of the empty input
    print(sha1_hashlib(temporary_file.read(), usedforsecurity=False).hexdigest())
The reason seems to be that https://github.com/aboutcode-org/commoncode/blob/878be6140deac30e2b95fb0fad9eb8feca015fc8/src/commoncode/hash.py#L38 does not check msg is not None, but effectively bool(msg), which is False for empty inputs as well.
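The difference between the two checks can be shown in isolation. This is only a minimal sketch: the helper names digest_truthy and digest_explicit are hypothetical, and it assumes the linked line follows a "msg and ... or None" pattern. It is not the commoncode source.

```python
import hashlib


def digest_truthy(msg):
    # Truthiness check: b'' is falsy, so an empty message falls through to None.
    return msg and hashlib.sha1(msg).hexdigest() or None


def digest_explicit(msg):
    # Explicit None check: only a missing message yields None.
    return hashlib.sha1(msg).hexdigest() if msg is not None else None


print(digest_truthy(b''))      # None
print(digest_explicit(b''))    # da39a3ee5e6b4b0d3255bfef95601890afd80709
print(digest_explicit(None))   # None
```

For non-empty input both helpers return the same digest; they only diverge on empty bytes.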
Replacing the line with
self.h = msg is not None and hmodule(msg).digest()[:self.digest_size] or None
(as well as replacing the same pattern in sha1_git_hasher) seems to fix this issue.
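As a rough illustration of the proposed behavior, the sketch below uses a hypothetical stand-in class (TruncatedHasher, not part of commoncode) that stores a truncated digest and treats only None as "no input". The constructor arguments are assumptions chosen for the example.

```python
import hashlib


class TruncatedHasher:
    """Hypothetical stand-in for a hasher that keeps a truncated digest."""

    def __init__(self, hmodule, digest_size, msg=None):
        self.digest_size = digest_size
        # None means "no input"; b'' is valid input and is hashed.
        if msg is None:
            self.h = None
        else:
            self.h = hmodule(msg).digest()[:digest_size]

    def hexdigest(self):
        # Return the hex form of the stored digest, or None if nothing was hashed.
        return self.h.hex() if self.h is not None else None


empty = TruncatedHasher(hashlib.sha1, 20, b'')
missing = TruncatedHasher(hashlib.sha1, 20)
print(empty.hexdigest())    # matches hashlib.sha1(b'').hexdigest()
print(missing.hexdigest())  # None
```

Writing the check as an explicit if/else rather than an and/or chain also avoids a second pitfall of that idiom: an and/or chain returns None whenever the middle operand is falsy.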