
Hash is unexpectedly None for empty files #4428

Open
@stefan6419846

Description


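Generating a hash for an empty file always returns None. This behavior is undocumented. It also differs from the standard hashing algorithms, which define a digest for zero-length input (for example, SHA-1 of empty input is da39a3ee5e6b4b0d3255bfef95601890afd80709), and it contradicts the SPDX standard.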
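For comparison, Python's standard hashlib returns a well-defined digest for zero-length input, never None. The sketch below prints the standard empty-input digests for a few common algorithms. The dictionary name EMPTY_DIGESTS is purely illustrative.

```python
import hashlib

# Standard hash algorithms define a digest for zero-length input.
# Hashing b"" yields a fixed, well-known value, not a missing one.
EMPTY_DIGESTS = {
    name: hashlib.new(name, b"").hexdigest()
    for name in ("md5", "sha1", "sha256")
}

for name, digest in EMPTY_DIGESTS.items():
    print(f"{name}: {digest}")
```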

Example:

from commoncode.hash import sha1
from hashlib import sha1 as sha1_hashlib
from tempfile import NamedTemporaryFile

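with NamedTemporaryFile() as temporary_file:
    temporary_file.write(b'')
    temporary_file.flush()  # make the written content visible to readers that reopen the file by name
    temporary_file.seek(0)
    # commoncode: prints None
    print(sha1(location=temporary_file.name))
    # hashlib: prints da39a3ee5e6b4b0d3255bfef95601890afd80709
    # (the keyword is usedforsecurity, available since Python 3.9)
    print(sha1_hashlib(temporary_file.read(), usedforsecurity=False).hexdigest())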

The cause appears to be https://github.com/aboutcode-org/commoncode/blob/878be6140deac30e2b95fb0fad9eb8feca015fc8/src/commoncode/hash.py#L38, which does not check msg is not None but effectively tests bool(msg). Because an empty bytes object is falsy, empty input is treated the same as no input at all and yields None.
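The truthiness problem can be reproduced without commoncode. The sketch below is a hypothetical, simplified model of the `msg and ... or None` pattern described above; the class name BuggyHasher and its methods are illustrative and are not commoncode's actual code.

```python
import binascii
import hashlib


class BuggyHasher:
    """Hypothetical model of the truthiness-based pattern (not commoncode's real class)."""

    digest_size = 20  # SHA-1 produces 20 bytes

    def __init__(self, msg=None):
        # `msg and ...` tests truthiness: b"" is falsy, so `b"" and X` evaluates
        # to b"", and `b"" or None` then evaluates to None.
        self.h = msg and hashlib.sha1(msg).digest()[: self.digest_size] or None

    def hexdigest(self):
        return self.h and binascii.hexlify(self.h).decode("ascii")


print(bool(b""))                     # False: empty bytes are falsy
print(BuggyHasher(b"").hexdigest())  # None, although SHA-1 of b"" is well defined
print(BuggyHasher(b"abc").hexdigest())
```

Non-empty input hashes normally, so the defect only surfaces for zero-length content, which is easy to miss in tests.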

Replacing the line with

            self.h = msg is not None and hmodule(msg).digest()[:self.digest_size] or None

and applying the same change to the identical pattern in sha1_git_hasher appears to fix the issue.
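A minimal sketch of the proposed fix, again using a hypothetical model class rather than commoncode's code. It writes the check as a conditional expression, which behaves the same as the proposed `msg is not None and ... or None` line here but does not depend on the digest being truthy. The git_blob_sha1 helper is also hypothetical. It assumes sha1_git_hasher computes Git's blob object id, which is SHA-1 over a "blob <size>" header, a NUL byte, and the content.

```python
import hashlib


class FixedHasher:
    """Hypothetical model of the fix: only a missing message (None) yields None."""

    digest_size = 20

    def __init__(self, msg=None):
        # Only None means "no input"; b"" is hashed like any other bytes value.
        self.h = hashlib.sha1(msg).digest()[: self.digest_size] if msg is not None else None

    def hexdigest(self):
        return self.h.hex() if self.h is not None else None


def git_blob_sha1(data: bytes) -> str:
    """Hypothetical helper: Git blob id = SHA-1 of b"blob <size>" + NUL byte + content."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


print(FixedHasher(b"").hexdigest())  # da39a3ee5e6b4b0d3255bfef95601890afd80709
print(FixedHasher().hexdigest())     # None: no message was given
print(git_blob_sha1(b""))            # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

With the fix, an empty file gets the same digest that hashlib (or git hash-object, for the Git variant) would report. None stays reserved for the case where no content was supplied.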
