BUG CEP-19: Different directory trees may end up with the same hash value #150

### Checklist

- [x] I added a descriptive title
- [x] I searched open reports and couldn't find a duplicate

### What happened?

I have set up two different directory trees:

```
|-- testdata1
|   `-- testFhello-world
`-- testdata2
    |-- test
    `-- world
```

When I hash these two trees with the Python script from CEP-19, both produce the same hash value:

```
CEP19 hash of testdata1: e91a9f9adcb3561a7a78a04d6f33b391beb92491f9ed99663b455867b031d30a
CEP19 hash of testdata2: e91a9f9adcb3561a7a78a04d6f33b391beb92491f9ed99663b455867b031d30a
```

The file name `testFhello-world` in `testdata1` is fed into the hash stream, where it is indistinguishable from a file named `test` with contents `hello` followed by a file named `world` with the same contents as `testFhello-world`. This is not the only way to confuse the algorithm: file contents can also be crafted to "pretend" that there are more files.
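To illustrate the ambiguity without touching the filesystem, here is a minimal sketch. It is not the CEP-19 script: `naive_tree_hash` is a hypothetical stand-in, and the stream layout (name, then `F`, then contents, then `-`) is an assumption inferred from the file name used in the example above.

```python
import hashlib


def naive_tree_hash(tree: dict[str, bytes]) -> str:
    """Hash a flat {name: contents} mapping by plain concatenation.

    Assumed per-file layout: name + b"F" + contents + b"-".
    Simplified stand-in for illustration, not the CEP-19 script.
    """
    hasher = hashlib.sha256()
    for name in sorted(tree):
        hasher.update(name.encode("utf-8"))
        hasher.update(b"F")        # assumed file-type marker
        hasher.update(tree[name])  # raw file contents
        hasher.update(b"-")        # assumed entry separator
    return hasher.hexdigest()


payload = b"any contents"
testdata1 = {"testFhello-world": payload}
testdata2 = {"test": b"hello", "world": payload}

# Both trees feed the byte stream b"testFhello-worldFany contents-"
# into the hasher, so the digests are identical.
collide = naive_tree_hash(testdata1) == naive_tree_hash(testdata2)
print(collide)
```

Because nothing in the stream marks where a name ends and contents begin, any tree whose concatenated bytes match will collide under this scheme.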

One way to keep such trees from producing the same hash value is to hash the length of each input before the input itself. The length can be encoded either as an integer of a defined bit width or "stringified" and followed by a separator such as `:`. The separator is needed after a stringified length because, without it, user-provided content that starts with digits could change the apparent length, which might open another way to confuse the algorithm.
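A rough sketch of both length encodings follows. The helper names (`frame_fixed`, `frame_decimal`, `framed_tree_hash`), the 64-bit big-endian width, and the `F` type marker are assumptions made for illustration; none of them are taken from CEP-19.

```python
import hashlib
import struct
from typing import Callable


def frame_fixed(data: bytes) -> bytes:
    """Prefix data with its length as an unsigned 64-bit big-endian integer."""
    return struct.pack(">Q", len(data)) + data


def frame_decimal(data: bytes) -> bytes:
    """Prefix data with its decimal length followed by a ':' separator."""
    return str(len(data)).encode("ascii") + b":" + data


def framed_tree_hash(
    tree: dict[str, bytes],
    frame: Callable[[bytes], bytes] = frame_fixed,
) -> str:
    """Hash a flat {name: contents} mapping with length-prefixed fields."""
    hasher = hashlib.sha256()
    for name in sorted(tree):
        hasher.update(frame(name.encode("utf-8")))  # length + file name
        hasher.update(b"F")                         # assumed file-type marker
        hasher.update(frame(tree[name]))            # length + contents
    return hasher.hexdigest()


payload = b"any contents"
testdata1 = {"testFhello-world": payload}
testdata2 = {"test": b"hello", "world": payload}

for frame in (frame_fixed, frame_decimal):
    differs = framed_tree_hash(testdata1, frame) != framed_tree_hash(testdata2, frame)
    print(frame.__name__, differs)
```

With either framing, the stream for `testdata1` starts with a name length of 16 while the stream for `testdata2` starts with a name length of 4, so the two trees no longer feed identical bytes to the hasher.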

[cep-19-fail.tar.gz](https://github.com/user-attachments/files/24966440/cep-19-fail.tar.gz) contains the script from CEP-19 and the two directories, so you can reproduce this yourself.

### Additional Context

_No response_
