
feat: collision detection - store multiple preimages per hash #11

@oritwoen

Context

Currently shaha deduplicates during build: if two different preimages produce the same hash, only one is stored (with the sources merged). This makes collision detection impossible.

This came out of a discussion about a Bitcoin security research portfolio: shaha could detect hash collisions if it stored all preimages, enabling queries like:

SELECT hash, COUNT(DISTINCT preimage), ARRAY_AGG(DISTINCT preimage)
FROM hashes
GROUP BY hash
HAVING COUNT(DISTINCT preimage) > 1

Problem

// Current behavior (build.rs deduplication)
hash("hello") = abc123 → save
hash("world") = abc123SKIP (duplicate), only merge sources

Real collisions are lost during build.
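A minimal Rust sketch of the deduplicating behavior above, assuming a table keyed by raw hash bytes. `build_dedup`, `Row`, and the pluggable `hash` closure are hypothetical names, not shaha's actual build.rs code; the closure lets a tiny toy hash stand in for SHA256 so a collision is easy to trigger.

```rust
use std::collections::HashMap;

/// One stored row, mirroring the current schema: a single preimage per hash.
#[derive(Debug)]
pub struct Row {
    pub preimage: String,
    pub sources: Vec<String>,
}

/// Hypothetical deduplicating build: the first preimage seen for a hash is
/// kept; later inputs with the same hash only contribute their source name.
pub fn build_dedup<F>(inputs: &[(&str, &str)], hash: F) -> HashMap<Vec<u8>, Row>
where
    F: Fn(&str) -> Vec<u8>,
{
    let mut table: HashMap<Vec<u8>, Row> = HashMap::new();
    for &(preimage, source) in inputs {
        let row = table.entry(hash(preimage)).or_insert_with(|| Row {
            preimage: preimage.to_string(),
            sources: Vec::new(),
        });
        // A *different* preimage with the same hash is silently dropped here;
        // only its source name survives in the merged list.
        if !row.sources.iter().any(|s| s == source) {
            row.sources.push(source.to_string());
        }
    }
    table
}
```

With a toy hash such as `|s: &str| vec![s.len() as u8]`, both "hello" and "world" map to the same key, and the table ends up with a single row whose preimage is "hello" and whose sources list both files: the collision has disappeared.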

Proposed Solution

Schema Change (Option A - recommended)

Change from single preimage to list:

Current:
| hash   | preimage | algorithm | sources      |
| Binary | Utf8     | Utf8      | List<Utf8>   |

Proposed:
| hash   | preimages   | algorithm | sources      |
| Binary | List<Utf8>  | Utf8      | List<Utf8>   |
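A sketch of what an Option A build could look like, under the same assumptions (hypothetical names, pluggable hash closure). One design point worth flagging: the table above stores `preimages` and `sources` as two independent lists, which cannot record which file each preimage came from. The sketch assumes a per-preimage source list instead, since the proposed CLI output attributes each preimage to its own file.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical in-memory form of an Option A row: every distinct preimage
/// for a hash is kept, each paired with the sources it appeared in.
#[derive(Debug, Default)]
pub struct HashEntry {
    pub preimages: BTreeMap<String, Vec<String>>, // preimage -> sources
}

impl HashEntry {
    /// More than one distinct preimage for the same hash is a collision.
    pub fn is_collision(&self) -> bool {
        self.preimages.len() > 1
    }
}

pub fn build_all<F>(inputs: &[(&str, &str)], hash: F) -> HashMap<Vec<u8>, HashEntry>
where
    F: Fn(&str) -> Vec<u8>,
{
    let mut table: HashMap<Vec<u8>, HashEntry> = HashMap::new();
    for &(preimage, source) in inputs {
        let sources = table
            .entry(hash(preimage))
            .or_default()
            .preimages
            .entry(preimage.to_string())
            .or_default();
        // The same preimage from several wordlists is still one preimage;
        // only its source list grows.
        if !sources.iter().any(|s| s == source) {
            sources.push(source.to_string());
        }
    }
    table
}
```

Keying preimages by value means the same word appearing in two wordlists extends its source list instead of being reported as a collision.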

Alternative (Option B - simpler)

Remove deduplication, allow multiple rows per hash:

  • Simpler implementation
  • Larger storage footprint
  • Query with GROUP BY hash HAVING COUNT(DISTINCT preimage) > 1, so the same word from two wordlists is not counted as a collision
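For Option B, the collision query reduces to a group-by over flat rows. This hypothetical `collisions` function mirrors it in memory. It counts distinct preimages rather than rows, because without deduplication the same word listed in two wordlists would produce two rows for one hash and otherwise look like a collision.

```rust
use std::collections::{BTreeSet, HashMap};

/// One row under Option B: no deduplication, so a hash may repeat.
pub struct FlatRow {
    pub hash: Vec<u8>,
    pub preimage: String,
}

/// In-memory equivalent of grouping by hash and keeping groups with
/// at least two *distinct* preimages.
pub fn collisions(rows: &[FlatRow]) -> Vec<(Vec<u8>, Vec<String>)> {
    let mut groups: HashMap<&[u8], BTreeSet<&str>> = HashMap::new();
    for row in rows {
        groups
            .entry(row.hash.as_slice())
            .or_default()
            .insert(row.preimage.as_str());
    }
    let mut out: Vec<(Vec<u8>, Vec<String>)> = groups
        .into_iter()
        .filter(|(_, pre)| pre.len() > 1) // distinct preimages, not raw rows
        .map(|(h, pre)| (h.to_vec(), pre.into_iter().map(String::from).collect()))
        .collect();
    out.sort(); // deterministic order for display
    out
}
```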

New CLI Command

shaha collisions [--algorithm sha256] [--min-count 2] [--limit 100]

# Output
Found 3 collisions:

HASH: a1b2c3d4... (sha256)
  - "hello"    (rockyou.txt)
  - "world123" (custom.txt)
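The proposed `shaha collisions` command could be little more than filtering plus formatting over stored collision groups. A rough sketch, where `Collision`, `report`, and the mapping of flags to function parameters are assumptions rather than an existing API; the output spacing only approximates the example above.

```rust
/// Hypothetical collision group as the command might receive it:
/// hex-encoded hash, algorithm name, and (preimage, source) pairs.
pub struct Collision {
    pub hash_hex: String,
    pub algorithm: String,
    pub preimages: Vec<(String, String)>,
}

/// `algorithm` ~ --algorithm, `min_count` ~ --min-count, `limit` ~ --limit.
pub fn report(all: &[Collision], algorithm: Option<&str>, min_count: usize, limit: usize) -> String {
    let hits: Vec<&Collision> = all
        .iter()
        .filter(|c| algorithm.map_or(true, |a| c.algorithm == a))
        .filter(|c| c.preimages.len() >= min_count)
        .collect();
    // The header counts every match; --limit only caps what is printed.
    let mut out = format!("Found {} collisions:\n", hits.len());
    for c in hits.iter().take(limit) {
        // Short hash prefix (hex is ASCII, so byte slicing is safe).
        let short = &c.hash_hex[..c.hash_hex.len().min(8)];
        out.push_str(&format!("\nHASH: {}... ({})\n", short, c.algorithm));
        for (pre, src) in &c.preimages {
            out.push_str(&format!("  - {:?} ({})\n", pre, src));
        }
    }
    out
}
```

Whether the "Found N collisions" header should report the total or the limited count is an open choice; the sketch reports the total.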

Use Case

Peter Todd's hash collision bounties (SHA256, RIPEMD160, HASH160, HASH256): ~0.59 BTC, unclaimed since 2013. With collision detection, shaha could become a tool for:

  1. Building large preimage databases
  2. Detecting accidental collisions during build
  3. Targeted collision search campaigns

Breaking Change

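This requires a schema migration. Existing databases would need to be rebuilt from their source wordlists, since preimages dropped by deduplication cannot be recovered from the stored data.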

Related

  • The boha hash_collision collection tracks Peter Todd's bounties
  • Birthday attack cost: ~2^80 hash evaluations for RIPEMD160, ~2^128 for SHA256
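The birthday figures follow from the standard estimate for an ideal n-bit hash: among k distinct preimages, the expected number of colliding pairs is k(k-1)/2 / 2^n, which reaches order 1 around k ≈ 2^(n/2). A small sketch of that arithmetic (the function names are hypothetical, and the estimate ignores any structural weakness in the real hash):

```rust
/// Expected number of colliding pairs among `k` distinct preimages under an
/// ideal `bits`-bit hash: C(k, 2) / 2^bits.
pub fn expected_collisions(k: f64, bits: i32) -> f64 {
    k * (k - 1.0) / 2.0 / 2f64.powi(bits)
}

/// Work on the order of 2^(bits/2) hashes: the birthday cost quoted above.
pub fn birthday_bound(bits: i32) -> f64 {
    2f64.powf(bits as f64 / 2.0)
}
```

For example, `expected_collisions(1e10, 256)` is on the order of 1e-58, so for an ideal 256-bit hash an accidental collision during a build of ten billion preimages is astronomically unlikely.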
