MinHash: benchmark memory, speed and accuracy with varying r and b #23

@vienneraphael

This issue concerns fuzzy deduplication of text pairs.

Benchmark the tradeoff between memory, speed, and accuracy as r (rows per band) and b (number of bands) vary.
We need a way to use far fewer than 9k, because that setting requires too much CPU.
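
As a starting point for the benchmark, here is a minimal sketch of the standard banding analysis for MinHash LSH, assuming r is the number of rows per band and b the number of bands. The function names and the (r, b) pairs in the loop are illustrative and not taken from this project. Signature length (and so hashing cost) grows as r * b, while the chance that a pair with Jaccard similarity s becomes a candidate is 1 - (1 - s^r)^b.

```python
def candidate_probability(s: float, r: int, b: int) -> float:
    """Probability that two documents with Jaccard similarity s
    agree on all r rows of at least one of the b bands."""
    return 1.0 - (1.0 - s ** r) ** b


def approx_threshold(r: int, b: int) -> float:
    """Similarity around which the S-curve rises steeply: (1/b)^(1/r)."""
    return (1.0 / b) ** (1.0 / r)


def summarize(r: int, b: int, similarities=(0.5, 0.7, 0.8, 0.9)) -> dict:
    """Collect the quantities worth comparing across (r, b) settings."""
    return {
        "num_hashes": r * b,  # signature length, drives CPU and memory cost
        "threshold": approx_threshold(r, b),
        "p_candidate": {s: candidate_probability(s, r, b) for s in similarities},
    }


if __name__ == "__main__":
    # Hypothetical settings, chosen only to show how the curve moves.
    for r, b in [(4, 32), (5, 20), (8, 16)]:
        info = summarize(r, b)
        probs = ", ".join(f"s={s}: {p:.3f}" for s, p in info["p_candidate"].items())
        print(f"r={r} b={b} hashes={info['num_hashes']} "
              f"threshold~{info['threshold']:.3f} | {probs}")
```

This only covers the analytical side of accuracy. Memory and wall-clock time still need to be measured on real data for each (r, b) setting.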
