hash-dataset

A hashing technique for comparing large-scale, out-of-core machine learning datasets.

Instructions

To compare two datasets:

python3 src/dataset-hash.py data/small-dataset-1 data/small-dataset-2

Available options:

python3 src/hash-dataset.py [OPTIONS] [FILES]

-m : Display samples that match
-n : Display samples that do not match
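
As a rough illustration of the idea, not the code in `src/`, here is a minimal sketch of hash-based dataset comparison. It assumes each dataset is a single text file with one sample per line and uses SHA-256 from the standard library; the function names `sample_hashes` and `compare_datasets` are hypothetical. Files are streamed line by line, so only the digests are held in memory rather than the raw samples.

```python
import hashlib


def sample_hashes(path):
    """Stream a file line by line and return the set of SHA-256 digests of its samples.

    Assumption: one sample per non-empty line. The real script may define
    samples differently.
    """
    digests = set()
    with open(path, "rb") as handle:
        for line in handle:
            sample = line.rstrip(b"\r\n")
            if sample:
                digests.add(hashlib.sha256(sample).hexdigest())
    return digests


def compare_datasets(path_a, path_b):
    """Return (matching, not_matching) sets of digests for two dataset files."""
    hashes_a = sample_hashes(path_a)
    hashes_b = sample_hashes(path_b)
    matching = hashes_a & hashes_b        # samples present in both datasets
    not_matching = hashes_a ^ hashes_b    # samples present in exactly one dataset
    return matching, not_matching
```

In this sketch, the `matching` set corresponds to what an option like `-m` would report and `not_matching` to what `-n` would report. Comparing digests instead of raw samples keeps the memory footprint proportional to the number of samples, not their size.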