Commit 4df6d1b

readme, closes #76 (#238)
1 parent 2741c7c commit 4df6d1b

README.md

Lines changed: 14 additions & 0 deletions
@@ -15,6 +15,7 @@
 * [Benchmark](#benchmark)
 * [Main Requirements](#main-requirements)
 * [Installation](#installation)
+* [How to extend the dataset](#how-to-extend-the-dataset)
 * [How to run](#how-to-run)
 * [Benchmark Result](#benchmark-result)
 * [Used Tools for Benchmarking](#used-tools-for-benchmarking)
@@ -265,6 +266,19 @@ $ source venv/bin/activate
 $ pip install -qr requirements.txt
 ```

+### How to extend the dataset
+
+1. Find an interesting repository and a commit in it.
+2. Add an entry to snapshot.json that maps the padded commit hash to the repository URL:
+``` json
+"{commit_hash}{any_padding_hex_symbols_to_64}": "https://github.com/org/repo",
+```
+3. Run download_data.py twice: the first run creates a meta file, the second downloads all files from the commit.
+4. Run CredSweeper on the downloaded data to obtain a report (preferably with the ``--ml_threshold 0`` argument).
+5. Run the benchmark on the report with the ``--fix`` option; all found values will be inserted into the meta file.
+6. Review the markup and correct it if necessary, produce an empty benchmark report for CI, and commit the changes.
+
+
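Step 2 leaves the construction of the snapshot.json key implicit. The following is a minimal sketch of one way to build it, assuming the commit hash is a full 40-character SHA-1 and that zeros are acceptable padding (the README only asks for any hex symbols up to 64 characters). The helper names `snapshot_key` and `snapshot_entry` and the example hash are hypothetical and not part of the repository.

```python
import json
import re


def snapshot_key(commit_hash: str, pad_char: str = "0") -> str:
    """Pad a commit hash with a hex symbol until it is 64 characters long."""
    commit_hash = commit_hash.strip().lower()
    # Assumption: the input is a 40 to 64 character hexadecimal string.
    if not re.fullmatch(r"[0-9a-f]{40,64}", commit_hash):
        raise ValueError("expected a 40 to 64 character hexadecimal commit hash")
    if not re.fullmatch(r"[0-9a-f]", pad_char):
        raise ValueError("pad_char must be a single hex symbol")
    return commit_hash.ljust(64, pad_char)


def snapshot_entry(commit_hash: str, repo_url: str) -> str:
    """Render one line in the '"key": "url",' form shown in step 2."""
    return f"{json.dumps(snapshot_key(commit_hash))}: {json.dumps(repo_url)},"


# Hypothetical hash, used only for illustration.
example_hash = "0123456789abcdef0123456789abcdef01234567"
print(snapshot_entry(example_hash, "https://github.com/org/repo"))
```

The printed line can be pasted into snapshot.json next to the existing entries; whether a trailing comma is valid depends on the entry's position in the JSON object.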
 ### How to run
 ``` bash
 usage: python -m benchmark [-h] --scanner [SCANNER]
