- Corpus Database Construction
- Corpus Ground Truth Construction
- Library Score Computation
Description: Create a Ghidra project that imports and analyzes the corpus binaries, then create a BSim database to hold their function signatures. BSim uses this database to match functions in the unknown binaries against functions in the corpus.
Usage:
mkdir -p /workspace/corpus/ghidra-project \
/workspace/corpus/bsim-database \
/workspace/corpus/signatures &&
analyzeHeadless /workspace/corpus/ghidra-project \
postgres_object_files \
-import /workspace/binaries/corpus/* &&
bsim createdatabase file:/workspace/corpus/bsim-database/corpus medium_nosize &&
bsim generatesigs \
ghidra:/workspace/corpus/ghidra-project/postgres_object_files \
bsim="file:/workspace/corpus/bsim-database/corpus" \
/workspace/corpus/signatures &&
bsim commitsigs \
file:/workspace/corpus/bsim-database/corpus \
/workspace/corpus/signatures
Output:
- A Ghidra project under /workspace/corpus/ghidra-project containing the imported corpus binaries.
- A set of signature files (sig_files) in the /workspace/corpus/signatures directory, each containing XML with binary information and hashed features of the corpus functions.
- A BSim database under /workspace/corpus/bsim-database/corpus containing the signatures of the corpus binaries.
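If you want to script this step, the same command chain can be expressed as a small Python driver. This is a sketch, not part of the repository. The helper names `corpus_db_commands` and `run_all` are hypothetical, and the driver uses only the commands, paths, and arguments shown in the usage above. Running each command with `check=True` mirrors the `&&` chaining: a failing command raises an exception and stops the sequence.

```python
import glob
import subprocess
from typing import List

WORKSPACE = "/workspace"  # base directory used by every path in this step


def corpus_db_commands(ws: str = WORKSPACE) -> List[List[str]]:
    """Return the argv lists for Corpus Database Construction, in execution order."""
    project_dir = f"{ws}/corpus/ghidra-project"
    db_dir = f"{ws}/corpus/bsim-database"
    sig_dir = f"{ws}/corpus/signatures"
    db_url = f"file:{db_dir}/corpus"
    # The shell expands /workspace/binaries/corpus/* itself; here glob does it.
    # Unlike the shell, glob returns [] (not a literal "*") when nothing matches.
    binaries = sorted(glob.glob(f"{ws}/binaries/corpus/*"))
    return [
        ["mkdir", "-p", project_dir, db_dir, sig_dir],
        ["analyzeHeadless", project_dir, "postgres_object_files", "-import", *binaries],
        ["bsim", "createdatabase", db_url, "medium_nosize"],
        ["bsim", "generatesigs", f"ghidra:{project_dir}/postgres_object_files",
         f"bsim={db_url}", sig_dir],
        ["bsim", "commitsigs", db_url, sig_dir],
    ]


def run_all(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print (dry run) or execute each command, stopping at the first failure."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            # check=True raises CalledProcessError on a non-zero exit, like `&&`.
            subprocess.run(argv, check=True)
```

Calling `run_all(corpus_db_commands())` only prints the commands. To execute them, pass `dry_run=False`. This assumes `analyzeHeadless` and `bsim` are on `PATH`.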
Description: Generate ground truth for each candidate function in the corpus by processing the signatures created in the Corpus Database Construction step. The ground truth is used to calculate the adjacency scores between candidate functions during graph construction in RevDecode.
Usage:
python3 get_ground_truth_for_corpus.py \
--signature_dir /workspace/corpus/signatures \
--output_file /workspace/corpus/corpus_ground_truth.json
Output: Ground truth for the corpus will be stored in:
/workspace/corpus/corpus_ground_truth.json
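The description specifies neither the signature file format nor the JSON schema, so the following is only a sketch of what processing the signatures could look like. It makes two assumptions that have not been checked against BSim's documentation. First, each signature file names its binary in an `<exe>` element with a `<name>` child. Second, it lists each function as an `<fdesc name="...">` element. It also assumes a hypothetical ground-truth shape that maps each function name to the corpus binaries containing it. `parse_signature_xml` and `build_ground_truth` are illustrative names, not functions from `get_ground_truth_for_corpus.py`.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple, Union

# ASSUMED tag names for BSim signature XML; verify against your own files.
EXE_TAG = "exe"     # element describing the binary
NAME_TAG = "name"   # child of EXE_TAG holding the binary's name
FUNC_TAG = "fdesc"  # one element per function, with a "name" attribute


def parse_signature_xml(xml_data: Union[str, bytes]) -> Tuple[str, List[str]]:
    """Return (binary name, function names) from one signature file."""
    root = ET.fromstring(xml_data)
    exe = root.find(f".//{EXE_TAG}")
    binary = exe.findtext(NAME_TAG, default="") if exe is not None else ""
    functions = [elem.get("name", "") for elem in root.iter(FUNC_TAG)]
    return binary, functions


def build_ground_truth(records: Iterable[Tuple[str, List[str]]]) -> Dict[str, List[str]]:
    """HYPOTHETICAL schema: map each function name to the binaries containing it."""
    truth: Dict[str, Set[str]] = defaultdict(set)
    for binary, functions in records:
        for func in functions:
            truth[func].add(binary)
    return {func: sorted(binaries) for func, binaries in sorted(truth.items())}
```

To try this on real output, call `parse_signature_xml(p.read_bytes())` for each file `p` in /workspace/corpus/signatures and pass the resulting pairs to `build_ground_truth`. If your files use different tag names, adjust the constants at the top.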
Description: Generate library scores for binaries in the corpus based on the signatures created in the Corpus Database Construction step. The library scores are used as one of the weight factors in the RevDecode graph construction.
Usage:
python3 generate_library_scores.py \
--signature_dir /workspace/corpus/signatures \
--output_file /workspace/corpus/library_scores.json
Output: Library scores will be stored in:
/workspace/corpus/library_scores.json
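This section does not describe how `generate_library_scores.py` computes its scores. Purely to illustrate the data flow from signatures to a per-binary score file, the sketch below uses a hypothetical heuristic: the fraction of a binary's distinct function names that occur in no other corpus binary. Both the heuristic and the helper names (`library_scores`, `write_scores`) are assumptions, and the real script may weight functions quite differently.

```python
import json
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def library_scores(records: Iterable[Tuple[str, List[str]]]) -> Dict[str, float]:
    """HYPOTHETICAL score: share of a binary's distinct functions unique to it."""
    per_binary = [(binary, set(funcs)) for binary, funcs in records]
    # Count how many corpus binaries contain each function name.
    occurrences = Counter(f for _, funcs in per_binary for f in funcs)
    scores: Dict[str, float] = {}
    for binary, funcs in per_binary:
        unique = sum(1 for f in funcs if occurrences[f] == 1)
        # A binary with no functions gets 0.0 rather than dividing by zero.
        scores[binary] = unique / len(funcs) if funcs else 0.0
    return scores


def write_scores(path: str, scores: Dict[str, float]) -> None:
    """Write scores as a JSON object keyed by binary name."""
    with open(path, "w") as fh:
        json.dump(scores, fh, indent=2, sort_keys=True)
```

The input is the same list of `(binary, function names)` pairs that a signature parser would produce. The score therefore depends only on the signatures, which matches the description of this step.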