[ghidra2cpg] Add support for loading Ghidra projects #5465
// In the current implementation we use the first domain file
// It is the user's responsibility to provide a Ghidra project with one domain file
val domainFile = project.getRootFolder().getFiles().head
program = project.openProgram("/", domainFile.getName(), true)
I don't think this is going to work on Windows.
That is indeed suspicious. It is not a problem, though: if you open Ghidra on Windows you will see that the path uses / as the separator there as well (see picture below). Here / means the project root (not a file system path). If you move the domain file under the folder test, for example, then you need to use /test/<domain file>. If I try to use "\", even on Windows, the following error is raised:
java.lang.IllegalArgumentException: Absolute path must begin with '/'
at ghidra.framework.data.DefaultProjectData.getFile(DefaultProjectData.java:663)
at ghidra.base.project.GhidraProject.openProgram(GhidraProject.java:303)
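The separator rule described above can be sketched as a small path builder. This is only an illustration, not code from the PR: `ProjectPaths`, `folderPath`, and `filePath` are hypothetical names, and the only fact relied on is that Ghidra project paths are absolute and always use `/`.

```scala
object ProjectPaths {
  // Ghidra project paths use '/' on every OS, so java.io.File.separator
  // is deliberately NOT used here.
  private val Separator = "/"

  /** Absolute project folder path; an empty folder list means the project root. */
  def folderPath(folders: Seq[String]): String =
    folders.filter(_.nonEmpty).mkString(Separator, Separator, "")

  /** Absolute project path of a domain file inside the given folders. */
  def filePath(folders: Seq[String], fileName: String): String = {
    val folder = folderPath(folders)
    if (folder == Separator) folder + fileName
    else folder + Separator + fileName
  }
}
```

With this helper, a domain file in the project root would live in folder `/`, and one moved into the folder test would live in `/test`, which is the value that would go into the first argument of `project.openProgram(...)` in the diff above.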
Okay, so it works for me under Windows as well. I admit, though, that I only tested it under Linux at first. It turns out you cannot share a Ghidra project by copying the .gpr file and the .rep folder, so I updated the PR description above. Please refer to it so you can test it yourself as well.
PS C:\Users\agemes\git-repos\joern> joern-parse .\hello-arm64.gpr --language GHIDRA
Parsing code at: .\hello-arm64.gpr - language: `GHIDRA`
[+] Running language frontend
=======================================================================================================
Invoking CPG generator in a separate process. Note that the new process will consume additional memory.
If you are importing a large codebase (and/or running into memory issues), please try the following:
1) exit joern
2) invoke the frontend: C:\Users\agemes\git-repos\joern\joern-cli\target\universal\stage\ghidra2cpg.bat -J-Xmx8068m .\hello-arm64.gpr --output cpg.bin
3) start joern, import the cpg: `importCpg("path/to/cpg")`
=======================================================================================================
[+] Applying default overlays
[INFO ] initialising from existing storage (C:\Users\agemes\git-repos\joern\cpg.bin)
[INFO ] Start of pass: io.joern.x2cpg.passes.base.FileCreationPass
[INFO ] Pass io.joern.x2cpg.passes.base.FileCreationPass completed in 104 ms (65% on mutations). 26 + 0 changes committed from 1 parts.
[INFO ] Start of pass: io.joern.x2cpg.passes.base.NamespaceCreator
[INFO ] Pass io.joern.x2cpg.passes.base.NamespaceCreator completed in 16 ms (27% on mutations). 3 + 0 changes committed from 1 parts.
[INFO ] Start of pass: io.joern.x2cpg.passes.base.TypeDeclStubCreator
[INFO ] Pass io.joern.x2cpg.passes.base.TypeDeclStubCreator completed in 19 ms (29% on mutations). 31 + 0 changes committed from 1 parts.
[INFO ] Start of pass: io.joern.x2cpg.passes.base.MethodStubCreator
[INFO ] Inconsistent/erroneous callInfo on calls to method fullname UNKNOWN (we have 2 many variants)
[INFO ] Inconsistent/erroneous callInfo on calls to method fullname <operator>.assignment (we have 3 many variants)
[INFO ] Inconsistent/erroneous callInfo on calls to method fullname <operator>.addressOf (we have 1 many variants)
[INFO ] Inconsistent/erroneous callInfo on calls to method fullname <operator>.incBy (we have 2 many variants)
[INFO ] Inconsistent/erroneous callInfo on calls to method fullname <operator>.goto (we have 4 many variants)
[INFO ] Inconsistent/erroneous callInfo on calls to method fullname <operator>.logicalShiftRight (we have 1 many variants)
[INFO ] Inconsistent/erroneous callInfo on calls to method fullname <operator>.arithmeticShiftRight (we have 1 many variants)
[INFO ] Pass io.joern.x2cpg.passes.base.MethodStubCreator completed in 55 ms (6% on mutations). 95 + 0 changes committed from 1 parts.
[INFO ] Start of pass: io.joern.x2cpg.passes.base.ParameterIndexCompatPass
[INFO ] Pass io.joern.x2cpg.passes.base.ParameterIndexCompatPass completed in 7 ms (54% on mutations). 4 + 0 changes committed from 1 parts.
[INFO ] Start of pass: io.joern.x2cpg.passes.base.MethodDecoratorPass
[INFO ] Pass io.joern.x2cpg.passes.base.MethodDecoratorPass completed in 9 ms (35% on mutations). 87 + 0 changes committed from 1 parts.
[INFO ] Start of pass: io.joern.x2cpg.passes.base.AstLinkerPass
...
[INFO ] Number of definitions for __libc_start_main: 0
[INFO ] Number of definitions for <operator>.assignment: 4
[INFO ] Calculating reaching definitions for: __gmon_start__ in C:\Users\agemes\git-repos\joern\hello-arm64.gpr
[INFO ] Number of definitions for UNKNOWN: 2
[INFO ] Calculating reaching definitions for: _ITM_registerTMCloneTable in C:\Users\agemes\git-repos\joern\hello-arm64.gpr
[INFO ] Number of definitions for __gmon_start__: 0
[INFO ] Calculating reaching definitions for: _start in C:\Users\agemes\git-repos\joern\hello-arm64.gpr
[INFO ] Number of definitions for _ITM_registerTMCloneTable: 0
[INFO ] Calculating reaching definitions for: register_tm_clones in C:\Users\agemes\git-repos\joern\hello-arm64.gpr
[INFO ] Calculating reaching definitions for: __gmon_start__ in C:\Users\agemes\git-repos\joern\hello-arm64.gpr
[INFO ] Calculating reaching definitions for: _dl_relocate_static_pie in C:\Users\agemes\git-repos\joern\hello-arm64.gpr
[INFO ] Number of definitions for call_weak_fn: 7
[INFO ] Number of definitions for _init: 12
[INFO ] Number of definitions for __libc_start_main: 10
[INFO ] Number of definitions for FUN_004004f0: 12
[INFO ] Number of definitions for _fini: 10
[INFO ] Number of definitions for _start: 21
[INFO ] Number of definitions for __do_global_dtors_aux: 27
[INFO ] Number of definitions for _dl_relocate_static_pie: 10
[INFO ] Number of definitions for main: 16
[INFO ] Number of definitions for __gmon_start__: 9
[INFO ] Number of definitions for abort: 9
[INFO ] Number of definitions for register_tm_clones: 36
[INFO ] Number of definitions for deregister_tm_clones: 25
[INFO ] Calculating reaching definitions for: puts in C:\Users\agemes\git-repos\joern\hello-arm64.gpr
[INFO ] Number of definitions for puts: 10
[INFO ] Pass io.joern.dataflowengineoss.passes.reachingdef.ReachingDefPass completed in 391 ms (1% on mutations). 452 + 0 changes committed from 30 parts.
[INFO ] Start of pass: io.shiftleft.semanticcpg.Overlays$$anon$1
[INFO ] Pass io.shiftleft.semanticcpg.Overlays$$anon$1 completed in 1 ms (70% on mutations). 1 + 0 changes committed from 1 parts.
[INFO ] closing graph: writing to storage at `C:\Users\agemes\git-repos\joern\cpg.bin`
Successfully wrote graph to: C:\Users\agemes\git-repos\joern\cpg.bin
To load the graph, type `joern C:\Users\agemes\git-repos\joern\cpg.bin`
PS C:\Users\agemes\git-repos\joern> joern cpg.bin
Creating project `cpg.bin3` for CPG at `cpg.bin`
Creating working copy of CPG to be safe
Loading base CPG from: C:\Users\agemes\git-repos\joern\workspace\cpg.bin3\cpg.bin.tmp
[INFO ] initialising from existing storage (C:\Users\agemes\git-repos\joern\workspace\cpg.bin3\cpg.bin.tmp)
Overlay dataflowOss already exists - skipping
The graph has been modified. You may want to use the `save` command to persist changes to disk. All changes will also be saved collectively on exit
██╗ ██████╗ ███████╗██████╗ ███╗ ██╗
██║██╔═══██╗██╔════╝██╔══██╗████╗ ██║
██║██║ ██║█████╗ ██████╔╝██╔██╗ ██║
██ ██║██║ ██║██╔══╝ ██╔══██╗██║╚██╗██║
╚█████╔╝╚██████╔╝███████╗██║ ██║██║ ╚████║
╚════╝ ╚═════╝ ╚══════╝╚═╝ ╚═╝╚═╝ ╚═══╝
Version: 0.0.0+3813-9478888c
Type `help` to begin
joern> cpg.method.name.l
val res1: List[String] = List(
"_init",
"FUN_004004f0",
"__libc_start_main",
"__gmon_start__",
"abort",
"puts",
"_start",
"_dl_relocate_static_pie",
"call_weak_fn",
"deregister_tm_clones",
"register_tm_clones",
"__do_global_dtors_aux",
"frame_dummy",
"main",
"_fini",
"__libc_start_main",
"_ITM_deregisterTMCloneTable",
"__gmon_start__",
"abort",
"puts",
"_ITM_registerTMCloneTable",
"UNKNOWN",
"<operator>.assignment",
"<operator>.addressOf",
"<operator>.incBy",
"<operator>.goto",
"<operator>.compare",
"<operator>.subtraction",
"<operator>.logicalShiftRight",
"<operator>.arithmeticShiftRight"
)
joern>
Note: Ghidra projects can be shared in the .gzf format. That might be a future improvement.
I also double-checked this with the Ghidra devs and they confirmed it:
That is a project path, which is OS-independent and always uses the forward-slash notation.
Still, thank you for the PR!
Fixes #2534
Added support for loading existing Ghidra projects. The linked issue describes why this is a useful feature. If the input file is a non-empty Ghidra project (.gpr), we load the first program (domain file) from it and create the CPG.
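As a rough sketch of the input check described above (not the PR's actual implementation), the frontend could recognise a `.gpr` argument and split it into a project directory and a project name. `GprInput`, `ProjectLocation`, and `parse` are hypothetical names; checking that the project is non-empty requires opening it and is not covered here.

```scala
import java.nio.file.Paths

object GprInput {
  /** Directory containing the .gpr file, plus the project name without extension. */
  final case class ProjectLocation(directory: String, name: String)

  private val Extension = ".gpr"

  /** Returns the project location if `input` names a .gpr file, otherwise None. */
  def parse(input: String): Option[ProjectLocation] = {
    val path = Paths.get(input).toAbsolutePath.normalize()
    for {
      fileName <- Option(path.getFileName).map(_.toString)
      // Require the extension and a non-empty project name in front of it
      if fileName.toLowerCase.endsWith(Extension) && fileName.length > Extension.length
      parent <- Option(path.getParent)
    } yield ProjectLocation(parent.toString, fileName.dropRight(Extension.length))
  }
}
```

The resulting directory and name are the kind of pair that Ghidra's `GhidraProject.openProject(projectsDir, projectName)` expects. Whether the PR wires it up exactly this way is an assumption.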
Test binary:
hello-arm64.zip
Edit:
Note: I first tried to upload the .gpr file, but it turns out Ghidra projects cannot easily be shared in the .gpr + .rep format because such a project is locked to the username of the user who created it. For testing, a project has to be created manually in Ghidra, and the test binary (after unzipping) has to be imported and auto-analyzed manually. Then you need to save the project and close Ghidra completely. Otherwise it will lock the project via a lockfile and joern-parse will fail.
Test run: