[ghidra2cpg] Add support for loading Ghidra projects #5465

Open
wants to merge 1 commit into
base: master

Contributor

@gemesa gemesa commented May 2, 2025

Fixes #2534

Added support for loading existing Ghidra projects. The linked issue describes why this is a useful feature. If the input file is a non-empty Ghidra project (.gpr), we load the first program (domain file) from it and create the CPG.
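As a rough illustration of the first step: Ghidra's `GhidraProject.openProject` takes the project's parent directory and the project name as separate arguments, so a `.gpr` path has to be split before the project can be opened. The helper below is a hypothetical sketch and not the code from this PR; `GprPaths`, `GprLocation` and `parseGprPath` are names invented here for illustration.

```scala
import java.nio.file.{Path, Paths}

object GprPaths {
  // Hypothetical holder for the two pieces needed to open a project.
  final case class GprLocation(projectsDir: Path, projectName: String)

  // Returns None when the input does not name a "<project>.gpr" file.
  def parseGprPath(input: String): Option[GprLocation] = {
    val path = Paths.get(input).toAbsolutePath.normalize()
    for {
      fileName <- Option(path.getFileName).map(_.toString)
      if fileName.length > 4 && fileName.toLowerCase.endsWith(".gpr")
      parent <- Option(path.getParent)
    } yield GprLocation(parent, fileName.dropRight(4))
  }
}
```

With this sketch, `GprPaths.parseGprPath("hello-arm64.gpr")` yields the project name `hello-arm64` together with the absolute directory that contains the file, while a path to an ordinary binary yields `None` and could fall through to the regular import path.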

Test binary:

hello-arm64.zip

Edit:
Note: I first tried to upload the .gpr file, but it turns out that Ghidra projects cannot easily be shared in the .gpr + .rep format, because a project is locked to the username of the user who created it. For testing, create a project manually in Ghidra, import the unzipped test binary and run auto-analysis on it. Then save the project and close Ghidra completely; otherwise Ghidra keeps the project locked via a lock file and joern-parse will fail.
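The preconditions in this note could be checked before parsing. The following is a minimal sketch, not part of this PR; `GprPreflight` and `problems` are hypothetical names, and the on-disk layout (a `<name>.rep` folder and a `<name>.lock` file next to `<name>.gpr`) is an assumption about how Ghidra stores and locks a project.

```scala
import java.nio.file.{Files, Path}

object GprPreflight {
  // Returns human-readable problems; an empty list means all checks passed.
  def problems(gprFile: Path): List[String] = {
    val abs  = gprFile.toAbsolutePath
    val name = abs.getFileName.toString.stripSuffix(".gpr")
    val dir  = abs.getParent
    // Assumption: project data lives in "<name>.rep" and an open project
    // leaves "<name>.lock" next to the .gpr file.
    val rep  = dir.resolve(s"$name.rep")
    val lock = dir.resolve(s"$name.lock")
    List(
      Option.when(!Files.isRegularFile(abs))(s"$abs is not a file"),
      Option.when(!Files.isDirectory(rep))(s"missing project folder $rep"),
      Option.when(Files.exists(lock))(s"project appears to be locked ($lock); close Ghidra first")
    ).flatten
  }
}
```

Reporting all problems at once, rather than failing on the first, would give a clearer message than the failure from inside the frontend process.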

Test run:

$ joern-parse ~/hello-arm64.gpr --language GHIDRA
Parsing code at: /home/gemesa/hello-arm64.gpr - language: `GHIDRA`
[+] Running language frontend
=======================================================================================================
Invoking CPG generator in a separate process. Note that the new process will consume additional memory.
If you are importing a large codebase (and/or running into memory issues), please try the following:
1) exit joern
2) invoke the frontend: /home/gemesa/git-repos/joern/joern-cli/target/universal/stage/ghidra2cpg -J-Xmx3472m /home/gemesa/hello-arm64.gpr --output cpg.bin
3) start joern, import the cpg: `importCpg("path/to/cpg")`
=======================================================================================================

[+] Applying default overlays
Successfully wrote graph to: /home/gemesa/git-repos/tmp/cpg.bin
To load the graph, type `joern /home/gemesa/git-repos/tmp/cpg.bin`
                                                                                                                      
$ joern /home/gemesa/git-repos/tmp/cpg.bin
Creating project `cpg.bin` for CPG at `/home/gemesa/git-repos/tmp/cpg.bin`
Project with name cpg.bin already exists - overwriting
Creating working copy of CPG to be safe
Loading base CPG from: /home/gemesa/git-repos/tmp/workspace/cpg.bin/cpg.bin.tmp
Overlay dataflowOss already exists - skipping
The graph has been modified. You may want to use the `save` command to persist changes to disk.  All changes will also be saved collectively on exit

     ██╗ ██████╗ ███████╗██████╗ ███╗   ██╗
     ██║██╔═══██╗██╔════╝██╔══██╗████╗  ██║
     ██║██║   ██║█████╗  ██████╔╝██╔██╗ ██║
██   ██║██║   ██║██╔══╝  ██╔══██╗██║╚██╗██║
╚█████╔╝╚██████╔╝███████╗██║  ██║██║ ╚████║
 ╚════╝  ╚═════╝ ╚══════╝╚═╝  ╚═╝╚═╝  ╚═══╝
Version: 0.0.0+3813-9478888c
Type `help` to begin
      
                                                                                                                      
joern> cpg.method("main").code.l
val res1: List[String] = List(
  """
/* WARNING: Unknown calling convention -- yet parameter storage is locked */

int main(void)

{
  puts("hello world");
  return 0;
}

"""
)
                                                                                                                      
joern>

// In the current implementation we use the first domain file
// It is the user's responsibility to provide a Ghidra project with one domain file
val domainFile = project.getRootFolder().getFiles().head
program = project.openProgram("/", domainFile.getName(), true)
Contributor

I don't think that this is going to work on Windows.

Contributor Author

@gemesa gemesa May 2, 2025

That does look suspicious, but it is not a problem: if you open Ghidra on Windows you will see that the path uses / as the separator there as well (see picture below). Here / denotes the project root, not a file system path. If you move the domain file into a folder named test, for example, you need to use /test/<domain file>. If I try to use "\" instead, even on Windows, the following error is raised:

java.lang.IllegalArgumentException: Absolute path must begin with '/'
        at ghidra.framework.data.DefaultProjectData.getFile(DefaultProjectData.java:663)
        at ghidra.base.project.GhidraProject.openProgram(GhidraProject.java:303)

image
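To make the distinction concrete, here is a small sketch of building such project paths by hand. It deliberately avoids `java.io.File.separator`, since project paths are not file system paths. `ProjectPaths`, `folderPath` and `projectPath` are hypothetical helper names, not part of Ghidra or of this PR.

```scala
object ProjectPaths {
  // Folder path inside a Ghidra project: always '/'-separated, "/" is the root.
  def folderPath(folders: Seq[String]): String =
    if (folders.isEmpty) "/" else folders.mkString("/", "/", "")

  // Full project path of a domain file located under the given folders.
  def projectPath(folders: Seq[String], domainFile: String): String =
    (folders :+ domainFile).mkString("/", "/", "")
}
```

For a domain file in the project root, `folderPath(Nil)` gives `/`, which matches the first argument passed to `openProgram` in the snippet under review; for a file moved into `test`, `folderPath(Seq("test"))` gives `/test` on every operating system.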

Contributor Author

@gemesa gemesa May 2, 2025

Okay, so it works for me on Windows as well. I admit, though, that I had only tested it on Linux at first. It turns out you cannot share a Ghidra project by copying the .gpr file and the .rep folder; I updated the PR description above accordingly. Please refer to it so you can test this yourself as well.

PS C:\Users\agemes\git-repos\joern> joern-parse .\hello-arm64.gpr --language GHIDRA
Parsing code at: .\hello-arm64.gpr - language: `GHIDRA`
[+] Running language frontend
=======================================================================================================
Invoking CPG generator in a separate process. Note that the new process will consume additional memory.
If you are importing a large codebase (and/or running into memory issues), please try the following:
1) exit joern
2) invoke the frontend: C:\Users\agemes\git-repos\joern\joern-cli\target\universal\stage\ghidra2cpg.bat -J-Xmx8068m .\hello-arm64.gpr --output cpg.bin
3) start joern, import the cpg: `importCpg("path/to/cpg")`
=======================================================================================================

[+] Applying default overlays
[INFO ] initialising from existing storage (C:\Users\agemes\git-repos\joern\cpg.bin)
[INFO ] Start of pass: io.joern.x2cpg.passes.base.FileCreationPass
[INFO ] Pass io.joern.x2cpg.passes.base.FileCreationPass completed in 104 ms (65% on mutations). 26 + 0 changes committed from 1 parts.
[INFO ] Start of pass: io.joern.x2cpg.passes.base.NamespaceCreator
[INFO ] Pass io.joern.x2cpg.passes.base.NamespaceCreator completed in 16 ms (27% on mutations). 3 + 0 changes committed from 1 parts.
[INFO ] Start of pass: io.joern.x2cpg.passes.base.TypeDeclStubCreator
[INFO ] Pass io.joern.x2cpg.passes.base.TypeDeclStubCreator completed in 19 ms (29% on mutations). 31 + 0 changes committed from 1 parts.
[INFO ] Start of pass: io.joern.x2cpg.passes.base.MethodStubCreator
[INFO ] Inconsistent/erroneous callInfo on calls to method fullname UNKNOWN (we have 2 many variants)
[INFO ] Inconsistent/erroneous callInfo on calls to method fullname <operator>.assignment (we have 3 many variants)
[INFO ] Inconsistent/erroneous callInfo on calls to method fullname <operator>.addressOf (we have 1 many variants)
[INFO ] Inconsistent/erroneous callInfo on calls to method fullname <operator>.incBy (we have 2 many variants)
[INFO ] Inconsistent/erroneous callInfo on calls to method fullname <operator>.goto (we have 4 many variants)
[INFO ] Inconsistent/erroneous callInfo on calls to method fullname <operator>.logicalShiftRight (we have 1 many variants)
[INFO ] Inconsistent/erroneous callInfo on calls to method fullname <operator>.arithmeticShiftRight (we have 1 many variants)
[INFO ] Pass io.joern.x2cpg.passes.base.MethodStubCreator completed in 55 ms (6% on mutations). 95 + 0 changes committed from 1 parts.
[INFO ] Start of pass: io.joern.x2cpg.passes.base.ParameterIndexCompatPass
[INFO ] Pass io.joern.x2cpg.passes.base.ParameterIndexCompatPass completed in 7 ms (54% on mutations). 4 + 0 changes committed from 1 parts.
[INFO ] Start of pass: io.joern.x2cpg.passes.base.MethodDecoratorPass
[INFO ] Pass io.joern.x2cpg.passes.base.MethodDecoratorPass completed in 9 ms (35% on mutations). 87 + 0 changes committed from 1 parts.
[INFO ] Start of pass: io.joern.x2cpg.passes.base.AstLinkerPass
...
[INFO ] Number of definitions for __libc_start_main: 0
[INFO ] Number of definitions for <operator>.assignment: 4
[INFO ] Calculating reaching definitions for: __gmon_start__ in C:\Users\agemes\git-repos\joern\hello-arm64.gpr
[INFO ] Number of definitions for UNKNOWN: 2
[INFO ] Calculating reaching definitions for: _ITM_registerTMCloneTable in C:\Users\agemes\git-repos\joern\hello-arm64.gpr
[INFO ] Number of definitions for __gmon_start__: 0
[INFO ] Calculating reaching definitions for: _start in C:\Users\agemes\git-repos\joern\hello-arm64.gpr
[INFO ] Number of definitions for _ITM_registerTMCloneTable: 0
[INFO ] Calculating reaching definitions for: register_tm_clones in C:\Users\agemes\git-repos\joern\hello-arm64.gpr
[INFO ] Calculating reaching definitions for: __gmon_start__ in C:\Users\agemes\git-repos\joern\hello-arm64.gpr
[INFO ] Calculating reaching definitions for: _dl_relocate_static_pie in C:\Users\agemes\git-repos\joern\hello-arm64.gpr
[INFO ] Number of definitions for call_weak_fn: 7
[INFO ] Number of definitions for _init: 12
[INFO ] Number of definitions for __libc_start_main: 10
[INFO ] Number of definitions for FUN_004004f0: 12
[INFO ] Number of definitions for _fini: 10
[INFO ] Number of definitions for _start: 21
[INFO ] Number of definitions for __do_global_dtors_aux: 27
[INFO ] Number of definitions for _dl_relocate_static_pie: 10
[INFO ] Number of definitions for main: 16
[INFO ] Number of definitions for __gmon_start__: 9
[INFO ] Number of definitions for abort: 9
[INFO ] Number of definitions for register_tm_clones: 36
[INFO ] Number of definitions for deregister_tm_clones: 25
[INFO ] Calculating reaching definitions for: puts in C:\Users\agemes\git-repos\joern\hello-arm64.gpr
[INFO ] Number of definitions for puts: 10
[INFO ] Pass io.joern.dataflowengineoss.passes.reachingdef.ReachingDefPass completed in 391 ms (1% on mutations). 452 + 0 changes committed from 30 parts.
[INFO ] Start of pass: io.shiftleft.semanticcpg.Overlays$$anon$1
[INFO ] Pass io.shiftleft.semanticcpg.Overlays$$anon$1 completed in 1 ms (70% on mutations). 1 + 0 changes committed from 1 parts.
[INFO ] closing graph: writing to storage at `C:\Users\agemes\git-repos\joern\cpg.bin`
Successfully wrote graph to: C:\Users\agemes\git-repos\joern\cpg.bin
To load the graph, type `joern C:\Users\agemes\git-repos\joern\cpg.bin`

PS C:\Users\agemes\git-repos\joern> joern cpg.bin
Creating project `cpg.bin3` for CPG at `cpg.bin`
Creating working copy of CPG to be safe
Loading base CPG from: C:\Users\agemes\git-repos\joern\workspace\cpg.bin3\cpg.bin.tmp
[INFO ] initialising from existing storage (C:\Users\agemes\git-repos\joern\workspace\cpg.bin3\cpg.bin.tmp)
Overlay dataflowOss already exists - skipping
The graph has been modified. You may want to use the `save` command to persist changes to disk.  All changes will also be saved collectively on exit

     ██╗ ██████╗ ███████╗██████╗ ███╗   ██╗
     ██║██╔═══██╗██╔════╝██╔══██╗████╗  ██║
     ██║██║   ██║█████╗  ██████╔╝██╔██╗ ██║
██   ██║██║   ██║██╔══╝  ██╔══██╗██║╚██╗██║
╚█████╔╝╚██████╔╝███████╗██║  ██║██║ ╚████║
 ╚════╝  ╚═════╝ ╚══════╝╚═╝  ╚═╝╚═╝  ╚═══╝
Version: 0.0.0+3813-9478888c
Type `help` to begin


joern> cpg.method.name.l
val res1: List[String] = List(
  "_init",
  "FUN_004004f0",
  "__libc_start_main",
  "__gmon_start__",
  "abort",
  "puts",
  "_start",
  "_dl_relocate_static_pie",
  "call_weak_fn",
  "deregister_tm_clones",
  "register_tm_clones",
  "__do_global_dtors_aux",
  "frame_dummy",
  "main",
  "_fini",
  "__libc_start_main",
  "_ITM_deregisterTMCloneTable",
  "__gmon_start__",
  "abort",
  "puts",
  "_ITM_registerTMCloneTable",
  "UNKNOWN",
  "<operator>.assignment",
  "<operator>.addressOf",
  "<operator>.incBy",
  "<operator>.goto",
  "<operator>.compare",
  "<operator>.subtraction",
  "<operator>.logicalShiftRight",
  "<operator>.arithmeticShiftRight"
)

joern>

Contributor Author

Note: Analyzed programs from a Ghidra project can be shared in the .gzf (Ghidra Zip File) format. Supporting that might be a future improvement.

Contributor Author

I also double-checked this with the Ghidra devs and they confirmed it:

That is a project path, which is OS-independent and always uses the forward-slash notation.

@itsacoderepo
Contributor

Still, thank you for the PR!

@gemesa gemesa force-pushed the ghidra-project-load branch from 9478888 to 09b57e1 Compare May 5, 2025 16:03
@gemesa gemesa force-pushed the ghidra-project-load branch from 09b57e1 to 5c41a91 Compare May 6, 2025 14:05
Successfully merging this pull request may close these issues.

Ghidra CPG and renaming