An indexer and graph generator for binary build directories.
This tool can quickly find where an ELF symbol matching some criteria is defined within a directory, and can generate graphs for various queries.
Features:
- Parallel, incremental indexing using a sharded Xapian database
- Indexes cross-references by analyzing ELF relocations
- Query the database with a simple query language and custom output formats
- Disassemble symbols matching a query
- Generate symbol reference graphs in DOT format
With pip:
pip install .
Or, for development:
pip install -e .[dev]
For optional graph generation (this installs pygraphviz):
pip install .[graphs]
Xapian must be installed on the system, together with its Python bindings. You can get the bindings in one of these ways:
- By installing the unofficial Xapian bindings Python package:
  pip install xapian-bindings
- By running the provided install_xapian_bindings.sh script
- By manually downloading and installing them from the Xapian download page
To index a project that contains a compile_commands.json file:
bdx index -c
Or you can specify the directory to index:
bdx index -d ./build
The indexer only indexes files that have changed since the last run.
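How bdx decides that a file has changed is not described here. The following is a minimal sketch of one common approach, comparing each file's modification time against a recorded value. The function name and the `indexed_mtimes` mapping are hypothetical and are not part of bdx.

```python
import os
import tempfile


def files_needing_reindex(paths, indexed_mtimes):
    """Return the paths whose on-disk mtime differs from the recorded one.

    `indexed_mtimes` is a hypothetical mapping of path -> mtime in
    nanoseconds; it stands in for whatever the real index stores.
    """
    stale = []
    for path in paths:
        current = os.stat(path).st_mtime_ns
        if indexed_mtimes.get(path) != current:
            stale.append(path)
    return stale


def demo():
    """Create two dummy object files and report which one looks unindexed."""
    with tempfile.TemporaryDirectory() as tmp:
        a = os.path.join(tmp, "a.o")
        b = os.path.join(tmp, "b.o")
        for path in (a, b):
            with open(path, "wb") as f:
                f.write(b"\x7fELF")
        # Assumption: a.o was indexed at its current mtime, b.o never was.
        recorded = {a: os.stat(a).st_mtime_ns}
        stale = files_needing_reindex([a, b], recorded)
        return [os.path.basename(p) for p in stale]
```

Calling `demo()` reports only the file that has no recorded mtime, which is the behavior an incremental indexer needs in order to skip unchanged files.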
The index command also accepts the -o, --opt option, which can be used to set some indexing settings, e.g. to enable indexing relocations:
bdx index -d ./build --opt index_relocations=True
Available options:
- num_processes - number of parallel indexing processes (default: the number of CPUs).
- index_relocations - if True, all relocations will be applied and indexed. The default is False. Setting this to True slows down indexing.
- min_symbol_size - (default 1) only index symbols with size equal to or greater than this.
- use_dwarfdump - if True (the default), use the dwarfdump program, if it's available, to find the source file for a compiled file when it can't be found in any other way.
After a directory is indexed, you can disassemble symbols matching a search query.
You can set the disassembler command with the -D, --disassembler option, which can contain {} placeholders for replacement.
$ bdx disass tree node defp source:./gcc/cp/parser* section:.text
/src/gcc-12/build/gcc/cp/parser.o: file format elf64-x86-64
Disassembly of section .text:
000000000000a1d0 <defparse_location(tree_node*)>:
a1d0: 48 8b 47 08 mov 0x8(%rdi),%rax
a1d4: 48 8b 10 mov (%rax),%rdx
a1d7: 48 8b 40 08 mov 0x8(%rax),%rax
a1db: 8b 7a 04 mov 0x4(%rdx),%edi
a1de: 8b 50 04 mov 0x4(%rax),%edx
a1e1: 89 fe mov %edi,%esi
a1e3: e9 00 00 00 00 jmp a1e8 <defparse_location(tree_node*)+0x18>
bdx search and other commands accept a query string. A simple query language is recognized.
$ bdx search -n 5 tree
tree-eh.o: _ZL20outside_finally_tree8treempleP6gimple
hooks.o: _Z14hook_void_treeP9tree_node
tree-eh.o: _ZL22record_in_finally_tree8treempleP4gtry
langhooks.o: _Z20lhd_return_null_treeP9tree_node
langhooks.o: _Z23lhd_tree_dump_dump_treePvP9tree_node
The -n option sets the maximum number of symbols to search for.
The -f option can be used to set the output format (json, sexp or a Python string format spec):
$ bdx search -n 5 -f json tree
{"path": "/src/gcc-12/build/stage1-gcc/tree-eh.o", "name": "_ZL20outside_finally_tree8treempleP6gimple", "section": ".text", "address": 12255, "size": 104, "type": "FUNC", "relocations": ["", "_ZN10hash_tableI19finally_tree_hasherLb0E11xcallocatorE4findERKP17finally_tree_node"], "mtime": 1652372105820280262, "demangled": "outside_finally_tree(treemple, gimple*)"}
{"path": "/src/gcc-12/build/prev-gcc/hooks.o", "name": "_Z14hook_void_treeP9tree_node", "section": ".text", "address": 560, "size": 1, "type": "FUNC", "relocations": [], "mtime": 1652375092039025278, "demangled": "hook_void_tree(tree_node*)"}
{"path": "/src/gcc-12/build/gcc/tree-eh.o", "name": "_ZL22record_in_finally_tree8treempleP4gtry", "section": ".text", "address": 13440, "size": 415, "type": "FUNC", "relocations": ["", "_Z11fancy_abortPKciS0_", "_ZN10hash_tableI19finally_tree_hasherLb0E11xcallocatorE6expandEv", "prime_tab", "xmalloc"], "mtime": 1652377778150208461, "demangled": "record_in_finally_tree(treemple, gtry*)"}
{"path": "/src/gcc-12/build/stage1-gcc/langhooks.o", "name": "_Z20lhd_return_null_treeP9tree_node", "section": ".text", "address": 278, "size": 15, "type": "FUNC", "relocations": [], "mtime": 1652372076295950259, "demangled": "lhd_return_null_tree(tree_node*)"}
{"path": "/src/gcc-12/build/stage1-gcc/langhooks.o", "name": "_Z23lhd_tree_dump_dump_treePvP9tree_node", "section": ".text", "address": 1692, "size": 19, "type": "FUNC", "relocations": [], "mtime": 1652372076295950259, "demangled": "lhd_tree_dump_dump_tree(void*, tree_node*)"}
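Because the json format prints one object per line, the output is easy to post-process. The sketch below sums symbol sizes per object file. The `total_size_by_path` helper is hypothetical, and the sample records are copied from the output above.

```python
import json
from collections import defaultdict

# Three records copied from the example output above.
SAMPLE = """\
{"path": "/src/gcc-12/build/prev-gcc/hooks.o", "name": "_Z14hook_void_treeP9tree_node", "section": ".text", "address": 560, "size": 1, "type": "FUNC", "relocations": [], "mtime": 1652375092039025278, "demangled": "hook_void_tree(tree_node*)"}
{"path": "/src/gcc-12/build/stage1-gcc/langhooks.o", "name": "_Z20lhd_return_null_treeP9tree_node", "section": ".text", "address": 278, "size": 15, "type": "FUNC", "relocations": [], "mtime": 1652372076295950259, "demangled": "lhd_return_null_tree(tree_node*)"}
{"path": "/src/gcc-12/build/stage1-gcc/langhooks.o", "name": "_Z23lhd_tree_dump_dump_treePvP9tree_node", "section": ".text", "address": 1692, "size": 19, "type": "FUNC", "relocations": [], "mtime": 1652372076295950259, "demangled": "lhd_tree_dump_dump_tree(void*, tree_node*)"}
"""


def total_size_by_path(lines):
    """Sum the "size" field of each JSON record, grouped by "path"."""
    totals = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        totals[record["path"]] += record["size"]
    return dict(totals)


TOTALS = total_size_by_path(SAMPLE.splitlines())
```

In practice the same function could be fed `sys.stdin` with the output of `bdx search -f json ...` piped into the script.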
$ bdx search -n 5 -f '0x{address:0>10x}|{section:<10}|{type:8}|{demangled}' tree
0x0000002fdf|.text     |FUNC    |outside_finally_tree(treemple, gimple*)
0x0000000230|.text     |FUNC    |hook_void_tree(tree_node*)
0x0000003480|.text     |FUNC    |record_in_finally_tree(treemple, gtry*)
0x0000000116|.text     |FUNC    |lhd_return_null_tree(tree_node*)
0x000000069c|.text     |FUNC    |lhd_tree_dump_dump_tree(void*, tree_node*)
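To see how such a format spec is interpreted, the sketch below applies the same string to one record with Python's built-in str.format. It assumes that bdx exposes the fields under the names shown in the JSON output; the `render` helper is hypothetical.

```python
# Field values copied from the first record of the JSON example above.
RECORD = {
    "address": 12255,
    "section": ".text",
    "type": "FUNC",
    "demangled": "outside_finally_tree(treemple, gimple*)",
}

FORMAT = "0x{address:0>10x}|{section:<10}|{type:8}|{demangled}"


def render(record, fmt=FORMAT):
    """Format one record with standard Python format-spec rules."""
    # {address:0>10x}: lowercase hex, right-aligned, zero-padded to 10 chars
    # {section:<10}:   left-aligned, space-padded to 10 chars
    # {type:8}:        strings are left-aligned by default, width 8
    return fmt.format(**record)


LINE = render(RECORD)
```

Here `LINE` is `0x0000002fdf|.text     |FUNC    |outside_finally_tree(treemple, gimple*)`, with the padding produced by the width specifiers.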
- Search for symbols having foo AND bar somewhere in their name:
  bdx search foo AND bar
  or:
  bdx search foo bar
- Search for symbols having either foo or bar in their name:
  bdx search foo OR bar
- Search for symbols named exactly foo:
  bdx search fullname:foo
- Search for symbols where the ELF st_info type is STT_FUNC or STT_OBJECT:
  bdx search type:FUNC OR type:OBJECT
- Search for symbols foo* in binary files named bar.o:
  bdx search 'name:foo*' path:bar.o
- Search for symbols in files compiled from a source file named file.c:
  bdx search source:file.c
- Search for symbols foo or bar that are not mangled (no _Z* prefix):
  bdx search '(foo OR bar)' AND NOT name:_Z*
- Search for symbols that reference/call memset:
  bdx search relocations:memset
- Search for symbols that call malloc, but not free:
  bdx search relocations:malloc NOT relocations:free
- Search for symbols with size in some range, where address is at least 0xfff0:
  bdx search foo size:100..200 address:0xfff0..
- Search for symbols by relative path of the binary:
  bdx search 'path:./build/module/*'
- Search using a quoted string literal, e.g. for a path that contains spaces:
  bdx search 'path:"/path/to/File With Spaces.o"'
- Search for big symbols in some section:
  bdx search section:.rodata AND size:1000..
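When queries are generated from a script, it can help to compose them from small pieces. The helpers below are hypothetical and only assemble strings in the syntax shown in the examples above. The sketch also assumes that bdx accepts the whole query as a single command-line argument, and it does not run any command.

```python
def field(name, value):
    """Build a name:value term, quoting values that contain spaces."""
    if " " in value:
        return f'{name}:"{value}"'
    return f"{name}:{value}"


def all_of(*terms):
    """Join terms so that every one of them must match."""
    return " AND ".join(terms)


def any_of(*terms):
    """Join terms so that at least one must match; parenthesized for nesting."""
    return "(" + " OR ".join(terms) + ")"


def search_argv(query, limit=None, fmt=None):
    """Build an argument list for `bdx search` using only the -n and -f options."""
    argv = ["bdx", "search"]
    if limit is not None:
        argv += ["-n", str(limit)]
    if fmt is not None:
        argv += ["-f", fmt]
    argv.append(query)  # assumption: the full query is passed as one argument
    return argv


QUERY = all_of(
    any_of(field("type", "FUNC"), field("type", "OBJECT")),
    field("path", "/path/to/File With Spaces.o"),
)
ARGV = search_argv(QUERY, limit=5, fmt="json")
```

The resulting list can be handed to `subprocess.run(ARGV)`, which avoids a second layer of shell quoting around wildcards and string literals.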
Generate an SVG image showing at most 20 routes from symbol main in main.o to all symbols in section .text in files matching wildcard Algorithms_*:
bdx graph 'main path:main.o' 'section:".text" AND path:Algorithms*' -n 20 | dot -Tsvg > graph.svg
By default, paths are generated using the ASTAR algorithm. The --algorithm BFS or --algorithm DFS options select breadth-first or depth-first search instead, which can generate different graphs and can be slower or faster depending on the index and the queries provided.
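To illustrate what a breadth-first route search over symbol references looks like, here is a minimal sketch on a toy graph. It is not bdx's implementation: the `REFERENCES` mapping, the symbol names and both helper functions are made up for the example.

```python
from collections import deque

# Hypothetical reference graph: symbol -> symbols it references.
REFERENCES = {
    "main": ["parse", "sort"],
    "parse": ["read_file"],
    "sort": ["compare"],
    "read_file": ["compare"],
}


def bfs_route(references, start, goal):
    """Return one shortest route from start to goal, or None if unreachable."""
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        symbol = queue.popleft()
        if symbol == goal:
            # Walk the predecessor links back to the start.
            route = []
            while symbol is not None:
                route.append(symbol)
                symbol = previous[symbol]
            return route[::-1]
        for callee in references.get(symbol, ()):
            if callee not in previous:
                previous[callee] = symbol
                queue.append(callee)
    return None


def to_dot(route):
    """Render a route as a DOT digraph with one edge per hop."""
    edges = "\n".join(f'  "{a}" -> "{b}";' for a, b in zip(route, route[1:]))
    return "digraph G {\n" + edges + "\n}"


ROUTE = bfs_route(REFERENCES, "main", "compare")
DOT = to_dot(ROUTE)
```

Breadth-first search visits symbols in order of distance from the start, so the first route found has the fewest hops. A depth-first search may return a longer route through the same graph, which is one reason the generated graphs can differ between algorithms.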
Copyright (C) 2024 Michał Krzywkowski
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.