Directed fuzzing for xlnt project with sydr‐fuzz (LibAFL‐DiFuzz backend)

Introduction

In this article, we will look at an approach to directed fuzzing using the Sydr-Fuzz interface based on the LibAFL-DiFuzz fuzzer. Sydr-Fuzz provides a convenient interface for running hybrid fuzzing, leveraging the dynamic symbolic execution capabilities of the Sydr tool combined with modern fuzzers. In addition to fuzzing, Sydr-Fuzz offers a set of capabilities for corpus minimization, coverage collection, finding bugs by checking security predicates, and crash analysis using Casr.

A new stage in the development of Sydr-Fuzz was the integration of directed fuzzing based on the LibAFL-DiFuzz tool. Directed fuzzing allows analysis to be focused on specific points in code and is suitable for more targeted analysis of individual areas. LibAFL-DiFuzz is based on the modular architecture of the LibAFL library and allows you to set the “direction” of the analysis using one or several target points in code. To perform directed fuzzing, the tool requires a static preprocessing of the program, which allows it to build special metrics that are used later in the analysis process. During fuzzing, LibAFL-DiFuzz tracks the current state and schedules the energy of the inputs, increasing the probability of approaching target points.

To demonstrate the capabilities of hybrid directed fuzzing with Sydr-Fuzz, we will use the xlnt library.

Preparing the fuzzing target

The build is performed with the cargo make command. It automatically compiles all fuzzing targets and performs the necessary preprocessing for directed fuzzing.

First, let's prepare the fuzzing target. This concept for directed fuzzing differs slightly from the standard understanding of a fuzzing target. The task of directed fuzzing—to ensure that specified points in the code are reached—requires a common entry point into the program from which these points must be reachable. Usually, the main function is taken as the entry point, since it can be used to reach any point in the program. However, in the case of a library, it is necessary to write a wrapper for one or more of its functions, taking into account the reachability of the target points.

We should prepare several files for directed fuzzing:

Fuzzing target code
Build script for fuzzing target
Makefile.toml for build
config.toml with target points

For our example, the load function and the wrapper used in this guide will work just fine. We will need a build with the main function, which is reflected in the corresponding build script. The build script is an executable file that Makefile.toml uses to compile the fuzzing target with different instrumentation (for LibAFL-DiFuzz fuzzing, for Sydr symbolic execution, for Casr crash analysis, and for coverage). Hence, the build script should change into the project repository directory, clean up all previous build artifacts, and then prepare and build the fuzzing target. Makefile.toml selects the instrumentation through the CC, CXX, CFLAGS, and CXXFLAGS environment variables, so the build script must not overwrite them. Examples of build scripts: xlnt, cxxfilt.
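A build script that silently resets the compiler variables is easy to miss, so a small check can help. The sketch below is not part of Sydr-Fuzz: the function name find_overwrites and the regex heuristic are assumptions made for illustration. It flags shell assignments to CC, CXX, CFLAGS, or CXXFLAGS that do not reuse the inherited value.

```python
import re

# Variables that Makefile.toml sets for each instrumentation (named in the text).
PROTECTED = ("CC", "CXX", "CFLAGS", "CXXFLAGS")

# Matches `CC=...` or `export CC=...` at the start of a shell line.
_ASSIGN = re.compile(r"^\s*(?:export\s+)?(%s)=(.*)$" % "|".join(PROTECTED))


def find_overwrites(script_text):
    """Return (line_number, variable) pairs for assignments that drop the inherited value."""
    hits = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        match = _ASSIGN.match(line)
        if not match:
            continue
        name, value = match.groups()
        # Extending is fine: CFLAGS="$CFLAGS -O2" keeps what Makefile.toml passed in.
        if "$" + name in value or "${" + name in value:
            continue
        hits.append((number, name))
    return hits
```

This is a heuristic only: it does not understand shell quoting or variables set inside functions, so a clean result is not a guarantee.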

A special Makefile.toml builds all fuzzing targets automatically and performs the static project preprocessing required for directed fuzzing (CG/CFG graph and ETS construction). Makefile.toml can be generated from a template with the gen_target.py script (sydr/difuzz/template/gen_target.py) by specifying the arguments specific to the fuzzing target:

-p/--project: project name
-s/--script: path to the script for building the fuzzing target
-t/--target-dir: path to the directory with the project source code
-m/--main-path: path to the file with the implementation of the main function relative to target-dir (or absolute path)
-b/--bin-path: path to the project binary file resulting from the build, relative to target-dir
-a/--bin-args: arguments for running the binary file (with “@@” instead of the input file name)
-c/--config-dir: (optional) directory for generated files ("." by default)
-r/--rep-clone: (optional) bash command for cloning the project repository
-v/--version: (optional) bash command to jump to a specific commit in the project
--mode: build mode (debug/release), release by default
--root: name of the function that is the entry point to the program (usually main)
-l/--lang: language of the target program (c/rust/go)

For the xlnt project, all necessary files have already been generated and placed in the directed_target subdirectory. They can also be generated with gen_target.py. For example, for xlnt you could run the script with the following command:

$ python3 gen_target.py -p xlnt -s directed_target/build_libafl_load.sh -r "git clone https://github.com/tfussell/xlnt" -v "git checkout 3a279fcaab3432bb851c7976d4591f9505c3462a" \
    -t xlnt -m /opt/StandaloneFuzzTargetMain.c -b build/load_libafl -a "@@" -c . --mode release --root main -l c
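If the target files are regenerated often (for example, from a CI job), the same invocation can be assembled programmatically. The following is a minimal sketch: the helper gen_target_command and the dictionary layout are assumptions for illustration, while the flags and values are the ones from the command above.

```python
import shlex


def gen_target_command(options, script="gen_target.py"):
    """Build the gen_target.py argument vector from a mapping of flag -> value."""
    argv = ["python3", script]
    for flag, value in options.items():
        argv += [flag, value]
    return argv


xlnt_options = {
    "-p": "xlnt",
    "-s": "directed_target/build_libafl_load.sh",
    "-r": "git clone https://github.com/tfussell/xlnt",
    "-v": "git checkout 3a279fcaab3432bb851c7976d4591f9505c3462a",
    "-t": "xlnt",
    "-m": "/opt/StandaloneFuzzTargetMain.c",
    "-b": "build/load_libafl",
    "-a": "@@",
    "-c": ".",
    "--mode": "release",
    "--root": "main",
    "-l": "c",
}

# shlex.join quotes values that contain spaces, so the line can be pasted into a shell.
print(shlex.join(gen_target_command(xlnt_options)))
```

Keeping the options in one mapping makes it easy to diff the settings of two targets; the argument vector can also be passed directly to subprocess.run.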

As a result, the directory specified by -c will contain three files: Makefile.toml, the build script, and config.toml. The resulting Makefile.toml defines build targets for Sydr symbolic execution (debug) and LibAFL-DiFuzz fuzzing (target), as well as additional build targets for coverage analysis (coverage) and crash analysis (casr). It is also necessary to review Makefile.toml manually and fix possible errors:

check all paths in [env], especially the path to the LibAFL-DiFuzz tools (DIFUZZ_DIR),
set the required build flags in [tasks.debug_unix], [tasks.casr_unix], [tasks.coverage_unix], and [tasks.target_unix].

The last step in preparing the fuzzing targets is specifying the target points. An earlier run of regular hybrid fuzzing for xlnt found several crashes, so let's take some of them as target points. To do this, we add them as a list to the config.toml configuration file:

[[target]]
file = "/xlnt/source/detail/cryptography/compound_document.cpp"
line = 975

[[target]]
file = "/xlnt/source/detail/cryptography/compound_document.cpp"
line = 723

[[target]]
file = "/xlnt/source/detail/cryptography/compound_document.cpp"
line = 126

[[target]]
file = "/xlnt/source/detail/serialization/xlsx_consumer.cpp"
line = 2031

[[target]]
file = "/xlnt/source/detail/serialization/zstream.cpp"
line = 269

[[target]]
file = "/xlnt/source/utils/path.cpp"
line = 185

[[target]]
file = "/xlnt/source/worksheet/worksheet.cpp"
line = 1086

Building the fuzzing target

Let's move on to building the fuzzing target. The OUT_DIR=/ cargo make all command automatically builds the debug, target, coverage, and casr targets from Makefile.toml. The OUT_DIR environment variable specifies the directory where the built binaries are saved. It is worth mentioning that the LibAFL-DiFuzz and coverage targets require source code patching, which Makefile.toml also performs automatically with the insert_forkserver.py script. Hence, between cargo make all runs (for example, while debugging), all changes in the source code repository (particularly in main_path) should be rolled back manually.

All the necessary environment will be installed in a Docker image, which will be built according to the corresponding Dockerfile_libafl:

ARG BASE_IMAGE="sydr/ubuntu22.04-sydr-fuzz"
FROM $BASE_IMAGE

ARG SYDR_ARCHIVE="./sydr.zip"

WORKDIR /

# Clone target from GitHub.
RUN git clone https://github.com/tfussell/xlnt

WORKDIR /xlnt

# Checkout specified commit. It could be updated later.
RUN git checkout 3a279fcaab3432bb851c7976d4591f9505c3462a && git submodule update --init --recursive

# Copy build script and targets.
COPY save.cc load.cc ./

# Copy LibAFL-DiFuzz target template.
COPY directed_target /directed_target

WORKDIR /directed_target

# Build xlnt for LibAFL-DiFuzz.
ADD ${SYDR_ARCHIVE} ./
RUN unzip -o ${SYDR_ARCHIVE} && rm ${SYDR_ARCHIVE}
RUN OUT_DIR=/ cargo make all

# Prepare seed corpus.
RUN mkdir /corpus && find /xlnt -name "*.xlsx" | xargs -I {} cp {} /corpus
RUN cp -r /corpus /save_corpus
RUN for file in /save_corpus/*; do sed -i '1s/^/\x00\x05/' $file; done

Please note that to build the Docker image, you will need the sydr.zip archive containing the binary files and libraries required for LibAFL-DiFuzz to work. Let's build the image using the command:

$ sudo docker build --build-arg SYDR_ARCHIVE="sydr.zip" -t oss-sydr-fuzz-libafl-xlnt -f ./Dockerfile_libafl .

Building for LibAFL-DiFuzz

Let's take a closer look at how LibAFL-DiFuzz targets are built in Makefile.toml. Building the program directly for the LibAFL-DiFuzz fuzzer is done in several stages. First, a preliminary build is performed with the addition of debug information using the wllvm/wllvm++ compilers. Next, the resulting binary file is statically analyzed using the DiFuzz tool. To run static analysis of the program, you need to specify the path to the config.toml configuration file and to the program binary file obtained as a result of the wllvm/wllvm++ build, as well as arguments specific to the analysis. These steps are described by the difuzz goal in the Makefile.toml file:

[tasks.difuzz_unix]
script_runner = "@shell"
script = '''
cd ${PROJECT_DIR}
export LLVM_COMPILER=clang
export CC=wllvm; export CXX=wllvm++
export CFLAGS="-g -fsanitize=address,integer,bounds,null,undefined,float-divide-by-zero"; export CXXFLAGS="$CFLAGS"
python3 ${DIFUZZ_DIR_ABS}/insert_forkserver.py -a insert -l c -f /opt/StandaloneFuzzTargetMain.c
python3 ${DIFUZZ_DIR_ABS}/insert_forkserver.py -a comment -l c -f /opt/StandaloneFuzzTargetMain.c
cd ${OUT_DIR_ABS}
${PROJECT_DIR}/build_libafl_load.sh
${DIFUZZ_DIR_ABS}/difuzz -c ${PROJECT_DIR}/config.toml -b ${EXAMPLE_DIR}/build/load_libafl -e ${OUT_DIR_ABS}/ets_load.toml ${DIFUZZ_ARGS}
${PROJECT_DIR}/build_libafl_save.sh
${DIFUZZ_DIR_ABS}/difuzz -c ${PROJECT_DIR}/config.toml -b ${EXAMPLE_DIR}/build/save_libafl -e ${OUT_DIR_ABS}/ets_save.toml ${DIFUZZ_ARGS}
'''

When running the difuzz target, we will see the following output from the DiFuzz tool:

After the static analysis stage, an auxiliary configuration file ets.toml is created, and the call graph and CFG for the target functions of the program are saved in DOT format along with their dominator trees. The ets.toml file is used when recompiling the program with the libafl_cc/libafl_cxx instrumenting compilers. To build, you need to add a call to the fuzzing initialization function to the main function code using the insert_forkserver.py script, as well as process ets.toml using a special manager ETS_SHARED_MANAGER (provides parallel compilation of modules using shared memory). These actions are described by the target goal:

[tasks.target_unix]
script_runner = "@shell"
script = '''
cd ${PROJECT_DIR}
export CC=${LIBAFL_CC}
export CXX=${LIBAFL_CXX}
export CFLAGS="-g -fsanitize=address,integer,bounds,null,undefined,float-divide-by-zero"; export CXXFLAGS="$CFLAGS"
${ETS_SHARED_MANAGER} -a remove -n xlnt_load
${ETS_SHARED_MANAGER} -a create -n xlnt_load
${ETS_SHARED_MANAGER} -a parse -n xlnt_load -i ${OUT_DIR_ABS}/ets_load.toml
python3 ${DIFUZZ_DIR_ABS}/insert_forkserver.py -a uncomment -l c -f /opt/StandaloneFuzzTargetMain.c
export LIBAFL_SHARED_NAME="xlnt_load"
${PROJECT_DIR}/build_libafl_load.sh
mv ${EXAMPLE_DIR}/build/load_libafl ${OUT_DIR_ABS}/load_libafl
${ETS_SHARED_MANAGER} -a dump -n xlnt_load -o ${OUT_DIR_ABS}/ets_load.toml
${ETS_SHARED_MANAGER} -a remove -n xlnt_load
python3 ${DIFUZZ_DIR_ABS}/insert_forkserver.py -a remove -l c -f /opt/StandaloneFuzzTargetMain.c
'''
dependencies = ["difuzz"]

With the LIBAFL_DEBUG_PASS=2 variable set, compilation produces the following logs:

After building these targets, as well as the debug target (and, if necessary, coverage, casr), the program build for directed fuzzing is ready.

Fuzzing

To run hybrid directed fuzzing via Sydr-Fuzz, you need to create a configuration file load-libafl.toml. The file must contain:

the exit-on-time parameter, which sets the time until fuzzing is stopped if there is no new coverage,
the [sydr] table specifying the Sydr arguments (args, jobs) and the target program launch string (target),
the [difuzz] table specifying the path to the LibAFL-DiFuzz fuzzer (path), its arguments (args), the target program launch string (target), and (if necessary) the path to the program binary file compiled for analysis by the Casr tool (casr_bin),
and (if necessary) the [cov] table specifying the target program launch string (target) for coverage collection.

We get the following configuration file for the load wrapper:

exit-on-time = 7200

[sydr]
args = "--wait-jobs -s 90 -j2"
target = "/load_sydr @@"
jobs = 2

[difuzz]
path = "/directed_target/sydr/difuzz/libafl_difuzz"
target = "/load_libafl @@"
args = "-j4 -l64 -i /corpus -e /ets_load.toml"
casr_bin = "/load_casr"

[cov]
target = "/load_cov @@"

The args field of the [difuzz] table in this configuration file passes several options to libafl_difuzz:

-j4: number of parallel jobs
-l64: stack size limit (Gb)
-i /corpus: path to input corpus
-e /ets_load.toml: path to the ETS built by difuzz.

More information about LibAFL-DiFuzz settings is available in the docs.

Let's run hybrid directed fuzzing with Sydr-Fuzz for the load wrapper, specifying the path to load-libafl.toml:

$ sydr-fuzz -c ./load-libafl.toml run

At the beginning of fuzzing, you can see how LibAFL-DiFuzz processes start and begin sending statistics on the current state:

After a while, information about reaching the target points specified in config.toml appears in the logs:

The time of reaching each point is indicated. Some points may be reached several times, which is normal, since input mutations can result in multiple files with similar execution paths.

During the fuzzing process, you may also see logs like this:

These logs show the number of files that were imported by Sydr. These files proved useful for one of two reasons: either they help in reaching target points, or they open up new coverage of the target program.

At the end of the analysis, general statistics on all fuzzer processes are displayed, as well as a list of all target points achieved:

Here, the minimum time to reach each point is displayed — these measurements can be used to estimate the Time to Exposure (TTE) metric.
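To make the TTE estimate concrete, the per-point minimum can be computed from a list of reach events. The sketch below assumes the events have already been extracted as (target point, seconds since start) pairs; the function name time_to_exposure and the sample values are hypothetical and not taken from a real run.

```python
def time_to_exposure(reach_events):
    """Map each target point to the earliest time (in seconds) at which it was reached.

    reach_events is an iterable of (target_point, seconds_since_start) pairs.
    """
    earliest = {}
    for target, seconds in reach_events:
        if target not in earliest or seconds < earliest[target]:
            earliest[target] = seconds
    return earliest


# Hypothetical events for illustration only.
events = [
    ("compound_document.cpp:975", 340.0),
    ("zstream.cpp:269", 95.5),
    ("compound_document.cpp:975", 120.0),
]
for target, seconds in sorted(time_to_exposure(events).items()):
    print(f"{target}: {seconds:.1f} s")
```

Because fuzzing is randomized, a single run gives only one sample per point; comparing configurations usually calls for repeating the run several times and aggregating these values.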

Results analysis

The results of directed fuzzing are objective inputs that lead to at least one of three states:

program crash,
reaching the target point,
program hang (timeout).

At the end of fuzzing, the results are minimized and sorted. During minimization, among all files with identical characteristics for reaching the target points, only the file that was generated first is kept. This allows the TTE metric to be measured on a minimized set of objectives. The xmin configuration option allows minimization to be disabled by setting it to false. This can be useful in cases where all objective files need to be saved.
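The minimization rule can be illustrated with a short sketch. It is not the Sydr-Fuzz implementation: the text does not spell out what "identical characteristics" covers, so the code assumes it means the same set of reached target points, and the function name minimize and the tuple layout are hypothetical.

```python
def minimize(objectives):
    """Keep the earliest objective for every distinct set of reached target points.

    objectives is an iterable of (file_name, created_at, reached_points) tuples.
    """
    kept = {}
    for name, created_at, reached in objectives:
        key = frozenset(reached)  # order of the points does not matter
        if key not in kept or created_at < kept[key][1]:
            kept[key] = (name, created_at)
    # Return the surviving file names ordered by creation time.
    return [name for name, _ in sorted(kept.values(), key=lambda item: item[1])]
```

Keeping the earliest file per group is what preserves the TTE measurement: the surviving objective carries the first time its set of points was reached.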

The files remaining after minimization are renamed according to their impact on the target program and sorted by the target points reached. As a result of sorting, separate cluster directories are formed in the load-libafl-out directory for each of the target points. If an objective leads to the achievement of two target points at once, it will be added to both clusters. The cluster name contains the location of the target point:

Here we see that as a result of running hybrid directed fuzzing, 6 target points were reached. At the same time, for example, for compound_document.cpp:125, 2 crashes, 1 objective reaching the point without a crash, and 2 timeouts were found.
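The sorting rule from this section, where an objective that reaches several target points is placed in every matching cluster, can be sketched as follows. The function name cluster_by_target and the objective names in the usage note are hypothetical; the real directory layout is produced by Sydr-Fuzz itself.

```python
from collections import defaultdict


def cluster_by_target(objectives):
    """Group objective names by target point.

    objectives is an iterable of (name, reached_points) pairs; an objective
    that reaches several points is added to each of their clusters.
    """
    clusters = defaultdict(list)
    for name, reached in objectives:
        for point in reached:
            clusters[point].append(name)
    return dict(clusters)
```

For example, cluster_by_target([("crash-1", ["a.cpp:10"]), ("timeout-2", ["a.cpp:10", "b.cpp:5"])]) places timeout-2 in both the a.cpp:10 and b.cpp:5 clusters.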

Coverage collection

For directed fuzzing, coverage is collected not only from the final corpus files, but also from all objective files remaining after minimization. This also allows us to see the coverage of the code containing the target points.

Let's use the following command:

$ sydr-fuzz -c load-libafl.toml cov-html

The HTML report shows, for example, that the point compound_document.cpp:125 was actually covered:

Crash reports analysis

The following command can be used to analyze crash reports obtained after running fuzzing using Casr:

$ sydr-fuzz -c load-libafl.toml casr

Here, only those objectives that lead to program crashes are considered. As a result of running Casr, we get a separate cluster hierarchy in the load-libafl-out/casr directory with six clusters, one of which contains a PROBABLY_EXPLOITABLE crash:

Let's look at the report for this crash using the command:

$ casr-cli load-libafl-out/casr/cl1/crash-4b9fda7296ce518d-10.casrep

We see part of the ASAN report and several lines of code around the error — it looks like an incorrectly calculated write address. Collecting such reports greatly simplifies the analysis of fuzzing results and saves time.
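When there are many reports, a quick severity summary helps decide which cluster to open first. The sketch below assumes that each .casrep file is a JSON document with a CrashSeverity object containing a Type field; that layout is an assumption to verify against your Casr version, and the function name severity_counts is hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def severity_counts(casr_dir):
    """Count crash reports under casr_dir by their severity type."""
    counts = Counter()
    for report in sorted(Path(casr_dir).rglob("*.casrep")):
        with report.open() as handle:
            data = json.load(handle)
        # Assumed layout: {"CrashSeverity": {"Type": "PROBABLY_EXPLOITABLE", ...}, ...}
        severity = data.get("CrashSeverity", {}).get("Type", "UNKNOWN")
        counts[severity] += 1
    return counts
```

Calling severity_counts("load-libafl-out/casr") after the casr step would then list how many reports fall into each severity class.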

Conclusion

This article discussed an approach to hybrid directed fuzzing based on the LibAFL-DiFuzz fuzzer and the Sydr symbolic interpreter using the Sydr-Fuzz interface. Directed fuzzing requires special preparation of the target program, which we discussed in detail using the xlnt project as an example. However, fuzzing itself, followed by minimization and sorting of results, coverage collection, and crash analysis using Casr, can be easily and conveniently run using Sydr-Fuzz.
