
Fuzzing the ruamel.yaml (Python) project with sydr-fuzz

Daniel Kuts edited this page Nov 17, 2025 · 7 revisions

Introduction

In this article I'll share my experience of fuzzing Python projects. For this purpose I use sydr-fuzz with the Atheris and python-afl backends. Sydr-fuzz was originally designed as a hybrid fuzzer that combines Sydr (a dynamic symbolic execution tool) with state-of-the-art fuzzers such as AFLplusplus and libFuzzer. It also provides useful features such as crash triage with casr, checking of security predicates, and convenient subcommands for corpus minimization and code coverage collection. Atheris and python-afl are coverage-guided Python fuzzing engines. They can fuzz Python code as well as native extensions written for CPython. Atheris is based on libFuzzer, while python-afl is based on AFL++. Since they look and work like libFuzzer and AFL++, we decided to support them in sydr-fuzz; why not? Although we don't have symbolic execution for Python code, we can still do fuzzing, crash triage, corpus minimization, and coverage collection through the sydr-fuzz interface.

Preparing Fuzz Target

The Atheris GitHub page has good instructions on installing and using it. We will fuzz the ruamel.yaml project from its examples. A Docker container with the complete fuzzing environment is already prepared for the build. I'll use it for my fuzzing experiments, but first let's take a closer look at the fuzz target and the build script.

import atheris

with atheris.instrument_imports():
  from ruamel import yaml as ruamel_yaml
  import sys
  import warnings

# Suppress all warnings.
warnings.simplefilter("ignore")

ryaml = ruamel_yaml.YAML(typ="safe", pure=True)
ryaml.allow_duplicate_keys = True


@atheris.instrument_func
def TestOneInput(input_bytes):
  fdp = atheris.FuzzedDataProvider(input_bytes)
  data = fdp.ConsumeUnicode(sys.maxsize)

  try:
    iterator = ryaml.load_all(data)
    for _ in iterator:
      pass
  except ruamel_yaml.error.YAMLError:
    return

  except Exception:
    input_type = str(type(data))
    codepoints = [hex(ord(x)) for x in data]
    sys.stderr.write(
        "Input was {input_type}: {data}\nCodepoints: {codepoints}".format(
            input_type=input_type, data=data, codepoints=codepoints))
    raise


def main():
  atheris.Setup(sys.argv, TestOneInput)
  atheris.Fuzz()


if __name__ == "__main__":
  main()

For Atheris, we specify which modules to instrument. It is also possible to instrument everything with atheris.instrument_all(), which may be useful when your project has many dependencies. Next, we implement def TestOneInput(input_bytes), the counterpart of int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) in C/C++. It is important to catch exceptions thrown by the target function, but only those that are documented by the developers or raised deliberately by the function itself. For example, IndexError doesn't need to be caught if it is not mentioned in the documentation. Both Atheris and python-afl treat any uncaught exception as a crash.
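To make this rule concrete, here is a small engine-independent sketch. The target parse_pair is hypothetical (it stands in for a real API such as ruamel's load_all), and we assume its documentation promises only ValueError for malformed input. The harness swallows that documented exception and lets everything else propagate, which is what would make Atheris or python-afl record it as a crash.

```python
def parse_pair(text: str) -> tuple:
    """Hypothetical target: parse "key=value".

    Documented to raise ValueError on malformed input.
    """
    if "=" not in text:
        raise ValueError("missing '='")
    key, value = text.split("=", 1)
    # Treat "#key=..." as a commented-out pair. Deliberate bug: an empty key
    # makes key[0] raise IndexError, which is NOT in the documented contract.
    if key[0] == "#":
        return ("", "")
    return (key, value)


def TestOneInput(input_bytes: bytes) -> None:
    # errors="replace" means the decoding step itself never raises.
    text = input_bytes.decode("utf-8", errors="replace")
    try:
        parse_pair(text)
    except ValueError:
        # Documented exception: expected for malformed input, not a bug.
        return
    # Any other exception (IndexError, KeyError, ...) propagates, and the
    # fuzzing engine reports it as a crash.
```

Feeding b"=value" to this harness raises IndexError, exactly the kind of undocumented exception a fuzzer should surface, while b"no separator" is quietly accepted as an expected parse error.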

For python-afl, the fuzz target looks a little different. You can read more about it here and here.

#!/usr/bin/python3
import afl, sys, os
from ruamel import yaml as ruamel_yaml
import warnings

# Suppress all warnings.
warnings.simplefilter("ignore")

def _ConsumeString(res_len, data):
  ...

def TestOneInput(input_bytes):
  ryaml = ruamel_yaml.YAML(typ="safe", pure=True)
  ryaml.allow_duplicate_keys = True
  data = _ConsumeString(sys.maxsize, input_bytes)

  try:
    iterator = ryaml.load_all(data)
    for _ in iterator:
      pass
  except ruamel_yaml.error.YAMLError:
    return

  except Exception:
    input_type = str(type(data))
    codepoints = [hex(ord(x)) for x in data]
    sys.stderr.write(
        "Input was {input_type}: {data}\nCodepoints: {codepoints}".format(
            input_type=input_type, data=data, codepoints=codepoints))
    raise


def main():
  # choose where to read stdin from
  try:
    # Python 3:
    stdin_compat = sys.stdin.buffer
  except AttributeError:
    # There is no buffer attribute in Python 2:
    stdin_compat = sys.stdin

  while afl.loop(10000): 
    TestOneInput(stdin_compat.read())
    sys.stdin.seek(0) # reset stdin between runs
  os._exit(0)

if __name__ == "__main__":
  main()

One of the main differences is that python-afl lacks a FuzzedDataProvider class for converting input bytes into the type expected by the fuzz target. You might also wonder why the shebang (#!/usr/bin/python3) was changed. This is not a coincidence: with ASAN, python-afl is likely to run into many errors such as [CMIN] du: cannot access ... or Fork server handshake failed ... when the shebang uses env. The fix (assuming everything else is configured correctly) is to use a shebang without env, so be careful when using python-afl.

Similarly, on some systems nasty issues such as hanging cmin or coverage runs can occur when LD_PRELOAD is used. There is a workaround, but the easy way out is to switch to python-afl.
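Since python-afl has no FuzzedDataProvider, the harness itself has to turn raw bytes into text for the target; the body of _ConsumeString is elided above. The function below is a hypothetical, simplified stand-in, not a port of Atheris's ConsumeUnicode: it uses the first input byte to choose between an ASCII-only string and a lenient UTF-8 decode, so that the fuzzer can control which kind of text reaches the target.

```python
def consume_unicode_sketch(max_len: int, data: bytes) -> str:
    """Hypothetical helper: convert fuzzer bytes into a str.

    The first byte selects the decoding mode; the remaining bytes are the
    payload. At most max_len characters are returned.
    """
    if not data:
        return ""
    mode, payload = data[0], data[1:]
    if mode & 1 == 0:
        # ASCII mode: mask every byte to 7 bits, so any payload is valid ASCII.
        text = bytes(b & 0x7F for b in payload).decode("ascii")
    else:
        # UTF-8 mode: undecodable sequences become U+FFFD instead of raising.
        text = payload.decode("utf-8", errors="replace")
    return text[:max_len]
```

In the python-afl target it could be called as consume_unicode_sketch(sys.maxsize, input_bytes). Because the conversion never raises, any exception observed during fuzzing comes from the target rather than from the harness.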

Finally, we need some code to start the fuzzing process.

For Atheris:

atheris.Setup(sys.argv, TestOneInput)
atheris.Fuzz()

For python-afl in persistent mode:

while afl.loop(10000):
  TestOneInput(stdin_compat.read())
  sys.stdin.seek(0) # reset stdin between runs
os._exit(0)

For python-afl without persistent mode:

afl.init()
TestOneInput(stdin_compat.read())
os._exit(0)

Using persistent mode is highly recommended, since it runs several times faster than the default fuzzing mode, but global/static variables can affect the fuzzing process. It is also worth mentioning that persistent mode fails to analyze targets that contain AFL-instrumented code (for example, C dependencies compiled with afl-clang). In such cases the default fuzzing mode is justified.

Alternatively, you can use a trick and import the AFL-instrumented libraries inside TestOneInput(). This lets you use persistent mode at the cost of reduced stability (and occasionally incorrect fuzzing), and it is still much faster than the default fuzzing mode. The ruamel project has no C dependencies, but just to demonstrate how the trick works:

def TestOneInput(input_bytes):
  # the very import discussed above
  from ruamel import yaml as ruamel_yaml
  ryaml = ruamel_yaml.YAML(typ="safe", pure=True)
  ryaml.allow_duplicate_keys = True
  # ... the rest is identical to the target above
while afl.loop(10000):
  TestOneInput(stdin_compat.read())
  sys.stdin.seek(0)
os._exit(0)

As for the build, we can simply run pip install . in the project directory to install the project into our fuzzing environment. OK, let's build the Docker container and start fuzzing!

Building libraries that contain C/C++ code for python-afl can be challenging. The build examples from our supported targets may help; at the time of writing, these are msgspec and ultrajson.

Configuration

Before we begin, let's look at yaml_fuzzer-atheris.toml:

exit-on-time = 3600

[atheris]
path = "/yaml_fuzzer-atheris.py"
args = "/corpus -dict=yaml.dict -jobs=1000 -workers=4"

And also at yaml_fuzzer-pyafl.toml:

exit-on-time = 3600
[pyafl]
args = "-i /corpus"
target = "/fuzz/yaml_fuzzer_pyafl.py"
jobs = 4

It's pretty simple.

exit-on-time is an optional parameter that takes a time in seconds. If coverage does not increase during this time (1 hour in our case), fuzzing is terminated automatically.
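The rule fits in a few lines of code. The class below is a minimal sketch of the behaviour described above, assuming the fuzzer periodically reports a coverage counter (such as the cov: field in libFuzzer status lines); the class and method names are hypothetical, and this is not sydr-fuzz's implementation.

```python
class ExitOnTime:
    """Signal a stop when coverage has not grown for `limit` seconds."""

    def __init__(self, limit: float, start: float) -> None:
        self.limit = limit
        self.best_cov = 0
        self.last_growth = start  # time of the most recent coverage increase

    def should_stop(self, cov: int, now: float) -> bool:
        if cov > self.best_cov:
            # New coverage resets the timer.
            self.best_cov = cov
            self.last_growth = now
            return False
        return now - self.last_growth >= self.limit
```

Timestamps are passed in explicitly to keep the logic deterministic; in a real monitoring loop they would come from time.monotonic().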

Fuzzing

Atheris

I'll fuzz with 4 workers until 1000 crashes are found or exit-on-time is triggered. Let's start fuzzing with this command:

# sydr-fuzz -c yaml_fuzzer-atheris.toml run
[2023-01-11 17:22:47] [INFO] #3582      RELOAD cov: 1178 ft: 5252 corp: 478/64Kb lim: 487 exec/s: 275 rss: 713Mb                                                     
[2023-01-11 17:22:48] [INFO] Uncaught Python exception: KeyError: (0, 1) /fuzz/yaml_fuzzer-out/crashes/crash-a0acd109aef7675ce2268eec4e0901759f4e1edc                      
[2023-01-11 17:22:50] [INFO] #17540     REDUCE cov: 1178 ft: 5257 corp: 511/86Kb lim: 481 exec/s: 343 rss: 677Mb L: 13/481 MS: 2 CrossOver-EraseBytes-                                                     
[2023-01-11 17:22:50] [INFO] #17573     REDUCE cov: 1178 ft: 5257 corp: 511/86Kb lim: 481 exec/s: 344 rss: 677Mb L: 58/481 MS: 3 ChangeBit-ManualDict-EraseBytes- DE: "'"- 
[2023-01-11 17:22:50] [INFO] Uncaught Python exception: KeyError: (1, 5) /fuzz/yaml_fuzzer-out/crashes/crash-4230d57dcf9dce49804ffd9abbc43a751068c6a2                                                          
[2023-01-11 17:22:55] [INFO] #1024      pulse  cov: 1171 ft: 4926 corp: 413/36Kb exec/s: 204 rss: 710Mb                                                                                                            
[2023-01-11 17:22:55] [INFO] [ATHERIS]         run time : 0 days, 0 hrs, 0 min, 57 sec                                                                                                                             
[2023-01-11 17:22:55] [INFO] [ATHERIS]    last new find : 0 days, 0 hrs, 0 min, 8 sec                                                                                                                              
[2023-01-11 17:22:57] [INFO] #1268      INITED cov: 1178 ft: 5265 corp: 477/60Kb exec/s: 181 rss: 710Mb

After a while, we have found some crashes. Let's wait until fuzzing finishes.

[2023-01-11 19:21:27] [INFO] Uncaught Python exception: KeyError: (2, 1) /fuzz/yaml_fuzzer-out/crashes/crash-1f71bdb8cbba856923a45f50c7873bdb7ef64e2d
[2023-01-11 19:21:28] [INFO] Uncaught Python exception: KeyError: (1, 5) /fuzz/yaml_fuzzer-out/crashes/crash-c3152f019e73c4c01925ae1533f47583fe3006df
[2023-01-11 19:21:28] [INFO] Uncaught Python exception: KeyError: (2, 1) /fuzz/yaml_fuzzer-out/crashes/crash-fed9747bec6197c9c8cbc0cf10c051c8f807d407
[2023-01-11 19:21:28] [INFO] Uncaught Python exception: KeyError: (1, 0) /fuzz/yaml_fuzzer-out/crashes/crash-01ae8831870d95a5be5898dd17457235b851bfdf
[2023-01-11 19:21:41] [INFO] EXIT_ON_TIME: No new coverage (cov) for 3600 secs.
[2023-01-11 19:21:42] [INFO] EXIT_ON_TIME: No new coverage (cov) for 3600 secs.
[2023-01-11 19:21:42] [INFO] [RESULTS] Fuzzing corpus is saved in /fuzz/yaml_fuzzer-out/corpus
[2023-01-11 19:21:42] [INFO] [RESULTS] oom/leak/timeout/crash: 0/0/0/407
[2023-01-11 19:21:42] [INFO] [RESULTS] Fuzzing results are saved in /fuzz/yaml_fuzzer-out/crashes

Nice, our fuzzing run was ended by exit-on-time. We've got 407 crashes to analyze! This is a job for casr.

Let's minimize the corpus first:

# sydr-fuzz -c yaml_fuzzer-atheris.toml cmin
[2023-01-11 20:30:08] [INFO] Original fuzzing corpus saved as /fuzz/yaml_fuzzer-out/corpus-old
[2023-01-11 20:30:08] [INFO] Minimizing corpus /fuzz/yaml_fuzzer-out/corpus
[2023-01-11 20:30:08] [INFO] Using LD_PRELOAD="/usr/local/lib/python3.8/dist-packages/asan_with_fuzzer.so"
[2023-01-11 20:30:08] [INFO] ASAN_OPTIONS="abort_on_error=1,detect_leaks=0,malloc_context_size=0,symbolize=0,allocator_may_return_null=1"
[2023-01-11 20:30:08] [INFO] Launching atheris: "/yaml_fuzzer.py" "-merge=1" "-artifact_prefix=/fuzz/yaml_fuzzer-out/crashes/" "-close_fd_mask=2" "-verbosity=2" "-detect_leaks=0" "-dict=/fuzz/yaml.dict" "/fuzz/yaml_fuzzer-out/corpus" "/fuzz/yaml_fuzzer-out/corpus-old"
[2023-01-11 20:30:10] [INFO] MERGE-OUTER: 8719 files, 0 in the initial corpus, 0 processed earlier
[2023-01-11 20:30:10] [INFO] MERGE-OUTER: attempt 1
[2023-01-11 20:31:04] [INFO] MERGE-OUTER: successful in 1 attempt(s)
[2023-01-11 20:31:04] [INFO] MERGE-OUTER: the control file has 982127 bytes
[2023-01-11 20:31:04] [INFO] MERGE-OUTER: consumed 0Mb (120Mb rss) to parse the control file
[2023-01-11 20:31:04] [INFO] MERGE-OUTER: 913 new files with 7301 new features added; 1249 new coverage edges

We've narrowed 8719 files to 913 files, nice!
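Conceptually, corpus minimization keeps a subset of inputs that together still reach every feature covered by the full corpus. The sketch below shows a simple greedy variant: inputs are processed from smallest to largest, and an input is kept only if it adds a feature not yet covered. libFuzzer's -merge=1 works in a similar spirit, but this is a simplified illustration rather than its exact algorithm, and the input names and feature sets are made up.

```python
def minimize_corpus(corpus: dict) -> list:
    """Greedy corpus minimization.

    corpus maps an input name to (size_in_bytes, set_of_features).
    Returns the names of the kept inputs, in processing order.
    """
    covered = set()
    kept = []
    # Smallest inputs first (ties broken by name for determinism).
    for name, (_size, features) in sorted(corpus.items(), key=lambda kv: (kv[1][0], kv[0])):
        new = features - covered
        if new:
            covered |= new
            kept.append(name)
    return kept
```

Preferring small inputs keeps the minimized corpus cheap to execute and mutate; an input whose features are all covered by earlier, smaller inputs is simply dropped.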

python-afl

Same step for fuzzing with python-afl:

# sydr-fuzz -c yaml_fuzzer-pyafl.toml run
[2025-08-15 14:05:47] [INFO] [AFL++] [*] Attempting dry run with 'id:000009,time:0,execs:0,orig:yaml-version.yaml'...
[2025-08-15 14:05:47] [INFO] [AFL++]     len = 24, map size = 1570, exec speed = 5423 us, hash = 26807c104c8188ff
[2025-08-15 14:05:47] [INFO] [AFL++] [!] WARNING: Instrumentation output varies across runs.
[2025-08-15 14:05:47] [INFO] [AFL++] [+] All test cases processed.
[2025-08-15 14:05:47] [INFO] [AFL++] [!] WARNING: The target binary is pretty slow! See /usr/local/share/doc/afl/fuzzing_in_depth.md#i-improve-the-speed
[2025-08-15 14:05:47] [INFO] [AFL++] [+] Here are some useful stats:
[2025-08-15 14:05:47] [INFO] [AFL++]     Test case count : 8 favored, 4 variable, 0 ignored, 10 total
[2025-08-15 14:05:47] [INFO] [AFL++]        Bitmap range : 1160 to 2057 bits (average: 1525.50 bits)
[2025-08-15 14:05:47] [INFO] [AFL++]         Exec timing : 2685 to 48.2k us (average: 12.7k us)
[2025-08-15 14:05:47] [INFO] [AFL++] 
[2025-08-15 14:05:47] [INFO] [AFL++] [*] -t option specified. We'll use an exec timeout of 2000 ms.
[2025-08-15 14:05:47] [INFO] [AFL++] [+] All set and ready to roll!
[2025-08-15 14:05:47] [INFO] [AFL++] [*] Entering queue cycle 1
[2025-08-15 14:05:47] [INFO] [AFL++] 
[2025-08-15 14:05:47] [INFO] [AFL++] [*] Fuzzing test case #1 (10 total, 0 crashes saved, state: started :-), mode=explore, perf_score=100, weight=1, favorite=1, was_fuzzed=0, exec_us=2685, hits=0, map=1360, ascii=0, run_time=0:00:00:00)...
[2025-08-15 14:06:16] [INFO] Found crash /fuzz/yaml_fuzzer_pyafl-out/crashes/crash-bb4d05f035d758b66070c9250124988e81a168d7

After a while, we found several crashes. Let's wait until fuzzing finishes.

[2025-08-17 09:45:51] [INFO] [AFL++] +++ Testing aborted programmatically +++
[2025-08-17 09:45:51] [INFO] [AFL++] [!] 
[2025-08-17 09:45:51] [INFO] [AFL++] Performing final sync, this make take some time ...
[2025-08-17 09:45:51] [INFO] [AFL++] [!] Done!
[2025-08-17 09:45:51] [INFO] [AFL++] [*] Writing /fuzz/yaml_fuzzer_pyafl-out/aflplusplus/afl_main-worker/fastresume.bin ...
[2025-08-17 09:45:51] [INFO] [AFL++] [+] Written fastresume.bin with 2431192 bytes!
[2025-08-17 09:45:51] [INFO] [AFL++] [+] We're done here. Have a nice day!
[2025-08-17 09:45:51] [INFO] [AFL++] 
[2025-08-17 09:45:51] [INFO] [RESULTS] Fuzzing corpuses are saved in workers queue directories. Run sydr-fuzz cmin subcommand to gather full corpus at "/fuzz/yaml_fuzzer_pyafl-out/corpus-old" and minimized corpus at "/fuzz/yaml_fuzzer_pyafl-out/corpus".
[2025-08-17 09:45:51] [INFO] [RESULTS] [afl_main] 1675 new corpus items found, 9.46% coverage achieved, 197 crashes saved, 131 timeouts saved, total runtime 1 days, 15 hrs, 36 min, 53 sec
[2025-08-17 09:45:51] [INFO] [RESULTS]            execs done: 27544322, execs/s: 193.14, edges found: 6202/65536, stability: 66.01%
[2025-08-17 09:45:51] [INFO] [RESULTS] [afl_s01] 1422 new corpus items found, 9.35% coverage achieved, 168 crashes saved, 128 timeouts saved, total runtime 1 days, 15 hrs, 36 min, 57 sec
[2025-08-17 09:45:51] [INFO] [RESULTS]           execs done: 15183330, execs/s: 106.46, edges found: 6126/65536, stability: 67.37%
[2025-08-17 09:45:51] [INFO] [RESULTS] [afl_s02] 1470 new corpus items found, 9.50% coverage achieved, 218 crashes saved, 111 timeouts saved, total runtime 1 days, 15 hrs, 36 min, 58 sec
[2025-08-17 09:45:51] [INFO] [RESULTS]           execs done: 16792546, execs/s: 117.74, edges found: 6226/65536, stability: 83.99%
[2025-08-17 09:45:51] [INFO] [RESULTS] [afl_s03] 1359 new corpus items found, 9.41% coverage achieved, 203 crashes saved, 118 timeouts saved, total runtime 1 days, 15 hrs, 36 min, 55 sec
[2025-08-17 09:45:51] [INFO] [RESULTS]           execs done: 16465514, execs/s: 115.45, edges found: 6168/65536, stability: 66.97%
[2025-08-17 09:45:51] [INFO] [RESULTS] timeout/crash: 488/785

Nice, we've found 785 crashes! Corpus minimization is mandatory for python-afl because AFL++ keeps its fuzzing corpus scattered across the workers' queue directories. The sydr-fuzz cmin command gathers them all into one directory and minimizes the result:

# sydr-fuzz -c yaml_fuzzer-pyafl.toml cmin
[2025-08-18 11:42:24] [INFO] [CMIN] [+] Found 28239 unique tuples across 10970 files.
[2025-08-18 11:42:24] [INFO] [CMIN] [+] Narrowed down to 734 files, saved in '/fuzz/yaml_fuzzer_pyafl-out/corpus'.

We've narrowed 10970 files to 734 files, nice! Now we can collect the code coverage!
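Gathering the scattered corpus essentially means collecting every worker's queue entries into one directory and dropping byte-identical duplicates. The function below is a hypothetical sketch of that step, assuming the usual AFL++ layout in which each worker keeps its inputs in a queue directory somewhere under the output directory; it is not sydr-fuzz's implementation, and the minimization itself (the [CMIN] lines above) is a separate pass.

```python
import hashlib
import shutil
from pathlib import Path


def gather_queues(afl_out: Path, dest: Path) -> int:
    """Copy every file from every queue directory under afl_out into dest,
    skipping byte-identical duplicates. Returns the number of files copied."""
    dest.mkdir(parents=True, exist_ok=True)
    seen = set()
    copied = 0
    for queue in sorted(p for p in afl_out.rglob("queue") if p.is_dir()):
        for entry in sorted(queue.iterdir()):
            # Skip AFL++ bookkeeping such as the hidden .state directory.
            if not entry.is_file() or entry.name.startswith("."):
                continue
            digest = hashlib.sha1(entry.read_bytes()).hexdigest()
            if digest in seen:
                continue  # same input already found by another worker
            seen.add(digest)
            # Name by content hash so entries from different workers never clash.
            shutil.copyfile(entry, dest / digest)
            copied += 1
    return copied
```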

Coverage

Coverage is collected the same way for both fuzzers. For code coverage we use the well-known coverage Python module and these instructions from the Atheris GitHub page. Of course, we've wrapped this into the sydr-fuzz pycov subcommand. Let's get an HTML coverage report:

# sydr-fuzz -c yaml_fuzzer-atheris.toml pycov html
[2023-01-11 20:37:47] [INFO] Running pycov html "/fuzz/yaml_fuzzer.toml"
[2023-01-11 20:37:47] [INFO] Collecting coverage data for each file in corpus: /fuzz/yaml_fuzzer-out/corpus
[2023-01-11 20:37:47] [INFO] Saving coverage data to /fuzz/yaml_fuzzer-out/coverage/html/.coverage
[2023-01-11 20:37:47] [INFO] Using LD_PRELOAD="/usr/local/lib/python3.8/dist-packages/asan_with_fuzzer.so"
[2023-01-11 20:37:47] [INFO] ASAN_OPTIONS="abort_on_error=1,detect_leaks=0,malloc_context_size=0,symbolize=0,allocator_may_return_null=1"
[2023-01-11 20:37:47] [INFO] Collecting coverage: "coverage" "run" "/yaml_fuzzer.py" "-atheris_runs=914"
[2023-01-11 20:37:51] [INFO] Running coverage html: "coverage" "html" "-d" "/fuzz/yaml_fuzzer-out/coverage/html" "--data-file=/fuzz/yaml_fuzzer-out/coverage/html/.coverage"
Wrote HTML report to /fuzz/yaml_fuzzer-out/coverage/html/index.html

Good, we've got the coverage. Let's look at it and move on!

[Image: cov-html, the HTML coverage report]

We collect coverage for python-afl in exactly the same way (this time as console output, for example):

# sydr-fuzz -c yaml_fuzzer-pyafl.toml pycov report
[2025-09-29 16:04:48] [INFO] Configured to run with AFL_PRELOAD SET
[2025-09-29 16:04:48] [INFO] Running pycov report "/fuzz/yaml_fuzzer_pyafl.toml"
[2025-09-29 16:04:48] [INFO] Collecting coverage data for each file in corpus: /fuzz/yaml_fuzzer_pyafl-out/corpus
[2025-09-29 16:04:48] [INFO] Saving coverage data to /fuzz/yaml_fuzzer_pyafl-out/coverage/report/.coverage
[2025-09-29 16:04:48] [INFO] pycov environment: AFL_I_DONT_CARE_ABOUT_MISSING_CRASHES=1 AFL_SYNC_TIME=1 ASAN_OPTIONS=abort_on_error=1,allocator_may_return_null=1,detect_leaks=0,hard_rss_limit_mb=2048,malloc_context_size=0,symbolize=0,verify_asan_link_order=0 PYTHON_AFL_PERSISTENT=1 UBSAN_OPTIONS=abort_on_error=0,allocator_may_return_null=1,halt_on_error=0,malloc_context_size=0
[2025-09-29 16:04:48] [INFO] Collecting coverage: cd "/fuzz/yaml_fuzzer_pyafl-out/coverage/report" && AFL_I_DONT_CARE_ABOUT_MISSING_CRASHES="1" AFL_SYNC_TIME="1" ASAN_OPTIONS="abort_on_error=1,allocator_may_return_null=1,detect_leaks=0,hard_rss_limit_mb=2048,malloc_context_size=0,symbolize=0,verify_asan_link_order=0" PYTHON_AFL_PERSISTENT="1" UBSAN_OPTIONS="abort_on_error=0,allocator_may_return_null=1,halt_on_error=0,malloc_context_size=0" "/bin/bash" "-c" "coverage run --include=*/site-packages/*,*/dist-packages/*,/fuzz/yaml_fuzzer_pyafl.py --omit=*/coverage/* /fuzz/yaml_fuzzer_pyafl-out/coverage/pyAflCovWrapper.py /fuzz/yaml_fuzzer_pyafl-out/corpus"
[2025-09-29 16:04:49] [INFO] Running coverage report: AFL_I_DONT_CARE_ABOUT_MISSING_CRASHES="1" AFL_SYNC_TIME="1" ASAN_OPTIONS="abort_on_error=1,allocator_may_return_null=1,detect_leaks=0,hard_rss_limit_mb=2048,malloc_context_size=0,symbolize=0,verify_asan_link_order=0" PYTHON_AFL_PERSISTENT="1" UBSAN_OPTIONS="abort_on_error=0,allocator_may_return_null=1,halt_on_error=0,malloc_context_size=0" "coverage" "report" "--data-file=/fuzz/yaml_fuzzer_pyafl-out/coverage/report/.coverage" "--ignore-errors"
Name                                                                 Stmts   Miss  Cover
----------------------------------------------------------------------------------------
/usr/local/lib/python3.9/dist-packages/_distutils_hack/__init__.py     101     96     5%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/__init__.py           9      2    78%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/anchor.py            10      4    60%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/comments.py         777    552    29%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/compat.py           154     87    44%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/composer.py         125     23    82%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/constructor.py     1062    767    28%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/cyaml.py             42     25    40%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/dumper.py            29     16    45%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/emitter.py         1136   1046     8%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/error.py            163    103    37%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/events.py            81      4    95%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/loader.py            43     28    35%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/main.py             828    609    26%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/nodes.py             52     18    65%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/parser.py           466    161    65%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/reader.py           180     89    51%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/representer.py      778    634    19%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/resolver.py         220    122    45%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/scalarbool.py        23     15    35%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/scalarfloat.py       69     50    28%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/scalarint.py         67     44    34%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/scalarstring.py      75     42    44%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/scanner.py         1327    613    54%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/serializer.py       141    118    16%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/timestamp.py         33     26    21%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/tokens.py           236    117    50%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/util.py             142    121    15%
yaml_fuzzer_pyafl.py                                                    64     26    59%
----------------------------------------------------------------------------------------
TOTAL                                                                 8433   5558    34%

To get an HTML coverage report, you can also use the default cov-html subcommand instead of pycov.
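The pycov log above shows that, for python-afl, coverage is measured by running coverage run on a generated wrapper (pyAflCovWrapper.py) that receives the corpus directory. Its contents are not shown here; the code below is a hypothetical sketch of what such a replay wrapper has to do: feed every corpus file to the harness once and keep going when an input raises, so that a single crashing input does not cut the coverage run short. Importing the harness by module name assumes the target (for example yaml_fuzzer_pyafl) is importable and keeps its afl.loop under an if __name__ == "__main__" guard, as the target above does.

```python
import importlib
import traceback
from pathlib import Path


def replay(corpus_dir: Path, target) -> tuple:
    """Run `target` once per corpus file; return (executed, raised) counts."""
    executed = raised = 0
    for path in sorted(p for p in corpus_dir.iterdir() if p.is_file()):
        executed += 1
        try:
            target(path.read_bytes())
        except Exception:
            # Crashing inputs are expected in a fuzzing corpus: print the
            # traceback and continue so later files still add coverage.
            raised += 1
            traceback.print_exc()
    return executed, raised


def replay_module(module_name: str, corpus_dir: str) -> tuple:
    """Import a harness by module name and replay the corpus through its TestOneInput."""
    harness = importlib.import_module(module_name)
    return replay(Path(corpus_dir), harness.TestOneInput)
```

A small script executed under coverage run would then only need to call replay_module("yaml_fuzzer_pyafl", corpus_dir).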

Crash Triage

Crash analysis is also identical for the Atheris and python-afl pipelines.

As I said before, I'll use casr via the sydr-fuzz casr subcommand for crash triage:

# sydr-fuzz -c yaml_fuzzer-atheris.toml casr

You can learn more about casr from its repository or from my other fuzzing tutorial.

Let's look at casr output:

[2023-01-11 20:47:14] [INFO] Casr-cluster: deduplication of casr reports...
[2023-01-11 20:47:16] [INFO] Reports before deduplication: 407; after: 16
[2023-01-11 20:47:16] [INFO] Casr-cluster: clustering casr reports...
[2023-01-11 20:47:16] [INFO] Reports before clustering: 16. Clusters: 8
[2023-01-11 20:47:16] [INFO] Copying inputs...
[2023-01-11 20:47:16] [INFO] Done!
[2023-01-11 20:47:16] [INFO] ==> <cl1>
[2023-01-11 20:47:16] [INFO] Crash: /fuzz/yaml_fuzzer-out/casr/cl1/crash-bf5829959ccf0211640314bb30de19bc9bafdeb3
[2023-01-11 20:47:16] [INFO]   casr-python: UNDEFINED: KeyError: /usr/local/lib/python3.8/dist-packages/ruamel/yaml/resolver.py:361
[2023-01-11 20:47:16] [INFO]   Similar crashes: 1
[2023-01-11 20:47:16] [INFO] Cluster summary -> KeyError: 1
[2023-01-11 20:47:16] [INFO] ==> <cl2>
[2023-01-11 20:47:16] [INFO] Crash: /fuzz/yaml_fuzzer-out/casr/cl2/crash-e126eb63b0bc1aefac72c3f56dea8484577f1007
[2023-01-11 20:47:16] [INFO]   casr-python: UNDEFINED: RecursionError: /usr/local/lib/python3.8/dist-packages/ruamel/yaml/events.py:78
[2023-01-11 20:47:16] [INFO]   Similar crashes: 1
[2023-01-11 20:47:16] [INFO] Cluster summary -> RecursionError: 1
[2023-01-11 20:47:16] [INFO] ==> <cl3>
[2023-01-11 20:47:16] [INFO] Crash: /fuzz/yaml_fuzzer-out/casr/cl3/crash-017ee5d1bb2bee51263f083eb12a60711a3c84f1
[2023-01-11 20:47:16] [INFO]   casr-python: UNDEFINED: KeyError: /usr/local/lib/python3.8/dist-packages/ruamel/yaml/resolver.py:361
[2023-01-11 20:47:16] [INFO]   Similar crashes: 4
[2023-01-11 20:47:16] [INFO] Cluster summary -> KeyError: 4
[2023-01-11 20:47:16] [INFO] ==> <cl4>
[2023-01-11 20:47:16] [INFO] Crash: /fuzz/yaml_fuzzer-out/casr/cl4/crash-3637416d80df3c5961e05b0bd459b79009e2a182
[2023-01-11 20:47:16] [INFO]   casr-python: UNDEFINED: KeyError: /usr/local/lib/python3.8/dist-packages/ruamel/yaml/resolver.py:361
[2023-01-11 20:47:16] [INFO]   Similar crashes: 2
[2023-01-11 20:47:16] [INFO] Cluster summary -> KeyError: 2
[2023-01-11 20:47:16] [INFO] ==> <cl5>
[2023-01-11 20:47:16] [INFO] Crash: /fuzz/yaml_fuzzer-out/casr/cl5/crash-01ae8831870d95a5be5898dd17457235b851bfdf
[2023-01-11 20:47:16] [INFO]   casr-python: UNDEFINED: KeyError: /usr/local/lib/python3.8/dist-packages/ruamel/yaml/resolver.py:361
[2023-01-11 20:47:16] [INFO]   Similar crashes: 4
[2023-01-11 20:47:16] [INFO] Cluster summary -> KeyError: 4
[2023-01-11 20:47:16] [INFO] ==> <cl6>
[2023-01-11 20:47:16] [INFO] Crash: /fuzz/yaml_fuzzer-out/casr/cl6/crash-0ea90a02b95f99e850b036e49419a43103a54149
[2023-01-11 20:47:16] [INFO]   casr-python: UNDEFINED: ValueError: /usr/local/lib/python3.8/dist-packages/ruamel/yaml/constructor.py:533
[2023-01-11 20:47:16] [INFO]   Similar crashes: 1
[2023-01-11 20:47:16] [INFO] Crash: /fuzz/yaml_fuzzer-out/casr/cl6/crash-988f305721849b6a75af3b3f424b4593901630c3
[2023-01-11 20:47:16] [INFO]   casr-python: UNDEFINED: ValueError: /usr/local/lib/python3.8/dist-packages/ruamel/yaml/constructor.py:498
[2023-01-11 20:47:16] [INFO]   Similar crashes: 1
[2023-01-11 20:47:16] [INFO] Cluster summary -> ValueError: 2
[2023-01-11 20:47:16] [INFO] ==> <cl7>
[2023-01-11 20:47:16] [INFO] Crash: /fuzz/yaml_fuzzer-out/casr/cl7/crash-3f369c580ac61eded9d05eb06bc1ad6d0e90bfe1
[2023-01-11 20:47:16] [INFO]   casr-python: UNDEFINED: ValueError: /usr/local/lib/python3.8/dist-packages/ruamel/yaml/constructor.py:498
[2023-01-11 20:47:16] [INFO]   Similar crashes: 1
[2023-01-11 20:47:16] [INFO] Cluster summary -> ValueError: 1
[2023-01-11 20:47:16] [INFO] ==> <cl8>
[2023-01-11 20:47:16] [INFO] Crash: /fuzz/yaml_fuzzer-out/casr/cl8/crash-05451dc00f42aa97a064d2e08153bb84af113717
[2023-01-11 20:47:16] [INFO]   casr-python: UNDEFINED: TypeError: /usr/local/lib/python3.8/dist-packages/ruamel/yaml/constructor.py:273
[2023-01-11 20:47:16] [INFO]   Similar crashes: 1
[2023-01-11 20:47:16] [INFO] Cluster summary -> TypeError: 1
[2023-01-11 20:47:16] [INFO] SUMMARY -> RecursionError: 1 KeyError: 11 ValueError: 3 TypeError: 1
[2023-01-11 20:47:16] [INFO] Crashes and Casr reports are saved in /fuzz/yaml_fuzzer-out/casr

After deduplication we have 16 crashes split into 8 clusters. Now we can get down to manual analysis. Let's look at a report, for example from cl6:

[Image: casrep, a casr report]

An unhandled exception has occurred while converting a string to float. Looks like an issue :)
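The casr summary involves two different steps: 407 reports were deduplicated to 16, and those 16 were then grouped into 8 clusters. The sketch below illustrates the distinction on made-up traces styled after the log: deduplication drops reports with an identical exception type and stack trace, and clustering then groups the remaining ones. The clustering here is deliberately simplified to exception type plus crashing frame; casr compares whole stack traces by similarity, so its clusters are finer-grained (the log above has several separate KeyError clusters ending at resolver.py:361).

```python
from collections import defaultdict


def deduplicate(reports):
    """Drop reports whose (exception type, stack trace) was already seen.

    Each report is (exception_type, frames): frames is a tuple of "file:line"
    strings with the innermost (crashing) frame last, as in a Python traceback.
    """
    return list(dict.fromkeys(reports))  # keeps the first occurrence, in order


def cluster(reports):
    """Simplified clustering: group by exception type and crashing frame."""
    groups = defaultdict(list)
    for exc_type, frames in reports:
        groups[(exc_type, frames[-1])].append((exc_type, frames))
    return dict(groups)
```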

Virtual environments

I'd also like to highlight the recently added support for virtual environments when fuzzing with Atheris and python-afl; the msgspec fuzzer demonstrates how to work with them.

Virtual environments make it possible to fuzz Python targets with both Atheris and python-afl without rebuilding the instrumented libraries. At the time of writing, the msgspec build script, slightly simplified, looks like this:

... # do some msgspec-specific work

python3 -m venv --system-site-packages /atherisVenv
python3 -m venv --system-site-packages /pyAflVenv

# prepare python-afl venv

source /pyAflVenv/bin/activate

pip install python-afl --ignore-installed
pip install coverage --ignore-installed

... # do some msgspec-specific work

cd /msgspec

MSGSPEC_DEBUG=1 CC=afl-clang-fast CFLAGS="-fsanitize=address -Wl,-rpath=/usr/lib/clang/14.0.6/lib/linux/" LDFLAGS="/usr/local/lib/afl/afl-compiler-rt.o /usr/lib/clang/14.0.6/lib/linux/libclang_rt.asan-x86_64.so" LDSHARED="clang -shared" pip3 install --ignore-installed .
rm -rf build

deactivate


# Prepare Atheris venv
source /atherisVenv/bin/activate

pip install atheris --ignore-installed
pip install coverage --ignore-installed

... # do some msgspec-specific work

cd /msgspec
MSGSPEC_DEBUG=1 CC=clang CFLAGS="-g -fsanitize=fuzzer-no-link,address" LDSHARED="clang -shared" pip3 install --ignore-installed .

deactivate

To let sydr-fuzz launch fuzz targets inside a virtual environment, specify the venv field in the configuration file:

venv = "/atherisVenv/"
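Running a script inside a virtual environment mostly comes down to using that environment's interpreter. The helper below is a hypothetical sketch of how a launcher could turn a venv field like the one above into a command line and environment, mimicking what bin/activate does (setting VIRTUAL_ENV, putting the venv's bin directory first on PATH and unsetting PYTHONHOME); it is not sydr-fuzz's code and assumes the POSIX venv layout.

```python
import os
from pathlib import Path


def venv_command(venv, script, args=(), base_env=None):
    """Return (argv, env) that run `script` with the interpreter of `venv`."""
    venv_dir = Path(venv)
    bin_dir = venv_dir / "bin"  # POSIX layout; Windows venvs use Scripts
    env = dict(os.environ if base_env is None else base_env)
    env["VIRTUAL_ENV"] = str(venv_dir)
    # Prepend the venv's bin directory so its tools shadow the system ones.
    env["PATH"] = os.pathsep.join(p for p in (str(bin_dir), env.get("PATH", "")) if p)
    env.pop("PYTHONHOME", None)  # bin/activate unsets it as well
    argv = [str(bin_dir / "python3"), script, *args]
    return argv, env
```

The result can be passed directly to subprocess.run(argv, env=env).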

Conclusion

In conclusion, Atheris and python-afl are cool fuzzers for Python code, and the sydr-fuzz interface is neat. And of course casr, which can triage crashes in Python code, helps a lot!


Andrey Fedotov, Alexey Marenkov
