Fuzzing ruamel.yaml (Python) project with sydr-fuzz
In this article I'll share my experience of fuzzing Python projects. For this purpose I use sydr-fuzz with the Atheris and python-afl backends. Sydr-fuzz was originally designed as a hybrid fuzzer that combines Sydr (a DSE tool) with state-of-the-art fuzzers such as AFLplusplus and libFuzzer. Sydr-fuzz also supports useful features such as crash triage with casr, security predicate checking, and convenient subcommands for corpus minimization and code coverage collection. Atheris and python-afl are coverage-guided Python fuzzing engines. They support fuzzing of Python code as well as native extensions written for CPython. Atheris is based on libFuzzer, while python-afl is based on AFL++. They look and work like libFuzzer and AFL++, so we decided to support them in sydr-fuzz, why not? We don't have symbolic execution for Python code, but we can still do fuzzing, crash triage, corpus minimization, and coverage collection through the sydr-fuzz interface.
The Atheris GitHub page has nice instructions for installing and using it. We will fuzz the yaml project from its examples. A docker container with the whole fuzzing environment is already prepared. I'll use it for my fuzzing experiments, but first let's take a closer look at the fuzz target and the build script.
import atheris
with atheris.instrument_imports():
    from ruamel import yaml as ruamel_yaml
import sys
import warnings
# Suppress all warnings.
warnings.simplefilter("ignore")
ryaml = ruamel_yaml.YAML(typ="safe", pure=True)
ryaml.allow_duplicate_keys = True
@atheris.instrument_func
def TestOneInput(input_bytes):
    fdp = atheris.FuzzedDataProvider(input_bytes)
    data = fdp.ConsumeUnicode(sys.maxsize)
    try:
        iterator = ryaml.load_all(data)
        for _ in iterator:
            pass
    except ruamel_yaml.error.YAMLError:
        return
    except Exception:
        input_type = str(type(data))
        codepoints = [hex(ord(x)) for x in data]
        sys.stderr.write(
            "Input was {input_type}: {data}\nCodepoints: {codepoints}".format(
                input_type=input_type, data=data, codepoints=codepoints))
        raise
def main():
    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()
if __name__ == "__main__":
    main()
For Atheris, we define which modules we want to instrument. It is also possible to instrument everything with atheris.instrument_all(), which might be useful when your project has many dependencies. Then we need to implement def TestOneInput(input_bytes), which is similar to int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) for C/C++. It is important to catch the exceptions thrown by the target function, but only those that the developers document or that the function raises directly. For example, IndexError should not be caught if it is not mentioned in the documentation. Both Atheris and python-afl report an uncaught exception as a crash.
For python-afl, the fuzz target looks a little different. You can read more about it here and here.
#!/usr/bin/python3
import afl, sys, os
from ruamel import yaml as ruamel_yaml
import warnings
# Suppress all warnings.
warnings.simplefilter("ignore")
def _ConsumeString(res_len, data):
    ...
def TestOneInput(input_bytes):
    ryaml = ruamel_yaml.YAML(typ="safe", pure=True)
    ryaml.allow_duplicate_keys = True
    data = _ConsumeString(sys.maxsize, input_bytes)
    try:
        iterator = ryaml.load_all(data)
        for _ in iterator:
            pass
    except ruamel_yaml.error.YAMLError:
        return
    except Exception:
        input_type = str(type(data))
        codepoints = [hex(ord(x)) for x in data]
        sys.stderr.write(
            "Input was {input_type}: {data}\nCodepoints: {codepoints}".format(
                input_type=input_type, data=data, codepoints=codepoints))
        raise
def main():
    # Choose where to read stdin from.
    try:
        # Python 3:
        stdin_compat = sys.stdin.buffer
    except AttributeError:
        # There is no buffer attribute in Python 2:
        stdin_compat = sys.stdin
    while afl.loop(10000):
        TestOneInput(stdin_compat.read())
        sys.stdin.seek(0)  # reset stdin between runs
    os._exit(0)
if __name__ == "__main__":
    main()
One of the main differences is that python-afl lacks a FuzzedDataProvider class for converting input bytes into the type expected by the fuzz target. You might also wonder why the shebang (#!/usr/bin/python3) was changed. This is not a coincidence: python-afl with ASAN is likely to run into errors like [CMIN] du: cannot access ... or Fork server handshake failed ... when the shebang uses env. The fix (assuming everything else is configured correctly) is to use a shebang without env, so be careful when using python-afl.
Similarly, on some systems a nasty issue such as a hanging cmin or coverage run can occur when LD_PRELOAD is used. There is a workaround, but the easy way out is to switch to python-afl.
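Since python-afl has no FuzzedDataProvider, the target has to turn raw bytes into the type it needs by itself. The helper below is a minimal sketch of one way to do that. Note that consume_text is a hypothetical name: it is neither the _ConsumeString implementation elided in the target above nor a reimplementation of the Atheris ConsumeUnicode method.

```python
def consume_text(data: bytes, max_len: int) -> str:
    """Turn fuzzer bytes into at most max_len characters of text.

    Invalid UTF-8 sequences are replaced with U+FFFD instead of raising,
    so every input produced by the fuzzer reaches the target.
    """
    return data.decode("utf-8", errors="replace")[:max_len]


print(repr(consume_text(b"key: value\n", 100)))
print(repr(consume_text(b"\xff\xfeabc", 3)))
```

Replacing undecodable bytes is a design choice: the alternative of raising on bad input would make the harness reject many mutated inputs before they ever reach the parser.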
Finally, we need some code to start the fuzzing process.
For Atheris:
atheris.Setup(sys.argv, TestOneInput)
atheris.Fuzz()
For python-afl with persistent mode:
while afl.loop(10000):
    TestOneInput(stdin_compat.read())
    sys.stdin.seek(0)  # reset stdin between runs
os._exit(0)
For python-afl without persistent mode:
afl.init()
TestOneInput(stdin_compat.read())
os._exit(0)
Using persistent mode is highly recommended because it works several times faster than the default fuzzing mode, but global/static variables can affect the fuzzing process, since their state is carried over between iterations. It is also important to mention that persistent mode fails to analyze targets with afl-instrumented code (for example, C-code dependencies built with afl-clang). In these cases the default fuzzing mode is justified. Alternatively, you can use a trick and import the afl-instrumented libraries inside TestOneInput(). This lets us use persistent mode at the cost of reduced stability (and sometimes potentially incorrect fuzzing), but it is still much faster than the default fuzzing mode. We don't have any C-code dependencies in the ruamel project, but just to demonstrate how the trick works:
def TestOneInput(input_bytes):
    # The import discussed above.
    from ruamel import yaml as ruamel_yaml
    ryaml = ruamel_yaml.YAML(typ="safe", pure=True)
    ryaml.allow_duplicate_keys = True
    # ... the rest of the code is identical
while afl.loop(10000):
    TestOneInput(stdin_compat.read())
    sys.stdin.seek(0)
os._exit(0)
As for the build, we can just run pip install . from the project directory to install the instrumented project into our fuzzing environment. OK, let's build the docker container and start fuzzing!
Building libraries containing C/C++ for python-afl may present some challenges. Perhaps the build examples from our supported targets will help. At the time of writing, these are msgspec and ultrajson.
Before we begin, let's look at yaml_fuzzer-atheris.toml:
exit-on-time = 3600
[atheris]
path = "/yaml_fuzzer-atheris.py"
args = "/corpus -dict=yaml.dict -jobs=1000 -workers=4"
And also at yaml_fuzzer-pyafl.toml:
exit-on-time = 3600
[pyafl]
args = "-i /corpus"
target = "/fuzz/yaml_fuzzer_pyafl.py"
jobs = 4
It's pretty simple.
exit-on-time is an optional parameter that takes a time in seconds. If the coverage does not increase during this time (1 hour in our case), fuzzing is terminated automatically.
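The stopping rule can be sketched in a few lines. This is only an illustration of the behavior described above, not the sydr-fuzz implementation; StallTimer and its methods are names I invented, and the coverage samples are made up apart from the cov values borrowed from the Atheris log format.

```python
class StallTimer:
    """Signals when coverage has not grown for `limit` seconds."""

    def __init__(self, limit, now):
        self.limit = limit
        self.best_cov = 0
        self.last_growth = now

    def update(self, cov, now):
        """Record a coverage sample; return True when fuzzing should stop."""
        if cov > self.best_cov:
            self.best_cov = cov
            self.last_growth = now
        return now - self.last_growth >= self.limit


timer = StallTimer(limit=3600, now=0)
# (seconds since start, coverage) samples, hypothetical values.
samples = [(10, 1171), (60, 1178), (1800, 1178), (3659, 1178), (3660, 1178)]
for second, cov in samples:
    if timer.update(cov, second):
        print(f"stop at t={second}s: no new coverage for {timer.limit}s")
        break
```

The timer restarts every time coverage grows, so a long campaign keeps running as long as it keeps finding something new.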
I'll use 4 workers for fuzzing until 1000 crashes are found or exit-on-time is triggered. Let's start fuzzing with this command:
# sydr-fuzz -c yaml_fuzzer-atheris.toml run
[2023-01-11 17:22:47] [INFO] #3582 RELOAD cov: 1178 ft: 5252 corp: 478/64Kb lim: 487 exec/s: 275 rss: 713Mb
[2023-01-11 17:22:48] [INFO] Uncaught Python exception: KeyError: (0, 1) /fuzz/yaml_fuzzer-out/crashes/crash-a0acd109aef7675ce2268eec4e0901759f4e1edc
[2023-01-11 17:22:50] [INFO] #17540 REDUCE cov: 1178 ft: 5257 corp: 511/86Kb lim: 481 exec/s: 343 rss: 677Mb L: 13/481 MS: 2 CrossOver-EraseBytes-
[2023-01-11 17:22:50] [INFO] #17573 REDUCE cov: 1178 ft: 5257 corp: 511/86Kb lim: 481 exec/s: 344 rss: 677Mb L: 58/481 MS: 3 ChangeBit-ManualDict-EraseBytes- DE: "'"-
[2023-01-11 17:22:50] [INFO] Uncaught Python exception: KeyError: (1, 5) /fuzz/yaml_fuzzer-out/crashes/crash-4230d57dcf9dce49804ffd9abbc43a751068c6a2
[2023-01-11 17:22:55] [INFO] #1024 pulse cov: 1171 ft: 4926 corp: 413/36Kb exec/s: 204 rss: 710Mb
[2023-01-11 17:22:55] [INFO] [ATHERIS] run time : 0 days, 0 hrs, 0 min, 57 sec
[2023-01-11 17:22:55] [INFO] [ATHERIS] last new find : 0 days, 0 hrs, 0 min, 8 sec
[2023-01-11 17:22:57] [INFO] #1268 INITED cov: 1178 ft: 5265 corp: 477/60Kb exec/s: 181 rss: 710Mb
After a while we have found some crashes. Let's wait until fuzzing is finished.
[2023-01-11 19:21:27] [INFO] Uncaught Python exception: KeyError: (2, 1) /fuzz/yaml_fuzzer-out/crashes/crash-1f71bdb8cbba856923a45f50c7873bdb7ef64e2d
[2023-01-11 19:21:28] [INFO] Uncaught Python exception: KeyError: (1, 5) /fuzz/yaml_fuzzer-out/crashes/crash-c3152f019e73c4c01925ae1533f47583fe3006df
[2023-01-11 19:21:28] [INFO] Uncaught Python exception: KeyError: (2, 1) /fuzz/yaml_fuzzer-out/crashes/crash-fed9747bec6197c9c8cbc0cf10c051c8f807d407
[2023-01-11 19:21:28] [INFO] Uncaught Python exception: KeyError: (1, 0) /fuzz/yaml_fuzzer-out/crashes/crash-01ae8831870d95a5be5898dd17457235b851bfdf
[2023-01-11 19:21:41] [INFO] EXIT_ON_TIME: No new coverage (cov) for 3600 secs.
[2023-01-11 19:21:42] [INFO] EXIT_ON_TIME: No new coverage (cov) for 3600 secs.
[2023-01-11 19:21:42] [INFO] [RESULTS] Fuzzing corpus is saved in /fuzz/yaml_fuzzer-out/corpus
[2023-01-11 19:21:42] [INFO] [RESULTS] oom/leak/timeout/crash: 0/0/0/407
[2023-01-11 19:21:42] [INFO] [RESULTS] Fuzzing results are saved in /fuzz/yaml_fuzzer-out/crashes
Nice, our fuzzing experiment was ended by exit-on-time. We've got 407 crashes to analyze! This is a job for casr.
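Before running casr, a quick look at which exception types dominate can already be had from the fuzzing log. The snippet below is a rough sketch that assumes the "Uncaught Python exception: <Type>:" line format shown in the log above; the three sample lines are copied from that log, and tally_exceptions is a hypothetical helper name.

```python
import re
from collections import Counter

LOG = """\
[2023-01-11 17:22:48] [INFO] Uncaught Python exception: KeyError: (0, 1) /fuzz/yaml_fuzzer-out/crashes/crash-a0acd109aef7675ce2268eec4e0901759f4e1edc
[2023-01-11 17:22:50] [INFO] #17540 REDUCE cov: 1178 ft: 5257 corp: 511/86Kb lim: 481 exec/s: 343 rss: 677Mb L: 13/481 MS: 2 CrossOver-EraseBytes-
[2023-01-11 17:22:50] [INFO] Uncaught Python exception: KeyError: (1, 5) /fuzz/yaml_fuzzer-out/crashes/crash-4230d57dcf9dce49804ffd9abbc43a751068c6a2
"""

PATTERN = re.compile(r"Uncaught Python exception: (\w+):")


def tally_exceptions(log_text):
    """Count uncaught exceptions per exception type."""
    return Counter(PATTERN.findall(log_text))


print(tally_exceptions(LOG))
```

Such a tally says nothing about how many distinct bugs there are, which is exactly the question casr answers later.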
Let's minimize corpus first:
# sydr-fuzz -c yaml_fuzzer-atheris.toml cmin
[2023-01-11 20:30:08] [INFO] Original fuzzing corpus saved as /fuzz/yaml_fuzzer-out/corpus-old
[2023-01-11 20:30:08] [INFO] Minimizing corpus /fuzz/yaml_fuzzer-out/corpus
[2023-01-11 20:30:08] [INFO] Using LD_PRELOAD="/usr/local/lib/python3.8/dist-packages/asan_with_fuzzer.so"
[2023-01-11 20:30:08] [INFO] ASAN_OPTIONS="abort_on_error=1,detect_leaks=0,malloc_context_size=0,symbolize=0,allocator_may_return_null=1"
[2023-01-11 20:30:08] [INFO] Launching atheris: "/yaml_fuzzer.py" "-merge=1" "-artifact_prefix=/fuzz/yaml_fuzzer-out/crashes/" "-close_fd_mask=2" "-verbosity=2" "-detect_leaks=0" "-dict=/fuzz/yaml.dict" "/fuzz/yaml_fuzzer-out/corpus" "/fuzz/yaml_fuzzer-out/corpus-old"
[2023-01-11 20:30:10] [INFO] MERGE-OUTER: 8719 files, 0 in the initial corpus, 0 processed earlier
[2023-01-11 20:30:10] [INFO] MERGE-OUTER: attempt 1
[2023-01-11 20:31:04] [INFO] MERGE-OUTER: successful in 1 attempt(s)
[2023-01-11 20:31:04] [INFO] MERGE-OUTER: the control file has 982127 bytes
[2023-01-11 20:31:04] [INFO] MERGE-OUTER: consumed 0Mb (120Mb rss) to parse the control file
[2023-01-11 20:31:04] [INFO] MERGE-OUTER: 913 new files with 7301 new features added; 1249 new coverage edges
We've narrowed 8719 files to 913 files, nice!
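To give an intuition for what corpus minimization does, here is a small greedy set-cover sketch: keep adding the file that contributes the most not-yet-covered edges until nothing new is left. This is an illustration under my own assumptions, not the algorithm used by libFuzzer's merge or by afl-cmin; minimize_corpus and the toy coverage data are hypothetical.

```python
def minimize_corpus(coverage_by_file):
    """Greedy set cover: keep files until every covered edge is kept."""
    remaining = set().union(*coverage_by_file.values())
    kept = []
    while remaining:
        # Pick the file that covers the most still-uncovered edges;
        # iterating over sorted names keeps the result deterministic.
        best = max(sorted(coverage_by_file),
                   key=lambda name: len(coverage_by_file[name] & remaining))
        kept.append(best)
        remaining -= coverage_by_file[best]
    return kept


# File name -> set of covered edge IDs (toy data).
corpus = {
    "a.yaml": {1, 2, 3},
    "b.yaml": {2, 3},
    "c.yaml": {4},
    "d.yaml": {1, 4},
}
print(minimize_corpus(corpus))
```

Here b.yaml and d.yaml are dropped because every edge they reach is already reached by the two kept files.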
Same step for fuzzing with python-afl:
# sydr-fuzz -c yaml_fuzzer-pyafl.toml run
[2025-08-15 14:05:47] [INFO] [AFL++] [*] Attempting dry run with 'id:000009,time:0,execs:0,orig:yaml-version.yaml'...
[2025-08-15 14:05:47] [INFO] [AFL++] len = 24, map size = 1570, exec speed = 5423 us, hash = 26807c104c8188ff
[2025-08-15 14:05:47] [INFO] [AFL++] [!] WARNING: Instrumentation output varies across runs.
[2025-08-15 14:05:47] [INFO] [AFL++] [+] All test cases processed.
[2025-08-15 14:05:47] [INFO] [AFL++] [!] WARNING: The target binary is pretty slow! See /usr/local/share/doc/afl/fuzzing_in_depth.md#i-improve-the-speed
[2025-08-15 14:05:47] [INFO] [AFL++] [+] Here are some useful stats:
[2025-08-15 14:05:47] [INFO] [AFL++] Test case count : 8 favored, 4 variable, 0 ignored, 10 total
[2025-08-15 14:05:47] [INFO] [AFL++] Bitmap range : 1160 to 2057 bits (average: 1525.50 bits)
[2025-08-15 14:05:47] [INFO] [AFL++] Exec timing : 2685 to 48.2k us (average: 12.7k us)
[2025-08-15 14:05:47] [INFO] [AFL++]
[2025-08-15 14:05:47] [INFO] [AFL++] [*] -t option specified. We'll use an exec timeout of 2000 ms.
[2025-08-15 14:05:47] [INFO] [AFL++] [+] All set and ready to roll!
[2025-08-15 14:05:47] [INFO] [AFL++] [*] Entering queue cycle 1
[2025-08-15 14:05:47] [INFO] [AFL++]
[2025-08-15 14:05:47] [INFO] [AFL++] [*] Fuzzing test case #1 (10 total, 0 crashes saved, state: started :-), mode=explore, perf_score=100, weight=1, favorite=1, was_fuzzed=0, exec_us=2685, hits=0, map=1360, ascii=0, run_time=0:00:00:00)...
[2025-08-15 14:06:16] [INFO] Found crash /fuzz/yaml_fuzzer_pyafl-out/crashes/crash-bb4d05f035d758b66070c9250124988e81a168d7
After some time we found several crashes. Let's wait till fuzzing is finished.
[2025-08-17 09:45:51] [INFO] [AFL++] +++ Testing aborted programmatically +++
[2025-08-17 09:45:51] [INFO] [AFL++] [!]
[2025-08-17 09:45:51] [INFO] [AFL++] Performing final sync, this make take some time ...
[2025-08-17 09:45:51] [INFO] [AFL++] [!] Done!
[2025-08-17 09:45:51] [INFO] [AFL++] [*] Writing /fuzz/yaml_fuzzer_pyafl-out/aflplusplus/afl_main-worker/fastresume.bin ...
[2025-08-17 09:45:51] [INFO] [AFL++] [+] Written fastresume.bin with 2431192 bytes!
[2025-08-17 09:45:51] [INFO] [AFL++] [+] We're done here. Have a nice day!
[2025-08-17 09:45:51] [INFO] [AFL++]
[2025-08-17 09:45:51] [INFO] [RESULTS] Fuzzing corpuses are saved in workers queue directories. Run sydr-fuzz cmin subcommand to gather full corpus at "/fuzz/yaml_fuzzer_pyafl-out/corpus-old" and minimized corpus at "/fuzz/yaml_fuzzer_pyafl-out/corpus".
[2025-08-17 09:45:51] [INFO] [RESULTS] [afl_main] 1675 new corpus items found, 9.46% coverage achieved, 197 crashes saved, 131 timeouts saved, total runtime 1 days, 15 hrs, 36 min, 53 sec
[2025-08-17 09:45:51] [INFO] [RESULTS] execs done: 27544322, execs/s: 193.14, edges found: 6202/65536, stability: 66.01%
[2025-08-17 09:45:51] [INFO] [RESULTS] [afl_s01] 1422 new corpus items found, 9.35% coverage achieved, 168 crashes saved, 128 timeouts saved, total runtime 1 days, 15 hrs, 36 min, 57 sec
[2025-08-17 09:45:51] [INFO] [RESULTS] execs done: 15183330, execs/s: 106.46, edges found: 6126/65536, stability: 67.37%
[2025-08-17 09:45:51] [INFO] [RESULTS] [afl_s02] 1470 new corpus items found, 9.50% coverage achieved, 218 crashes saved, 111 timeouts saved, total runtime 1 days, 15 hrs, 36 min, 58 sec
[2025-08-17 09:45:51] [INFO] [RESULTS] execs done: 16792546, execs/s: 117.74, edges found: 6226/65536, stability: 83.99%
[2025-08-17 09:45:51] [INFO] [RESULTS] [afl_s03] 1359 new corpus items found, 9.41% coverage achieved, 203 crashes saved, 118 timeouts saved, total runtime 1 days, 15 hrs, 36 min, 55 sec
[2025-08-17 09:45:51] [INFO] [RESULTS] execs done: 16465514, execs/s: 115.45, edges found: 6168/65536, stability: 66.97%
[2025-08-17 09:45:51] [INFO] [RESULTS] timeout/crash: 488/785
Nice, we've found 785 crashes! Corpus minimization is mandatory for python-afl because AFL++ keeps its fuzzing corpus scattered across the workers. The sydr-fuzz cmin command gathers it into one directory and minimizes it:
# sydr-fuzz -c yaml_fuzzer-pyafl.toml cmin
[2025-08-18 11:42:24] [INFO] [CMIN] [+] Found 28239 unique tuples across 10970 files.
[2025-08-18 11:42:24] [INFO] [CMIN] [+] Narrowed down to 734 files, saved in '/fuzz/yaml_fuzzer_pyafl-out/corpus'.
We've narrowed 10970 files to 734 files, nice! Now we can collect the code coverage!
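The gathering half of this step can be sketched as follows: walk over each worker's queue directory and copy the files into one place, skipping identical contents. This is a simplified illustration, not what sydr-fuzz does internally. The queue subdirectory follows the usual AFL++ output layout, the worker names mirror the ones in the log above, and gather_corpus is a hypothetical helper.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def gather_corpus(worker_dirs, dest):
    """Copy every queue file into dest, skipping duplicate contents."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    seen = set()
    for worker in worker_dirs:
        for path in sorted(Path(worker, "queue").glob("*")):
            if not path.is_file():
                continue
            digest = hashlib.sha1(path.read_bytes()).hexdigest()
            if digest not in seen:
                seen.add(digest)
                # Name the copy after its content hash.
                shutil.copyfile(path, dest / digest)
    return len(seen)


# Tiny demonstration on a temporary directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    layout = {"afl_main": [b"a: 1", b"b: 2"], "afl_s01": [b"a: 1"]}
    for worker, items in layout.items():
        queue = root / worker / "queue"
        queue.mkdir(parents=True)
        for i, item in enumerate(items):
            (queue / f"id_{i:06d}").write_bytes(item)
    count = gather_corpus([root / "afl_main", root / "afl_s01"], root / "corpus")
    print(count, "unique files")
```

Deduplicating by content only removes byte-identical files; reducing the corpus by coverage is the separate minimization step.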
The coverage is collected the same way for both fuzzers. For code coverage we use the well-known coverage Python module and the instructions from the Atheris GitHub page. Of course, we've wrapped it into the sydr-fuzz pycov subcommand. Let's get an HTML coverage report:
# sydr-fuzz -c yaml_fuzzer-atheris.toml pycov html
[2023-01-11 20:37:47] [INFO] Running pycov html "/fuzz/yaml_fuzzer.toml"
[2023-01-11 20:37:47] [INFO] Collecting coverage data for each file in corpus: /fuzz/yaml_fuzzer-out/corpus
[2023-01-11 20:37:47] [INFO] Saving coverage data to /fuzz/yaml_fuzzer-out/coverage/html/.coverage
[2023-01-11 20:37:47] [INFO] Using LD_PRELOAD="/usr/local/lib/python3.8/dist-packages/asan_with_fuzzer.so"
[2023-01-11 20:37:47] [INFO] ASAN_OPTIONS="abort_on_error=1,detect_leaks=0,malloc_context_size=0,symbolize=0,allocator_may_return_null=1"
[2023-01-11 20:37:47] [INFO] Collecting coverage: "coverage" "run" "/yaml_fuzzer.py" "-atheris_runs=914"
[2023-01-11 20:37:51] [INFO] Running coverage html: "coverage" "html" "-d" "/fuzz/yaml_fuzzer-out/coverage/html" "--data-file=/fuzz/yaml_fuzzer-out/coverage/html/.coverage"
Wrote HTML report to /fuzz/yaml_fuzzer-out/coverage/html/index.html
Good, we've got the coverage, let's look at it and move on further!

We collect coverage for python-afl in exactly the same way (this time as console output, for example):
# sydr-fuzz -c yaml_fuzzer-pyafl.toml pycov report
[2025-09-29 16:04:48] [INFO] Configured to run with AFL_PRELOAD SET
[2025-09-29 16:04:48] [INFO] Running pycov report "/fuzz/yaml_fuzzer_pyafl.toml"
[2025-09-29 16:04:48] [INFO] Collecting coverage data for each file in corpus: /fuzz/yaml_fuzzer_pyafl-out/corpus
[2025-09-29 16:04:48] [INFO] Saving coverage data to /fuzz/yaml_fuzzer_pyafl-out/coverage/report/.coverage
[2025-09-29 16:04:48] [INFO] pycov environment: AFL_I_DONT_CARE_ABOUT_MISSING_CRASHES=1 AFL_SYNC_TIME=1 ASAN_OPTIONS=abort_on_error=1,allocator_may_return_null=1,detect_leaks=0,hard_rss_limit_mb=2048,malloc_context_size=0,symbolize=0,verify_asan_link_order=0 PYTHON_AFL_PERSISTENT=1 UBSAN_OPTIONS=abort_on_error=0,allocator_may_return_null=1,halt_on_error=0,malloc_context_size=0
[2025-09-29 16:04:48] [INFO] Collecting coverage: cd "/fuzz/yaml_fuzzer_pyafl-out/coverage/report" && AFL_I_DONT_CARE_ABOUT_MISSING_CRASHES="1" AFL_SYNC_TIME="1" ASAN_OPTIONS="abort_on_error=1,allocator_may_return_null=1,detect_leaks=0,hard_rss_limit_mb=2048,malloc_context_size=0,symbolize=0,verify_asan_link_order=0" PYTHON_AFL_PERSISTENT="1" UBSAN_OPTIONS="abort_on_error=0,allocator_may_return_null=1,halt_on_error=0,malloc_context_size=0" "/bin/bash" "-c" "coverage run --include=*/site-packages/*,*/dist-packages/*,/fuzz/yaml_fuzzer_pyafl.py --omit=*/coverage/* /fuzz/yaml_fuzzer_pyafl-out/coverage/pyAflCovWrapper.py /fuzz/yaml_fuzzer_pyafl-out/corpus"
[2025-09-29 16:04:49] [INFO] Running coverage report: AFL_I_DONT_CARE_ABOUT_MISSING_CRASHES="1" AFL_SYNC_TIME="1" ASAN_OPTIONS="abort_on_error=1,allocator_may_return_null=1,detect_leaks=0,hard_rss_limit_mb=2048,malloc_context_size=0,symbolize=0,verify_asan_link_order=0" PYTHON_AFL_PERSISTENT="1" UBSAN_OPTIONS="abort_on_error=0,allocator_may_return_null=1,halt_on_error=0,malloc_context_size=0" "coverage" "report" "--data-file=/fuzz/yaml_fuzzer_pyafl-out/coverage/report/.coverage" "--ignore-errors"
Name                                                                 Stmts   Miss  Cover
----------------------------------------------------------------------------------------
/usr/local/lib/python3.9/dist-packages/_distutils_hack/__init__.py     101     96     5%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/__init__.py           9      2    78%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/anchor.py            10      4    60%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/comments.py         777    552    29%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/compat.py           154     87    44%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/composer.py         125     23    82%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/constructor.py     1062    767    28%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/cyaml.py             42     25    40%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/dumper.py            29     16    45%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/emitter.py         1136   1046     8%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/error.py            163    103    37%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/events.py            81      4    95%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/loader.py            43     28    35%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/main.py             828    609    26%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/nodes.py             52     18    65%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/parser.py           466    161    65%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/reader.py           180     89    51%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/representer.py      778    634    19%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/resolver.py         220    122    45%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/scalarbool.py        23     15    35%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/scalarfloat.py       69     50    28%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/scalarint.py         67     44    34%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/scalarstring.py      75     42    44%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/scanner.py         1327    613    54%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/serializer.py       141    118    16%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/timestamp.py         33     26    21%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/tokens.py           236    117    50%
/usr/local/lib/python3.9/dist-packages/ruamel/yaml/util.py             142    121    15%
yaml_fuzzer_pyafl.py                                                    64     26    59%
----------------------------------------------------------------------------------------
TOTAL                                                                 8433   5558    34%
To collect an HTML coverage report, you can use the default cov-html subcommand instead of pycov.
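As a quick sanity check of the console report, the Cover column is just the share of executed statements, (Stmts - Miss) / Stmts. The sketch below recomputes it for a few rows taken from the report above. cover_percent is a hypothetical helper, plain rounding is my assumption, and the rounding used by the coverage tool itself may differ in edge cases.

```python
def cover_percent(stmts, miss):
    """Percentage of executed statements, rounded to a whole number."""
    if stmts == 0:
        return 100  # assumption: an empty file counts as fully covered
    return round(100 * (stmts - miss) / stmts)


# (Stmts, Miss) pairs copied from the report above.
rows = {
    "ruamel/yaml/comments.py": (777, 552),
    "ruamel/yaml/events.py": (81, 4),
    "TOTAL": (8433, 5558),
}
for name, (stmts, miss) in rows.items():
    print(f"{name}: {cover_percent(stmts, miss)}%")
```

Note that the TOTAL row also counts modules the target never exercises, such as the emitter and representer, so the 34% figure understates how well the parsing code itself is covered.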
There's also no difference in analyzing crashes between the Atheris and python-afl pipelines.
As I said before, I'll use casr via sydr-fuzz casr subcommand for crash triage:
# sydr-fuzz -c yaml_fuzzer-atheris.toml casr
You can learn more about casr from its repository or from my other fuzzing tutorial.
Let's look at casr output:
[2023-01-11 20:47:14] [INFO] Casr-cluster: deduplication of casr reports...
[2023-01-11 20:47:16] [INFO] Reports before deduplication: 407; after: 16
[2023-01-11 20:47:16] [INFO] Casr-cluster: clustering casr reports...
[2023-01-11 20:47:16] [INFO] Reports before clustering: 16. Clusters: 8
[2023-01-11 20:47:16] [INFO] Copying inputs...
[2023-01-11 20:47:16] [INFO] Done!
[2023-01-11 20:47:16] [INFO] ==> <cl1>
[2023-01-11 20:47:16] [INFO] Crash: /fuzz/yaml_fuzzer-out/casr/cl1/crash-bf5829959ccf0211640314bb30de19bc9bafdeb3
[2023-01-11 20:47:16] [INFO] casr-python: UNDEFINED: KeyError: /usr/local/lib/python3.8/dist-packages/ruamel/yaml/resolver.py:361
[2023-01-11 20:47:16] [INFO] Similar crashes: 1
[2023-01-11 20:47:16] [INFO] Cluster summary -> KeyError: 1
[2023-01-11 20:47:16] [INFO] ==> <cl2>
[2023-01-11 20:47:16] [INFO] Crash: /fuzz/yaml_fuzzer-out/casr/cl2/crash-e126eb63b0bc1aefac72c3f56dea8484577f1007
[2023-01-11 20:47:16] [INFO] casr-python: UNDEFINED: RecursionError: /usr/local/lib/python3.8/dist-packages/ruamel/yaml/events.py:78
[2023-01-11 20:47:16] [INFO] Similar crashes: 1
[2023-01-11 20:47:16] [INFO] Cluster summary -> RecursionError: 1
[2023-01-11 20:47:16] [INFO] ==> <cl3>
[2023-01-11 20:47:16] [INFO] Crash: /fuzz/yaml_fuzzer-out/casr/cl3/crash-017ee5d1bb2bee51263f083eb12a60711a3c84f1
[2023-01-11 20:47:16] [INFO] casr-python: UNDEFINED: KeyError: /usr/local/lib/python3.8/dist-packages/ruamel/yaml/resolver.py:361
[2023-01-11 20:47:16] [INFO] Similar crashes: 4
[2023-01-11 20:47:16] [INFO] Cluster summary -> KeyError: 4
[2023-01-11 20:47:16] [INFO] ==> <cl4>
[2023-01-11 20:47:16] [INFO] Crash: /fuzz/yaml_fuzzer-out/casr/cl4/crash-3637416d80df3c5961e05b0bd459b79009e2a182
[2023-01-11 20:47:16] [INFO] casr-python: UNDEFINED: KeyError: /usr/local/lib/python3.8/dist-packages/ruamel/yaml/resolver.py:361
[2023-01-11 20:47:16] [INFO] Similar crashes: 2
[2023-01-11 20:47:16] [INFO] Cluster summary -> KeyError: 2
[2023-01-11 20:47:16] [INFO] ==> <cl5>
[2023-01-11 20:47:16] [INFO] Crash: /fuzz/yaml_fuzzer-out/casr/cl5/crash-01ae8831870d95a5be5898dd17457235b851bfdf
[2023-01-11 20:47:16] [INFO] casr-python: UNDEFINED: KeyError: /usr/local/lib/python3.8/dist-packages/ruamel/yaml/resolver.py:361
[2023-01-11 20:47:16] [INFO] Similar crashes: 4
[2023-01-11 20:47:16] [INFO] Cluster summary -> KeyError: 4
[2023-01-11 20:47:16] [INFO] ==> <cl6>
[2023-01-11 20:47:16] [INFO] Crash: /fuzz/yaml_fuzzer-out/casr/cl6/crash-0ea90a02b95f99e850b036e49419a43103a54149
[2023-01-11 20:47:16] [INFO] casr-python: UNDEFINED: ValueError: /usr/local/lib/python3.8/dist-packages/ruamel/yaml/constructor.py:533
[2023-01-11 20:47:16] [INFO] Similar crashes: 1
[2023-01-11 20:47:16] [INFO] Crash: /fuzz/yaml_fuzzer-out/casr/cl6/crash-988f305721849b6a75af3b3f424b4593901630c3
[2023-01-11 20:47:16] [INFO] casr-python: UNDEFINED: ValueError: /usr/local/lib/python3.8/dist-packages/ruamel/yaml/constructor.py:498
[2023-01-11 20:47:16] [INFO] Similar crashes: 1
[2023-01-11 20:47:16] [INFO] Cluster summary -> ValueError: 2
[2023-01-11 20:47:16] [INFO] ==> <cl7>
[2023-01-11 20:47:16] [INFO] Crash: /fuzz/yaml_fuzzer-out/casr/cl7/crash-3f369c580ac61eded9d05eb06bc1ad6d0e90bfe1
[2023-01-11 20:47:16] [INFO] casr-python: UNDEFINED: ValueError: /usr/local/lib/python3.8/dist-packages/ruamel/yaml/constructor.py:498
[2023-01-11 20:47:16] [INFO] Similar crashes: 1
[2023-01-11 20:47:16] [INFO] Cluster summary -> ValueError: 1
[2023-01-11 20:47:16] [INFO] ==> <cl8>
[2023-01-11 20:47:16] [INFO] Crash: /fuzz/yaml_fuzzer-out/casr/cl8/crash-05451dc00f42aa97a064d2e08153bb84af113717
[2023-01-11 20:47:16] [INFO] casr-python: UNDEFINED: TypeError: /usr/local/lib/python3.8/dist-packages/ruamel/yaml/constructor.py:273
[2023-01-11 20:47:16] [INFO] Similar crashes: 1
[2023-01-11 20:47:16] [INFO] Cluster summary -> TypeError: 1
[2023-01-11 20:47:16] [INFO] SUMMARY -> RecursionError: 1 KeyError: 11 ValueError: 3 TypeError: 1
[2023-01-11 20:47:16] [INFO] Crashes and Casr reports are saved in /fuzz/yaml_fuzzer-out/casr
After deduplication we have 16 crashes split into 8 clusters. Nice, now we can get down to manual analysis. Let's look at one of the reports, for example from cl6:
An unhandled exception has occurred while converting string to float. Looks like an issue:).
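The numbers in the casr output can be cross-checked with a few lines of Python. The per-cluster counts below are copied from the "Cluster summary" lines above; adding them up should reproduce the SUMMARY line and the 16 reports left after deduplication. This only re-adds the printed figures and says nothing about how casr computes them.

```python
from collections import Counter

# Per-cluster summaries copied from the casr output above.
clusters = {
    "cl1": {"KeyError": 1},
    "cl2": {"RecursionError": 1},
    "cl3": {"KeyError": 4},
    "cl4": {"KeyError": 2},
    "cl5": {"KeyError": 4},
    "cl6": {"ValueError": 2},
    "cl7": {"ValueError": 1},
    "cl8": {"TypeError": 1},
}

summary = Counter()
for counts in clusters.values():
    summary.update(counts)  # adds the counts per exception type

print(dict(summary))
print("total reports:", sum(summary.values()))
```

KeyError clearly dominates, so the resolver.py:361 location is a natural place to start the manual analysis.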
I'd also like to highlight the recently added support for virtual environments in Atheris and python-afl — working with them is demonstrated in the msgspec fuzzer.
It is possible to fuzz Python targets using Atheris and python-afl without rebuilding the instrumented libraries with virtual environments. At the time of writing, the build script in msgspec, with slight simplifications, looks like this:
... # do some msgspec specific job
python3 -m venv --system-site-packages /atherisVenv
python3 -m venv --system-site-packages /pyAflVenv
# prepare python-afl venv
source /pyAflVenv/bin/activate
pip install python-afl --ignore-installed
pip install coverage --ignore-installed
... # do some msgspec specific job
cd /msgspec
MSGSPEC_DEBUG=1 CC=afl-clang-fast CFLAGS="-fsanitize=address -Wl,-rpath=/usr/lib/clang/14.0.6/lib/linux/" LDFLAGS="/usr/local/lib/afl/afl-compiler-rt.o /usr/lib/clang/14.0.6/lib/linux/libclang_rt.asan-x86_64.so" LDSHARED="clang -shared" pip3 install --ignore-installed .
rm -rf build
deactivate
# Prepare Atheris venv
source /atherisVenv/bin/activate
pip install atheris --ignore-installed
pip install coverage --ignore-installed
... # do some msgspec specific job
cd /msgspec
MSGSPEC_DEBUG=1 CC=clang CFLAGS="-g -fsanitize=fuzzer-no-link,address" LDSHARED="clang -shared" pip3 install --ignore-installed .
deactivate
To allow sydr-fuzz to launch fuzz targets within virtual environments, the venv field should be specified in the configuration files:
venv = "/atherisVenv/"
In conclusion, I want to say that Atheris and python-afl are cool fuzzers for Python code. The sydr-fuzz interface is neat. And of course casr, which can triage crashes for Python, helps a lot!
Andrey Fedotov, Alexey Marenkov