missing trials when doing local experiment with runners-cpus #2075

Open
@zukatsinadze

Hi @DonggeLiu @jonathanmetzman

Lately I've been running a lot of local experiments on FuzzBench and noticed that, after I added the --runners-cpus flag, reports were sometimes incomplete due to a race condition.

This is my config:

# The number of trials of a fuzzer-benchmark pair.
trials: 5

# The amount of time in seconds that each trial is run for.
# 1 day = 24 * 60 * 60 = 86400
max_total_time: 3600

# The location of the docker registry.
# FIXME: Support custom docker registry.
# See https://github.com/google/fuzzbench/issues/777
docker_registry: gcr.io/fuzzbench

# The local experiment folder that will store most of the experiment data.
# Please use an absolute path.
experiment_filestore: /home/zuka/hexhive/data/local-runs/experiment-data

# The local report folder where HTML reports and summary data will be stored.
# Please use an absolute path.
report_filestore: /home/zuka/hexhive/data/local-runs/report-data

# Flag that indicates this is a local experiment.
local_experiment: true

and I use this command to start the experiment:

PYTHONPATH=. python3 experiment/run_experiment.py \
--experiment-config experiment-config.yaml \
--benchmarks curl_curl_fuzzer_http freetype2_ftfuzzer bloaty_fuzz_target jsoncpp_jsoncpp_fuzzer libxml2_xml sqlite3_ossfuzz vorbis_decode_fuzzer \
--experiment-name libafl-1h-with-seeds \
--fuzzers libafl_default libafl_random libafl_weighted libafl_valprof libafl_covaccount \
--concurrent-builds 15 --runners-cpus 15 --measurers-cpus 1

Besides restricting the number of usable CPUs, --runners-cpus also adds CPU pinning to the docker command. Most of the time I get only the first cycle of trials in the report (if I run with --runners-cpus 16, I get only 16 trials). For the other trials there are fuzzer logs and corpus archives, but no coverage archives.
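For scale, here is a back-of-the-envelope count for the command above. It assumes each trial occupies exactly one runner CPU, which is what the 16-CPU observation suggests but not something I verified in the scheduler code:

```python
import math

benchmarks = 7        # benchmarks passed via --benchmarks above
fuzzers = 5           # fuzzers passed via --fuzzers above
trials_per_pair = 5   # `trials` in the experiment config
runners_cpus = 15     # --runners-cpus; ASSUMPTION: one trial per CPU

total_trials = benchmarks * fuzzers * trials_per_pair
cycles = math.ceil(total_trials / runners_cpus)

print(total_trials, cycles)
```

Under that assumption, a report that stops after the first cycle covers only a small fraction of the experiment's trials.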

The reason is that measurer_main_process ends before the next cycle of trials starts. I see "Finished measure loop." in the logs after the first cycle, and the loop is never restarted.

After some more debugging I traced the issue to this piece of code inside measure_manager_loop:

        while not scheduler.all_trials_ended(experiment):
            continue_inner_loop = measure_manager_inner_loop(
                experiment, max_cycle, request_queue, response_queue,
                queued_snapshots)
            if not continue_inner_loop:
                break
            time.sleep(MEASUREMENT_LOOP_WAIT)

After the first cycle ends, measure_manager_inner_loop returns False because there are no unmeasured snapshots in the database yet, so the outer loop breaks out even though later trials have not started.

I don't really understand the need for this break, so to fix the issue for my runs I removed the break logic from the measurer loop and let it run until scheduler.all_trials_ended returns true. If you think this is an acceptable solution, I can create a PR.
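To make the difference concrete, here is a toy model of the outer loop. It is only a sketch: run_measure_loop, its parameters, and the pass counter are hypothetical stand-ins, not FuzzBench code, and the real scheduler and queues are replaced by a plain list of inner-loop results.

```python
def run_measure_loop(inner_results, trials_ended_after, break_on_idle):
    """Toy model of the outer loop in measure_manager_loop (hypothetical).

    inner_results: value the inner loop returns on each pass; False models
        "no unmeasured snapshots in the database right now".
    trials_ended_after: pass count at which all trials count as ended.
    break_on_idle: True mimics the current break, False the workaround.
    Returns the number of passes executed before the loop exits.
    """
    passes = 0
    results = iter(inner_results)
    # Stands in for `while not scheduler.all_trials_ended(experiment)`.
    while passes < trials_ended_after:
        continue_inner_loop = next(results, False)
        passes += 1
        if break_on_idle and not continue_inner_loop:
            break
        # The real loop sleeps for MEASUREMENT_LOOP_WAIT here.
    return passes


# Pass 2 is the idle gap between the first and second cycle of trials.
timeline = [True, False, True, True]
with_break = run_measure_loop(timeline, 4, break_on_idle=True)
without_break = run_measure_loop(timeline, 4, break_on_idle=False)
print(with_break, without_break)
```

With the break, the model stops at the idle pass and never sees the later snapshots; without it, the loop keeps polling until the trials-ended condition holds.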
