Skip to content

Fix parallel fuzzing premature termination #540

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

arington-halabi
Copy link

@arington-halabi arington-halabi commented Apr 3, 2025

This PR addresses issue #525 where parallel fuzzing terminates prematurely when a target fails.

Problem

When fuzzing a target fails (e.g., due to missing corpus entries or crashes), the current implementation causes the entire parallel fuzzing process to stop, preventing other valid targets from being fuzzed.

Solution

As suggested in the issue discussion, I've implemented a solution that:

  1. Continues Past Errors: When a fuzzing target fails, the implementation now:

    • Logs a warning with specific target information
    • Marks the target as failed
    • Continues fuzzing the remaining valid targets
  2. Handles All-Targets-Failed: To address the corner case mentioned:

    • Tracks failed targets in a HashSet
    • Detects when all targets have failed
    • Terminates gracefully with an informative message
    • Prevents spinning without making progress

This implementation focuses specifically on handling failures in the fuzzing targets themselves rather than system-level errors, as recommended in the review.

@arington-halabi
Copy link
Author

Hey, @smoelius. I have fixed the issue the way you asked. Can you please review.

@smoelius
Copy link
Collaborator

smoelius commented Apr 5, 2025

Hey, @smoelius. I have fixed the issue the way you asked. Can you please review.

Sorry, @arington-halabi. I will review this this weekend.

Copy link
Collaborator

@smoelius smoelius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to give this another look after you've considered my comments.

But I think this is looking good, and it's clear that you've put a lot of thought into this, which I really appreciate!

One other thing to consider is whether you would like to write a test or tests for this. That could be separate PR, though.

if !status.success() {
eprintln!("Warning: Command failed: {:?}", command);
return Ok(());
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part of the code does not involve parallel fuzzing, actually. Hence, this part of the code should not need to be modified.

Comment on lines 1039 to 1075
let popen_result = command.spawn();

// Handle spawn failures gracefully
if let Err(e) = popen_result {
eprintln!("Warning: Could not spawn `{exec:?}`: {}", e);
failed_targets.insert(i_target);
i_task += 1;
continue;
}

let mut popen = popen_result.unwrap();
let stdout_result = popen.stdout.take();

if stdout_result.is_none() {
eprintln!("Warning: Could not get output of `{exec:?}`");
failed_targets.insert(i_target);
i_task += 1;
continue;
}

let stdout = stdout_result.unwrap();
let mut receiver = Receiver::from(stdout);
receiver
.set_nonblocking(true)
.with_context(|| "Could not make receiver non-blocking")?;
poll.registry()
.register(&mut receiver, Token(i_target), Interest::READABLE)
.with_context(|| "Could not register receiver")?;

if let Err(e) = receiver.set_nonblocking(true) {
eprintln!("Warning: Could not make receiver non-blocking: {}", e);
failed_targets.insert(i_target);
i_task += 1;
continue;
}

if let Err(e) = poll.registry().register(&mut receiver, Token(i_target), Interest::READABLE) {
eprintln!("Warning: Could not register receiver: {}", e);
failed_targets.insert(i_target);
i_task += 1;
continue;
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all of the code from the spawn to here can remain unchanged.

If an error were to occur here, it would mean that something was wrong with cargo-test-fuzz or the system on which it was running, not the fuzzing target. For errors of this kind, I think it is okay to bail out (at least for now).

The same comment applies to several more places below, which I will mark with just '^'.

The cases where I think an error could indicate a problem in the fuzzing target are:

Hence, I have not marked those with '^' below.

Comment on lines 1090 to 1107
if n_children == 0 {
assert!(config.sufficient_cpus);
assert!(i_task >= executable_targets.len());
break;
// If we have no children and all targets have failed, terminate
if failed_targets.len() == executable_targets.len() {
eprintln!("All targets failed to start. Terminating fuzzing process.");
break;
}

// If we have no children because we've cycled through all targets,
// but not all have failed, this is normal completion
if config.sufficient_cpus && i_task >= executable_targets.len() {
break;
}

// If we have no children but still have targets that haven't failed,
// something went wrong. Wait a bit and try again.
std::thread::sleep(std::time::Duration::from_millis(100));
continue;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this n_children == 0 should not need to be modified.

This case is for when there are sufficiently many cpus to fuzz all targets simultaneously (hence, the assertion).

In particular, if all children have exited (e.g., because --run-until-crash was passed and they all found crashes), then there is no more work to do, and we can simply break out of the loop.

I appreciate the way you commented your changes here, though.

Comment on lines 1109 to 1113
let poll_result = poll.poll(&mut events, None);
if let Err(e) = poll_result {
eprintln!("Warning: Poll failed: {}", e);
continue;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^

Comment on lines 1123 to 1129
let s = match child.read_lines() {
Ok(s) => s,
Err(e) => {
eprintln!("Warning: Could not read lines from child process: {}", e);
continue;
}
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^

Comment on lines 1151 to 1153
if let Err(e) = poll.registry().deregister(&mut child.receiver) {
eprintln!("Warning: Could not deregister receiver: {}", e);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^

Comment on lines 1157 to 1165
let status_result = child.popen.wait();

if let Err(e) = status_result {
eprintln!("Warning: `wait` failed for `{:?}`: {}", child.popen, e);
failed_targets.insert(i_target);
continue;
}

let status = status_result.unwrap();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^

@arington-halabi
Copy link
Author

Hey, @smoelius . I tried to implement all the changes you suggested. Hope this works fine. Please, have a look.

@arington-halabi
Copy link
Author

@smoelius I'll try to get the ci to pass, sorry for the inconvenience this is causing.

@CLAassistant
Copy link

CLAassistant commented Apr 9, 2025

CLA assistant check
All committers have signed the CLA.

@arington-halabi
Copy link
Author

@smoelius I've made the changes as you requested. The CI is failing because of the test cases, which are currently not added. I'll work on that once this one is done.

Can you please review the new changes?

@smoelius
Copy link
Collaborator

What you have now looks reasonable to me.

To be honest, I would not expect your changes to affect the fuzz_generic::fuzz_foo_qwerty test.

But you know why it is failing?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants