Fix parallel fuzzing premature termination #540

arington-halabi · 2025-04-03T01:20:56Z

This PR addresses issue #525 where parallel fuzzing terminates prematurely when a target fails.

Problem

When fuzzing a target fails (e.g., due to missing corpus entries or crashes), the current implementation causes the entire parallel fuzzing process to stop, preventing other valid targets from being fuzzed.

Solution

As suggested in the issue discussion, I've implemented a solution that:

Continues Past Errors: When a fuzzing target fails, the implementation now:
- Logs a warning with specific target information
- Marks the target as failed
- Continues fuzzing the remaining valid targets

Handles All-Targets-Failed: To address the corner case mentioned:
- Tracks failed targets in a HashSet
- Detects when all targets have failed
- Terminates gracefully with an informative message
- Prevents spinning without making progress

This implementation focuses specifically on handling failures in the fuzzing targets themselves rather than system-level errors, as recommended in the review.
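The bookkeeping described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Outcome`, `fuzz_all`, and the `run` closure are hypothetical names standing in for spawning and waiting on a real fuzzer process, and the loop is sequential for brevity.

```rust
use std::collections::HashSet;

/// Outcome of one fuzzing attempt on a target (hypothetical type for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Outcome {
    Succeeded,
    Failed,
}

/// Tries every target once and returns the indices of the targets that failed.
/// `run` is a stand-in for launching the real fuzzer on target `i_target`.
fn fuzz_all(n_targets: usize, run: impl Fn(usize) -> Outcome) -> HashSet<usize> {
    let mut failed_targets = HashSet::new();
    for i_target in 0..n_targets {
        if run(i_target) == Outcome::Failed {
            // Log, mark the target as failed, and keep going with the rest.
            eprintln!("Warning: target {i_target} failed; continuing with the rest");
            failed_targets.insert(i_target);
        }
    }
    // Corner case: every target failed, so there is nothing left to make progress on.
    if n_targets > 0 && failed_targets.len() == n_targets {
        eprintln!("All targets failed. Terminating fuzzing process.");
    }
    failed_targets
}
```

For example, `fuzz_all(3, |i| if i == 1 { Outcome::Failed } else { Outcome::Succeeded })` would record only target 1 as failed while targets 0 and 2 still run.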

arington-halabi · 2025-04-05T05:58:27Z

Hey, @smoelius. I have fixed the issue the way you asked. Can you please review?

smoelius · 2025-04-05T11:48:53Z

> Hey, @smoelius. I have fixed the issue the way you asked. Can you please review.

Sorry, @arington-halabi. I will review this this weekend.

smoelius

I'd like to give this another look after you've considered my comments.

But I think this is looking good, and it's clear that you've put a lot of thought into this, which I really appreciate!

One other thing to consider is whether you would like to write a test or tests for this. That could be a separate PR, though.

smoelius · 2025-04-06T01:26:07Z

cargo-test-fuzz/src/lib.rs

+        if !status.success() {
+            eprintln!("Warning: Command failed: {:?}", command);
+            return Ok(());
+        }


This part of the code does not involve parallel fuzzing, actually. Hence, this part of the code should not need to be modified.

smoelius · 2025-04-06T10:11:51Z

cargo-test-fuzz/src/lib.rs

+            let popen_result = command.spawn();
+
+            // Handle spawn failures gracefully
+            if let Err(e) = popen_result {
+                eprintln!("Warning: Could not spawn `{exec:?}`: {}", e);
+                failed_targets.insert(i_target);
+                i_task += 1;
+                continue;
+            }
+
+            let mut popen = popen_result.unwrap();
+            let stdout_result = popen.stdout.take();
+
+            if stdout_result.is_none() {
+                eprintln!("Warning: Could not get output of `{exec:?}`");
+                failed_targets.insert(i_target);
+                i_task += 1;
+                continue;
+            }
+
+            let stdout = stdout_result.unwrap();
            let mut receiver = Receiver::from(stdout);
-            receiver
-                .set_nonblocking(true)
-                .with_context(|| "Could not make receiver non-blocking")?;
-            poll.registry()
-                .register(&mut receiver, Token(i_target), Interest::READABLE)
-                .with_context(|| "Could not register receiver")?;
+
+            if let Err(e) = receiver.set_nonblocking(true) {
+                eprintln!("Warning: Could not make receiver non-blocking: {}", e);
+                failed_targets.insert(i_target);
+                i_task += 1;
+                continue;
+            }
+
+            if let Err(e) = poll.registry().register(&mut receiver, Token(i_target), Interest::READABLE) {
+                eprintln!("Warning: Could not register receiver: {}", e);
+                failed_targets.insert(i_target);
+                i_task += 1;
+                continue;
+            }
+


I think all of the code from the spawn to here can remain unchanged.

If an error were to occur here, it would mean that something was wrong with cargo-test-fuzz or the system on which it was running, not the fuzzing target. For errors of this kind, I think it is okay to bail out (at least for now).

The same comment applies to several more places below, which I will mark with just '^'.

The cases where I think an error could indicate a problem in the fuzzing target are:

test-fuzz/cargo-test-fuzz/src/lib.rs

Line 1096 in dd090be

if !status.success() {

test-fuzz/cargo-test-fuzz/src/lib.rs

Line 1100 in dd090be

if !child.testing_aborted_programmatically {

Hence, I have not marked those with '^' below.
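The distinction drawn above might look roughly like this in isolation. It is only a sketch under assumptions: `ChildStatus` and `handle_exit` are hypothetical names, and `std::io::Result` stands in for whatever error type the real code uses. Errors that point at the tool or the system propagate with `?`, while an unsuccessful exit is attributed to the fuzzing target and only logged.

```rust
use std::io;

/// Hypothetical stand-in for the exit status of a fuzzing child process.
struct ChildStatus {
    success: bool,
}

/// Handles one finished child and reports whether its target succeeded.
fn handle_exit(name: &str, wait_result: io::Result<ChildStatus>) -> io::Result<bool> {
    // A failing `wait` suggests a problem with the tool or the system: bail out.
    let status = wait_result?;
    if !status.success {
        // An unsuccessful exit suggests a problem in the target: warn and continue.
        eprintln!("Warning: target `{name}` exited unsuccessfully");
        return Ok(false);
    }
    Ok(true)
}
```

A caller could then skip targets for which `handle_exit` returns `Ok(false)` and still stop on the first `Err`.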

smoelius · 2025-04-06T10:43:04Z

cargo-test-fuzz/src/lib.rs

        if n_children == 0 {
-            assert!(config.sufficient_cpus);
-            assert!(i_task >= executable_targets.len());
-            break;
+            // If we have no children and all targets have failed, terminate
+            if failed_targets.len() == executable_targets.len() {
+                eprintln!("All targets failed to start. Terminating fuzzing process.");
+                break;
+            }
+
+            // If we have no children because we've cycled through all targets,
+            // but not all have failed, this is normal completion
+            if config.sufficient_cpus && i_task >= executable_targets.len() {
+                break;
+            }
+
+            // If we have no children but still have targets that haven't failed,
+            // something went wrong. Wait a bit and try again.
+            std::thread::sleep(std::time::Duration::from_millis(100));
+            continue;
        }


I think this n_children == 0 should not need to be modified.

This case is for when there are sufficiently many cpus to fuzz all targets simultaneously (hence, the assertion).

In particular, if all children have exited (e.g., because --run-until-crash was passed and they all found crashes), then there is no more work to do, and we can simply break out of the loop.

I appreciate the way you commented your changes here, though.
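To illustrate why the assertions hold in that case, here is a small simulation. It is a sketch only: `run_until_idle` is a hypothetical function, and "a child exits" is modeled as decrementing a counter rather than polling real processes.

```rust
/// Simulates the "sufficient CPUs" case: every target is started up front,
/// children exit one at a time, and the loop ends once none remain.
/// Returns the number of children that exited.
fn run_until_idle(n_targets: usize) -> usize {
    let sufficient_cpus = true; // assumption: at least one CPU per target
    let mut i_task = 0;
    let mut n_children = 0;

    // With enough CPUs, every target is started before the loop begins.
    while i_task < n_targets {
        n_children += 1;
        i_task += 1;
    }

    let mut exited = 0;
    loop {
        if n_children == 0 {
            // No children left means every target was started and has exited.
            assert!(sufficient_cpus);
            assert!(i_task >= n_targets);
            break;
        }
        // Model one child exiting (e.g., it found a crash).
        n_children -= 1;
        exited += 1;
    }
    exited
}
```

Because all targets are started before the loop, reaching `n_children == 0` can only mean there is no more work to do.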

smoelius · 2025-04-06T10:45:31Z

cargo-test-fuzz/src/lib.rs

+        let poll_result = poll.poll(&mut events, None);
+        if let Err(e) = poll_result {
+            eprintln!("Warning: Poll failed: {}", e);
+            continue;
+        }


smoelius · 2025-04-06T10:46:33Z

cargo-test-fuzz/src/lib.rs

+            let s = match child.read_lines() {
+                Ok(s) => s,
+                Err(e) => {
+                    eprintln!("Warning: Could not read lines from child process: {}", e);
+                    continue;
+                }
+            };


smoelius · 2025-04-06T10:50:48Z

cargo-test-fuzz/src/lib.rs

+                if let Err(e) = poll.registry().deregister(&mut child.receiver) {
+                    eprintln!("Warning: Could not deregister receiver: {}", e);
+                }


smoelius · 2025-04-06T10:51:26Z

cargo-test-fuzz/src/lib.rs

+                let status_result = child.popen.wait();
+
+                if let Err(e) = status_result {
+                    eprintln!("Warning: `wait` failed for `{:?}`: {}", child.popen, e);
+                    failed_targets.insert(i_target);
+                    continue;
+                }
+
+                let status = status_result.unwrap();


arington-halabi · 2025-04-07T15:22:07Z

Hey, @smoelius. I tried to implement all the changes you suggested. Hope this works fine. Please have a look.

arington-halabi · 2025-04-07T18:29:03Z

@smoelius I'll try to get the CI to pass. Sorry for the inconvenience this is causing.

CLAassistant · 2025-04-09T14:43:42Z

All committers have signed the CLA.

arington-halabi · 2025-04-09T15:13:19Z

@smoelius I've made the changes as you requested. The CI is failing because of the test cases, which are currently not added. I'll work on that once this one is done.

Can you please review the new changes?

smoelius · 2025-04-10T00:09:50Z

What you have now looks reasonable to me.

To be honest, I would not expect your changes to affect the fuzz_generic::fuzz_foo_qwerty test.

But do you know why it is failing?

Fix parallel fuzzing premature termination

ef051c3

arington-halabi requested a review from smoelius as a code owner April 3, 2025 01:20

Update implementation to continue past errors

a48a22e

arington-halabi mentioned this pull request Apr 3, 2025

Premature termination of parallel fuzzing #525

Open

smoelius reviewed Apr 6, 2025

Implement graceful error handling for failed fuzzing targets

8aee95f

Refactor fuzzing loop for improved readability and maintainability

e8404b8
