executor: improve startup time on machines with many CPUs #6482
I observed that on machines with many CPUs (480 on my setup), fuzzing with a handful of procs (8 on my setup) would consistently fail to start because the syz-executors did not respond within the default handshake timeout of 1 minute. Reducing procs to 4 worked around it, but that seems ridiculous on such a powerful machine.
As part of the default sandbox policy, each syz-executor creates a large number of virtual network interfaces (16 with my kernel config, probably more with other kernels). This step dominates the executor startup time and was clearly responsible for the handshake timeouts that prevented me from fuzzing.
When fuzzing or reproducing with procs > 1, all executors run their sandbox setup in parallel. Network interfaces are created through socket operations against the RTNL (routing netlink) subsystem. Unfortunately, all RTNL operations in the kernel are serialized by a single big lock, rtnl_mutex, so instead of creating the 8*16 interfaces in parallel, the operations are effectively serialized, and the time it takes to set up the default sandbox for one executor scales linearly with the number of executors started "in parallel". This is inherent to rtnl_mutex in the kernel and, as far as I can tell, there is nothing we can do about it.
However, it makes it very important that each critical section guarded by rtnl_mutex stays short and snappy, to avoid long waits on the lock. Unfortunately, the default behavior when creating a virtual network interface is to create one RX and one TX queue per CPU. Each queue is associated with a sysfs file whose creation is quite slow and goes through various code paths that take a long time. This means that each critical section scales linearly with the number of CPUs on the host.
For example, on my setup, starting fuzzing took 2 minutes 25 seconds. I found that I could bring this down to 10 seconds (a 15x faster startup!) by limiting the number of RX and TX queues created per virtual interface to 2 using the IFLA_NUM_*X_QUEUES RTNL attributes. I chose 2 to try to keep coverage of the code that exercises multiple queues, but I have no evidence that choosing 1 would actually reduce coverage.
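
To make the mechanism concrete, here is a rough sketch of what such a request looks like on the wire. This is not the actual executor code; `put_attr` and `create_link_with_few_queues` are illustrative names, and error handling is trimmed:

```c
#include <linux/if_link.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

// Append one netlink attribute to an RTM_* message being built in a flat buffer.
static void put_attr(struct nlmsghdr* nlh, int type, const void* data, int len)
{
	struct rtattr* rta = (struct rtattr*)((char*)nlh + NLMSG_ALIGN(nlh->nlmsg_len));
	rta->rta_type = type;
	rta->rta_len = RTA_LENGTH(len);
	if (len)
		memcpy(RTA_DATA(rta), data, len);
	nlh->nlmsg_len = NLMSG_ALIGN(nlh->nlmsg_len) + RTA_ALIGN(rta->rta_len);
}

// Create a link of the given kind (e.g. "dummy" or "bridge") with only
// 2 RX and 2 TX queues instead of the default one-per-CPU.
int create_link_with_few_queues(const char* name, const char* kind)
{
	char buf[512] __attribute__((aligned(8))) = {0};
	struct nlmsghdr* nlh = (struct nlmsghdr*)buf;
	nlh->nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
	nlh->nlmsg_type = RTM_NEWLINK;
	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL | NLM_F_ACK;

	unsigned int nqueues = 2; // cap the per-direction queue count
	put_attr(nlh, IFLA_IFNAME, name, strlen(name) + 1);
	put_attr(nlh, IFLA_NUM_TX_QUEUES, &nqueues, sizeof(nqueues));
	put_attr(nlh, IFLA_NUM_RX_QUEUES, &nqueues, sizeof(nqueues));

	// Nested IFLA_LINKINFO { IFLA_INFO_KIND = kind }; the nested attribute
	// length is patched up after its content has been appended.
	struct rtattr* linkinfo = (struct rtattr*)(buf + NLMSG_ALIGN(nlh->nlmsg_len));
	put_attr(nlh, IFLA_LINKINFO, NULL, 0);
	put_attr(nlh, IFLA_INFO_KIND, kind, strlen(kind));
	linkinfo->rta_len = buf + nlh->nlmsg_len - (char*)linkinfo;

	int sock = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
	if (sock == -1)
		return -1;
	struct sockaddr_nl addr = {0};
	addr.nl_family = AF_NETLINK;
	ssize_t sent = sendto(sock, buf, nlh->nlmsg_len, 0, (struct sockaddr*)&addr, sizeof(addr));
	close(sock);
	return sent == (ssize_t)nlh->nlmsg_len ? 0 : -1;
}
```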
As far as I can tell, reducing the number of queues would only be problematic in a high-performance networking scenario, which doesn't matter for fuzzing in a namespace with a single process, so this seems like a fair trade-off to me. Ultimately, it lets me start many more parallel executors and take better advantage of my beefy machine.
Technical detail for review: creating a veth device actually creates two interfaces, one for each side of the virtual Ethernet link, so both sides need to be configured with a low number of queues.
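
As a sketch of what that means at the netlink level (again illustrative, not the actual change, reusing the hypothetical `put_attr` helper from the snippet above): the peer side is described in a nested IFLA_LINKINFO -> IFLA_INFO_DATA -> VETH_INFO_PEER attribute whose payload is a struct ifinfomsg followed by regular IFLA_* attributes, so the queue caps have to appear both in the outer message and inside the peer section:

```c
#include <linux/veth.h>

// Build the nested veth link info: IFLA_LINKINFO -> IFLA_INFO_DATA ->
// VETH_INFO_PEER. The queue caps are applied to the peer here; the outer
// RTM_NEWLINK message carries the same attributes for the first side.
static void put_veth_linkinfo(struct nlmsghdr* nlh, const char* peer_name, unsigned int nqueues)
{
	char* base = (char*)nlh;
	struct rtattr* linkinfo = (struct rtattr*)(base + NLMSG_ALIGN(nlh->nlmsg_len));
	put_attr(nlh, IFLA_LINKINFO, NULL, 0);
	put_attr(nlh, IFLA_INFO_KIND, "veth", strlen("veth"));

	struct rtattr* infodata = (struct rtattr*)(base + NLMSG_ALIGN(nlh->nlmsg_len));
	put_attr(nlh, IFLA_INFO_DATA, NULL, 0);
	struct rtattr* peer = (struct rtattr*)(base + NLMSG_ALIGN(nlh->nlmsg_len));
	put_attr(nlh, VETH_INFO_PEER, NULL, 0);

	// The peer payload starts with its own ifinfomsg header.
	struct ifinfomsg peer_ifi = {0};
	memcpy(base + nlh->nlmsg_len, &peer_ifi, sizeof(peer_ifi));
	nlh->nlmsg_len += sizeof(peer_ifi);

	// Peer-side attributes: its name and the same low queue counts.
	put_attr(nlh, IFLA_IFNAME, peer_name, strlen(peer_name) + 1);
	put_attr(nlh, IFLA_NUM_TX_QUEUES, &nqueues, sizeof(nqueues));
	put_attr(nlh, IFLA_NUM_RX_QUEUES, &nqueues, sizeof(nqueues));

	// Patch up the nested attribute lengths, innermost first.
	peer->rta_len = base + nlh->nlmsg_len - (char*)peer;
	infodata->rta_len = base + nlh->nlmsg_len - (char*)infodata;
	linkinfo->rta_len = base + nlh->nlmsg_len - (char*)linkinfo;
}
```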