
Conversation

@FlorentRevest
Collaborator

I observed that on machines with many CPUs (480 on my setup), fuzzing with a handful of procs (8 on my setup) would consistently fail to start because syz-executors would fail to respond within the default handshake timeout of 1 minute. Reducing procs to 4 would fix it, but that seems ridiculous on such a powerful machine.

As part of the default sandbox policy, a syz-executor creates a large number of virtual network interfaces (16 on my kernel config, probably more on other kernels). This step vastly dominates the executor startup time and was clearly responsible for the timeouts that prevented me from fuzzing.

When fuzzing or reproducing with procs > 1, all executors run their sandbox setup in parallel. Network interfaces are created through socket operations against the RTNL (routing netlink) subsystem. Unfortunately, all RTNL operations in the kernel are serialized by a single big lock, "rtnl_mutex", so instead of parallelizing the creation of the 8*16 interfaces, they effectively get serialized, and the time it takes to set up the default sandbox for one executor scales linearly with the number of executors started "in parallel". This is inherent to the rtnl_mutex in the kernel and, as far as I can tell, there's nothing we can do about it.

However, this makes it very important that each critical section guarded by rtnl_mutex stays short and snappy, to avoid long waits on the lock. Unfortunately, the default behavior when creating a virtual network interface is to create one RX and one TX queue per CPU. Each queue is associated with a sysfs file whose creation is quite slow and goes through various sanitizer-instrumented code paths that take a long time. This means that each critical section scales linearly with the number of CPUs on the host.

For example, in my setup, starting fuzzing takes 2 minutes and 25 seconds. I found that I could bring this down to 10 seconds (a 15x faster startup!) by limiting the number of RX and TX queues created per virtual interface to 2, using the IFLA_NUM_*X_QUEUES RTNL attributes. I opportunistically chose 2 to try to keep coverage of the code that exercises multiple queues, but I don't have evidence that choosing 1 here would actually reduce code coverage.
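
For illustration, here is a minimal, self-contained sketch (not this PR's actual diff; the interface name "dummy0" and the buffer size are arbitrary, and error handling is omitted) of capping both queue counts when creating a link over raw rtnetlink:

```c
/*
 * Sketch only: create a link over raw rtnetlink while capping its queue
 * counts with IFLA_NUM_TX_QUEUES / IFLA_NUM_RX_QUEUES. Needs CAP_NET_ADMIN
 * in the current network namespace.
 */
#include <linux/if_link.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Append one rtnetlink attribute to the message and return a pointer to it. */
static struct rtattr* nl_attr(struct nlmsghdr* nlh, int type, const void* data, int len)
{
	struct rtattr* rta = (struct rtattr*)((char*)nlh + NLMSG_ALIGN(nlh->nlmsg_len));
	rta->rta_type = type;
	rta->rta_len = RTA_LENGTH(len);
	if (len)
		memcpy(RTA_DATA(rta), data, len);
	nlh->nlmsg_len = NLMSG_ALIGN(nlh->nlmsg_len) + RTA_ALIGN(rta->rta_len);
	return rta;
}

int main(void)
{
	char buf[4096] __attribute__((aligned(NLMSG_ALIGNTO))) = {0};
	struct nlmsghdr* nlh = (struct nlmsghdr*)buf;
	nlh->nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
	nlh->nlmsg_type = RTM_NEWLINK;
	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL | NLM_F_ACK;

	uint32_t queues = 2; /* cap both directions at 2 instead of one per CPU */
	nl_attr(nlh, IFLA_IFNAME, "dummy0", strlen("dummy0") + 1);
	nl_attr(nlh, IFLA_NUM_TX_QUEUES, &queues, sizeof(queues));
	nl_attr(nlh, IFLA_NUM_RX_QUEUES, &queues, sizeof(queues));

	/* Nested IFLA_LINKINFO { IFLA_INFO_KIND = "dummy" }. */
	struct rtattr* linkinfo = nl_attr(nlh, IFLA_LINKINFO, NULL, 0);
	nl_attr(nlh, IFLA_INFO_KIND, "dummy", strlen("dummy"));
	linkinfo->rta_len = (char*)nlh + NLMSG_ALIGN(nlh->nlmsg_len) - (char*)linkinfo;

	int sock = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
	struct sockaddr_nl addr = {.nl_family = AF_NETLINK};
	sendto(sock, nlh, nlh->nlmsg_len, 0, (struct sockaddr*)&addr, sizeof(addr));
	recv(sock, buf, sizeof(buf), 0); /* consume the netlink ACK/error */
	close(sock);
	return 0;
}
```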

As far as I can tell, reducing the number of queues would be problematic in a high-performance networking scenario but doesn't matter for fuzzing in a namespace with only one process, so this seems like a fair trade-off to me. Ultimately, this lets me start a lot more parallel executors and take better advantage of my beefy machine.

Technical detail for review: a veth link actually creates two interfaces, one for each side of the virtual ethernet link, so both sides need to be configured with a low number of queues (see the sketch below).
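
As a rough continuation of the sketch above (reusing the hypothetical nl_attr helper and nlmsghdr setup; "veth_local"/"veth_peer" are arbitrary names), the peer interface of a veth pair is described inside the same RTM_NEWLINK request via a nested VETH_INFO_PEER attribute, which carries its own struct ifinfomsg followed by IFLA_* attributes for the second interface, so the queue caps have to appear in both places:

```c
#include <linux/veth.h> /* VETH_INFO_PEER */

uint32_t queues = 2;
struct ifinfomsg peer = {0};

/* Name and queue caps for the local side of the veth pair. */
nl_attr(nlh, IFLA_IFNAME, "veth_local", strlen("veth_local") + 1);
nl_attr(nlh, IFLA_NUM_TX_QUEUES, &queues, sizeof(queues));
nl_attr(nlh, IFLA_NUM_RX_QUEUES, &queues, sizeof(queues));

/* IFLA_LINKINFO { IFLA_INFO_KIND = "veth",
 *                 IFLA_INFO_DATA { VETH_INFO_PEER { ifinfomsg, peer attrs } } } */
struct rtattr* linkinfo = nl_attr(nlh, IFLA_LINKINFO, NULL, 0);
nl_attr(nlh, IFLA_INFO_KIND, "veth", strlen("veth"));
struct rtattr* data = nl_attr(nlh, IFLA_INFO_DATA, NULL, 0);
struct rtattr* vpeer = nl_attr(nlh, VETH_INFO_PEER, &peer, sizeof(peer));
nl_attr(nlh, IFLA_IFNAME, "veth_peer", strlen("veth_peer") + 1);
/* Queue caps for the other side of the veth pair. */
nl_attr(nlh, IFLA_NUM_TX_QUEUES, &queues, sizeof(queues));
nl_attr(nlh, IFLA_NUM_RX_QUEUES, &queues, sizeof(queues));
/* Close the nested attributes by fixing up their lengths. */
vpeer->rta_len = (char*)nlh + NLMSG_ALIGN(nlh->nlmsg_len) - (char*)vpeer;
data->rta_len = (char*)nlh + NLMSG_ALIGN(nlh->nlmsg_len) - (char*)data;
linkinfo->rta_len = (char*)nlh + NLMSG_ALIGN(nlh->nlmsg_len) - (char*)linkinfo;
```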

@xairy
Contributor

xairy commented Nov 26, 2025

(CC #5012, as this PR might partially address issue 2 from it.)

@tarasmadan
Collaborator

I don't have any reason to not LGTM it.
Let's hear what @a-nogikh and @dvyukov think about it. There is a chance they can talk about the alternatives.

And for this kind of change, a syz-testbed run may help to make the actual decision...

Collaborator

@a-nogikh a-nogikh left a comment


Thank you!

@a-nogikh
Copy link
Collaborator

I suggest that we deploy it and check the coverage reports over the following day(s) for possible coverage regressions, though I think it's unlikely that there will be any.

@a-nogikh a-nogikh added this pull request to the merge queue Nov 26, 2025
Merged via the queue into google:master with commit e833134 Nov 26, 2025
17 checks passed