
optimize process collection with io_uring (18-46x faster) #1544

Closed
markg85 wants to merge 1 commit into aristocratos:main from markg85:main


@markg85 markg85 commented Feb 16, 2026

Optimize Linux process monitoring with io_uring batched I/O and eliminate redundant string operations. Achieves 18-46x speedup in collection cycles.

Performance improvements:

  • Collection cycle time: ~4.6ms -> ~0.10-0.25ms (18-46x faster)
  • Syscalls per cycle: ~400+ -> 1-2 (200x reduction via io_uring batching)
  • CPU usage: ~15-20% -> ~7% for 400 processes
  • Context switches/sec: ~100+ -> ~20 (5x reduction)

Key changes:

io_uring integration (Linux):

  • Use io_uring for batched async reads of /proc/[pid]/stat files
  • Enable SQPOLL mode for zero-syscall submission when available
  • Keep persistent file descriptors open across collection cycles
  • Fall back through optimized modes if SQPOLL unavailable
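
The fallback chain described above could be probed at startup roughly as follows. This is a minimal sketch, not the code from this PR: it uses the raw `io_uring_setup` syscall instead of liburing, and the names `RingMode`, `try_setup` and `probe_ring_mode` are hypothetical. The `sq_thread_idle` field (in milliseconds) limits how long the kernel's SQPOLL thread keeps spinning before it goes to sleep; the 50 ms default here is an arbitrary assumption.

```cpp
#include <cassert>
#include <cstring>

#if defined(__linux__) && __has_include(<linux/io_uring.h>)
#include <linux/io_uring.h>
#include <sys/syscall.h>
#include <unistd.h>
#endif

#ifdef __NR_io_uring_setup
#define SKETCH_HAVE_IO_URING 1
#else
#define SKETCH_HAVE_IO_URING 0
#endif

// Hypothetical result type: which submission mode the running kernel accepted.
enum class RingMode { None, Plain, SqPoll };

#if SKETCH_HAVE_IO_URING
// Ask the kernel for a small ring with the given setup flags.
// Returns a file descriptor on success, or -1 (ENOSYS, EPERM, ...) on failure.
static int try_setup(unsigned flags, unsigned idle_ms) {
    io_uring_params params;
    std::memset(&params, 0, sizeof params);
    params.flags = flags;
    params.sq_thread_idle = idle_ms;  // only meaningful together with SQPOLL
    return static_cast<int>(syscall(__NR_io_uring_setup, 8u, &params));
}
#endif

// Try SQPOLL first, then a plain ring, then report that io_uring is unusable
// so the caller can keep using ordinary read() calls.
RingMode probe_ring_mode(unsigned idle_ms = 50) {
#if SKETCH_HAVE_IO_URING
    if (int fd = try_setup(IORING_SETUP_SQPOLL, idle_ms); fd >= 0) {
        close(fd);
        return RingMode::SqPoll;
    }
    if (int fd = try_setup(0, 0); fd >= 0) {
        close(fd);
        return RingMode::Plain;
    }
#endif
    (void)idle_ms;
    return RingMode::None;
}
```

A caller would run the probe once and choose the collection path from the result; `RingMode::None` covers old kernels, missing headers at build time, and sandboxes that block the syscall.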

Fast directory enumeration:

  • Replace fs::directory_iterator with direct getdents64 syscall
  • Eliminates C++ filesystem overhead for /proc scanning
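
For illustration, enumerating numeric entries with `getdents64` might look like the sketch below. It is not the code from this PR; `list_pids` and `linux_dirent64_sketch` are hypothetical names, and the 32 KiB buffer size is an assumption.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

// Record layout returned by getdents64(2); d_name is a NUL-terminated
// string that extends past the end of this struct.
struct linux_dirent64_sketch {
    std::uint64_t d_ino;
    std::int64_t d_off;
    unsigned short d_reclen;
    unsigned char d_type;
    char d_name[1];
};

// Return every entry of `dir` whose name is a plain decimal number.
std::vector<int> list_pids(const char* dir = "/proc") {
    std::vector<int> pids;
    const int fd = open(dir, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (fd < 0) return pids;

    alignas(8) char buf[32 * 1024];
    for (;;) {
        const long n = syscall(SYS_getdents64, fd, buf, sizeof buf);
        if (n <= 0) break;  // 0 = end of directory, <0 = error
        for (long off = 0; off < n;) {
            const auto* entry =
                reinterpret_cast<const linux_dirent64_sketch*>(buf + off);
            const char* name = buf + off + offsetof(linux_dirent64_sketch, d_name);
            const char* end = name + std::strlen(name);
            int pid = 0;
            const auto [ptr, ec] = std::from_chars(name, end, pid);
            // Keep only names that are numeric from start to finish.
            if (ec == std::errc{} && ptr == end) pids.push_back(pid);
            off += entry->d_reclen;
        }
    }
    close(fd);
    return pids;
}
```

Each `getdents64` call returns many directory records at once, so a full `/proc` scan needs only a handful of syscalls and no per-entry `std::filesystem` objects.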

String allocation optimizations:

  • Add count_digits() for fast digit counting without string allocation
  • Replace to_string(rss).size() with count_digits(rss)
  • Move pid_path construction inside no_cache block (only for new processes)
  • Cache totalMem_len value with invalidation on memory change
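
A digit counter of the kind described above can be sketched in a few lines. The actual signature of the PR's `count_digits()` is not shown in this thread, so the unsigned 64-bit parameter below is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Number of decimal digits in v, computed without building a string.
// Equivalent to std::to_string(v).size() for unsigned values.
constexpr int count_digits(std::uint64_t v) noexcept {
    int digits = 1;
    while (v >= 10) {
        v /= 10;
        ++digits;
    }
    return digits;
}

static_assert(count_digits(0) == 1);
static_assert(count_digits(999) == 3);
```

Because the function is `constexpr`, the `static_assert` lines double as compile-time checks.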

Zero-copy parsing:

  • Add parse_stat_buffer() for direct buffer parsing
  • Parse stat fields without intermediate string objects
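
As an illustration of parsing a stat buffer in place, here is a minimal sketch using `std::string_view` and `std::from_chars`. It is not the PR's `parse_stat_buffer()`; `StatSketch`, `to_number` and `parse_stat_sketch` are hypothetical names, and only a few fields are extracted. Field positions follow the `/proc/[pid]/stat` layout in proc(5): state is field 3, ppid 4, utime 14, stime 15, rss 24.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>

struct StatSketch {
    int pid = 0;
    std::string_view comm;  // view into the caller's buffer, no copy
    char state = '?';
    int ppid = 0;
    std::uint64_t utime = 0;
    std::uint64_t stime = 0;
    std::int64_t rss_pages = 0;
};

// Convert a whole token to a number; reject partial matches.
template <typename T>
bool to_number(std::string_view tok, T& out) {
    const char* last = tok.data() + tok.size();
    const auto [ptr, ec] = std::from_chars(tok.data(), last, out);
    return ec == std::errc{} && ptr == last;
}

std::optional<StatSketch> parse_stat_sketch(std::string_view buf) {
    // comm may itself contain spaces and parentheses, so search for the
    // first '(' and the LAST ')'.
    const auto open = buf.find('(');
    const auto close = buf.rfind(')');
    if (open == std::string_view::npos || close == std::string_view::npos ||
        close < open)
        return std::nullopt;

    StatSketch out;
    std::string_view pid_tok = buf.substr(0, open);
    while (!pid_tok.empty() && pid_tok.back() == ' ') pid_tok.remove_suffix(1);
    if (!to_number(pid_tok, out.pid)) return std::nullopt;
    out.comm = buf.substr(open + 1, close - open - 1);

    // Walk the space-separated fields after ')'; idx 0 is field 3 (state).
    std::string_view rest = buf.substr(close + 1);
    for (int idx = 0;; ++idx) {
        const auto start = rest.find_first_not_of(" \n");
        if (start == std::string_view::npos) return std::nullopt;  // too short
        rest.remove_prefix(start);
        const std::string_view tok = rest.substr(0, rest.find_first_of(" \n"));
        bool ok = true;
        switch (idx) {
            case 0:  out.state = tok.front(); break;
            case 1:  ok = to_number(tok, out.ppid); break;
            case 11: ok = to_number(tok, out.utime); break;
            case 12: ok = to_number(tok, out.stime); break;
            case 21: ok = to_number(tok, out.rss_pages); break;
            default: break;
        }
        if (!ok) return std::nullopt;
        if (idx == 21) return out;
        rest.remove_prefix(tok.size());
    }
}
```

The returned `comm` is only valid while the input buffer is alive, which is the usual trade-off of a zero-copy parser.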

Resource management:

  • Add Proc::cleanup() for proper io_uring and FD cleanup on exit
  • Add cleanup stubs for FreeBSD, NetBSD, OpenBSD, macOS

Build changes:

  • Add -luring to LDFLAGS for io_uring support

Hi,

I worked on this optimization for weeks without AI. Why? Well, just curious about what I could do :)
Personally I went as far as getdents64 and lots of string optimizations and stopped there. It was faster, sure, but "only" by 10x or so at most. Syscalls were becoming the main bottleneck at that point.

Then I used AI (the GLM 5 model), and this is the result of that.

This code now spends about 60% of its time in the kernel, with the remainder spread all over the place. You can go much faster still, but then you are into kernel-side code, specifically eBPF. Yeah, not going that far :)

In the current patch set, going the io_uring route on its own is still not much faster. In fact, it's slower for this use case when using the naive approach. It's when you add SQPOLL that performance shoots up from "just" 10x faster to nearly 50x faster. The string and number formatting in the kernel itself (for the procfs files) is now the bottleneck. You're still dealing with strings even though the values start as numbers and need to end as numbers too.

With this patch you can run btop at a 100ms interval and btop itself won't (or will hardly) rise to be the top CPU user. More importantly, it won't cause the CPU to scale up and down in frequency much. When I'm running a process monitor, that's what I want: a tool that shows me what uses CPU without using so much itself that it becomes the top user.

Anyhow, let me know of changes you'd like to see to merge this.

@deckstose added the AI generated label (Majority of included code is AI generated) Feb 18, 2026
@deckstose
Collaborator

How did you measure the performance difference?

@markg85
Author

markg85 commented Feb 23, 2026

Hi @deckstose, sorry for the late reply!

Initially I measured by pulling the code out of btop and testing it in a stand-alone application, where I could benchmark it without starting up the btop GUI every time. Once that started looking promising, I merged it back in and did time-based profiling. I kept an eye on the number of processes measured, and the counts stayed within a few of each other.
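
A stand-alone timing harness of the kind described here could look like the sketch below. It is an assumption about the general approach, not the benchmark actually used; `median_micros` is a hypothetical name, and reporting the median rather than the mean is a design choice that makes a single slow outlier less influential.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <vector>

// Run `cycle` `runs` times and return the median duration in microseconds.
template <typename F>
double median_micros(F&& cycle, int runs = 101) {
    assert(runs > 0);
    using clock = std::chrono::steady_clock;
    std::vector<double> samples;
    samples.reserve(static_cast<std::size_t>(runs));
    for (int i = 0; i < runs; ++i) {
        const auto t0 = clock::now();
        cycle();
        const auto t1 = clock::now();
        samples.push_back(
            std::chrono::duration<double, std::micro>(t1 - t0).count());
    }
    // Partial sort: only the middle element needs to be in its final place.
    const auto mid = samples.begin() + runs / 2;
    std::nth_element(samples.begin(), mid, samples.end());
    return *mid;
}
```

Passing the old and the new collection function to `median_micros` in the same process, on the same machine, gives two numbers that can be compared directly.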

That all being said, there is a bug in this patch.

I don't know why yet, but for some reason the btop process itself wasn't showing up in btop (it did when running a second instance). If it had, I would surely have noticed that this patch, while making collection much faster, also busy-waits on one core (SQPOLL), pinning that core at 100% usage. Yeah, that's absolutely not what I intend, and it runs counter to the idea of the patch: a resource monitor that itself doesn't use much and should at most be a blip in the monitoring.

So don't merge until I have figured out a way to keep the improvements but drop the busy waiting (or reduce it so much that it's effectively gone).

@deckstose marked this pull request as draft February 27, 2026 16:36
@aristocratos
Owner

aristocratos commented Feb 28, 2026

@markg85
To have any chance of being merged when/if fixed, this needs to be optional, and probably not on by default (or compile time check if kernel has io_uring support).

Dropping support for all linux kernels before 5.1 (when io_uring was introduced) is not gonna happen.

There is also some really weird stuff in here, like the count_digits function.
Instead of writing custom functions for a comparison, look at why the comparison is done (RSS being weird). In which cases does this happen? Does this still happen in kernels 5.1 and later?

And there is a lot of C-style code which should be adapted to C++ style (use of std::ranges and so on).

@aristocratos
Owner

There are also legitimate security concerns with using io_uring. What steps have been taken to mitigate possible security issues?

https://www.upwind.io/feed/io_uring-linux-performance-boost-or-security-headache

@markg85
Author

markg85 commented Feb 28, 2026

> There are also legitimate security concerns with using io_uring. What steps have been taken to mitigate possible security issues?
>
> https://www.upwind.io/feed/io_uring-linux-performance-boost-or-security-headache

Your line of questioning makes it sound like you're looking for an excuse to not accept it.
If accepted at all, it's optional and disabled is telling enough. I won't even pursue improvements if you maintain that stance.

Then you throw in the security argument. Like actually, what the F...
So now you expect me to research io_uring security and then also tell you how this is mitigated to satisfy... what exactly?

But fine, I'll entertain this questionable question. As you would know if you had read that same link, using io_uring "bypasses" syscall tracing (an EDR bypass) for the common actions otherwise performed through syscalls. In other words, (security) tools that rely on syscall monitoring don't work when io_uring is used, and if a malicious program uses it, some harm can be done undetected. I'm assuming we don't see btop as malicious here, so I'm setting that classification aside. What's left is the mitigation. There's an important distinction to make here: if you don't make syscalls from userland code, then you can't detect them from userland code either. This use of io_uring does exactly that. What you can detect is that io_uring is being used, not what's being done with it from userland. I'm not aware of an io_uring mitigation, and frankly I don't care about it either. I'm not a security researcher and I'm not the io_uring developer. I want things to be as performant as they can be. For system monitoring tools that don't go into kernel space, that means async I/O and as few syscalls as possible. io_uring fits that bill very nicely. I think more programs should use it, as it's a wonderful technique!

Security researchers will have a harder time with this. They will likely have to go the kernel route, for example with eBPF. In my search on this EDR bypass I found (well, it's also in the article you linked) that eBPF can be used to get kernel traces and so get a clear picture of what io_uring is doing. I didn't even need to know that, it's simply irrelevant, but that should be your answer.

Now to conclude this. I'm dropping this patch entirely if it can't be enabled by default. I'm fine if we can come to some consensus and say that the next major version will have it, but I'm not going to even bother if it's behind a switch and disabled by default. I hate needlessly spending time on something just to be "politely rejected" over the course of months and lots of back and forth. So I'd rather have you just plain and simple say:

  1. Fix up the improvements to our coding standards and it can be merged for release x.
    or
  2. Don't bother, not gonna happen.

If you opt for option 1, I'll happily make a new patch within a few days that is more in line with btop coding standards and explains code oddities like count_digits. But if you're going to reject it anyhow, then I obviously won't put any more effort into it.

@aristocratos
Owner

> Your line of questioning makes it sound like you're looking for an excuse to not accept it.
> If accepted at all, it's optional and disabled is telling enough. I won't even pursue improvements if you maintain that stance.

See my edit in the previous comment:
"and probably not on by default (or compile time check if kernel has io_uring support)."

> Then you throw in the security argument. Like actually, what the F...
> So now you expect me to research io_uring security and then also tell you how this is mitigated to satisfy... what exactly?

Well yeah, when you submit AI generated code it's obviously gonna get a higher level of scrutiny, even more so when it currently (as you mentioned) is bugged.

And yes, I absolutely would expect you to research the security of new functionality you are introducing into the codebase.
It's a bit alarming that you don't think so...

A bug in code that to some degree obfuscates what is happening is pretty critical if, for example, it could be triggered by external input, especially since there is a good chance btop is running with elevated privileges.

> [...] Now to conclude this. I'm dropping this patch entirely if it can't be enabled by default.

It's not gonna work in the musl builds which are shipped with the releases, since they support kernel 2.6 and up.
(Don't know if building with musl could be an issue also considering CI failing for all those builds currently).

But why the ultimatum?
It has to be the default or you are dropping the PR?

If I was a more conspiracy minded person that would make me somewhat suspicious of your intentions.

I'm usually enthused to see performance-related PRs, but seeing spectacular claims and "AI" in the same sentence makes me more skeptical. (Which might just be me becoming more jaded...)

> I hate needlessly spending time on something just to be "politely rejected" over the course of months and lots of back and forth

I can't promise this code (fixed) would be merged, that would depend on the code which I haven't seen yet. But if it gets to a good state and your performance gain holds, of course I would be happy to merge your code.

But if your aim is just to get code accepted and in to a release as fast as possible, I'm sorry, this is not the project for that.

Completely changing one of the most critical code paths in btop is gonna require testing, prodding, fuzzing and "back and forth".

If you're not up for that, I understand and thank you for the effort so far.

@markg85
Author

markg85 commented Feb 28, 2026

> And yes, I absolutely would expect you to research the security of new functionality you are introducing into the codebase.
> It's a bit alarming that you don't think so...

That's ludicrous. Do you check the latest compiler bugs for using the stringstream functionality? Do you make sure that the files you read into memory aren't malicious? io_uring is quite popular and is well tested and used. Using it should not raise security-related flags. It should not put the burden on me to somehow prove it's safe to use, which is exactly what you are asking of me.

What you ask would be fair if I were asking for the inclusion of some obscure or brand new library that is hardly used. That would make me frown too and question the motives. But that's not the case; we're talking about io_uring here in its purest form.

> But why the ultimatum?
> It has to be the default or you are dropping the PR?
>
> If I was a more conspiracy minded person that would make me somewhat suspicious of your intentions.
>
> I'm usually enthused to see performance-related PRs, but seeing spectacular claims and "AI" in the same sentence makes me more skeptical. (Which might just be me becoming more jaded...)

I think you misread my intent here. You said this patch, in ideal conditions, would be optional and disabled by default. To which I replied that I'm not even going to bother working on it if it conceptually cannot be upstreamed, or can be but only disabled by default.

There is no deadline or timeline. I'm more than fine to iterate on this and get it into good shape to merge. If that takes 1 day, awesome. If it takes half a year, great too. There is no pressure on my end. It's your message, which "hinted" quite clearly at leaning towards rejection, that rubbed me the wrong way.

> I can't promise this code (fixed) would be merged, that would depend on the code which I haven't seen yet. But if it gets to a good state and your performance gain holds, of course I would be happy to merge your code.

That is totally fair! Keep that up, because that's what makes btop a high-quality project!

> But if your aim is just to get code accepted and in to a release as fast as possible, I'm sorry, this is not the project for that.

Nope, not my intent. I want to optimize it, and if that gets merged, sweet! If it doesn't, fine too.

> Completely changing one of the most critical code paths in btop is gonna require testing, prodding, fuzzing and "back and forth".

Now you're talking about 2 different things. Earlier you threw it on security research which i found questionable and bordering offensive as i deem it irrelevant for the given reasons. Testing and making sure that this works and doesn't break anything. Yes, of course! You can expect that for sure!

As I said before, I'm happy to go back and forth to improve this patch. I expect nothing less due to its fundamental nature and invasive changes. But surely you understand my view that putting work into something that is disabled by default is not my intent. It's good of you to be skeptical, as you never know what sneaky tricks malicious parties pull to get a backdoor into code. But I can assure you that there is no sneaky intent other than passionately optimizing the hell out of it. If you think this can only live in btop as disabled by default then I see no value in this.

@aristocratos
Owner

> And yes, I absolutely would expect you to research the security of new functionality you are introducing into the codebase.
> It's a bit alarming that you don't think so...
>
> That's ludicrous. Do you check the latest compiler bugs for using the stringstream functionality? Do you make sure that the files you read into memory aren't malicious? io_uring is quite popular and is well tested and used. Using it should not raise security-related flags. It should not put the burden on me to somehow prove it's safe to use, which is exactly what you are asking of me.

Well, disregarding that stringstream is a part of the C++ standard library and doesn't obfuscate what the binary is doing at runtime, stringstream doesn't have a bunch of CVE's attached to it.

Having AI generate code for a kernel interface with a history of security vulnerabilities and not consider the security would be an issue in my opinion.

> Now you're talking about 2 different things. Earlier you threw it on security research which i found questionable and bordering offensive as i deem it irrelevant for the given reasons. Testing and making sure that this works and doesn't break anything. Yes, of course! You can expect that for sure!

It's not 2 different things though, making sure nothing breaks is one part of making it secure.

Regarding security research, an example:
Kernels 5.17.3 and below have CVE-2022-29582 (https://www.cvedetails.com/cve/CVE-2022-29582/). Will this impact this code? And if so, should io_uring be enabled only for kernel 5.17.4 and up, or are there other mitigations that can be done in the code to avoid this?

And are there more CVE's like that, that would impact this code?

I'm curious why you consider this "bordering offensive" and irrelevant?

> If you think this can only live in btop as disabled by default then I see no value in this.

As I mentioned above "compile time check if kernel has io_uring support", would mean that it's enabled by default if there is support for it. (This could be as simple as a macro guard for a specific kernel version or range of versions.)
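
A version gate of that kind could be sketched as below. Everything here is an assumption made for illustration: the names `KernelVersion`, `parse_release`, `at_least` and `io_uring_allowed` are hypothetical, and the minimum version is a parameter because no safe cutoff was established in this thread (5.17.4 appears only as a question). The sketch also adds a runtime `uname()` check, since the headers a binary was built against say nothing about the kernel it later runs on.

```cpp
#include <cassert>
#include <cstdio>
#include <tuple>

#include <sys/utsname.h>

#if defined(__linux__) && __has_include(<linux/version.h>)
#include <linux/version.h>
#endif

// Compile-time part: were we built against kernel headers from 5.1 or later,
// the release that introduced io_uring?
#ifdef LINUX_VERSION_CODE
#if LINUX_VERSION_CODE >= KERNEL_VERSION(5, 1, 0)
#define SKETCH_BUILD_HAS_IO_URING 1
#endif
#endif
#ifndef SKETCH_BUILD_HAS_IO_URING
#define SKETCH_BUILD_HAS_IO_URING 0
#endif

struct KernelVersion {
    int major = 0;
    int minor = 0;
    int patch = 0;
};

// Parse the leading "major.minor.patch" of a release string such as
// "6.1.0-13-amd64"; missing components stay 0.
KernelVersion parse_release(const char* release) {
    KernelVersion v;
    std::sscanf(release, "%d.%d.%d", &v.major, &v.minor, &v.patch);
    return v;
}

bool at_least(KernelVersion v, KernelVersion min) {
    return std::tie(v.major, v.minor, v.patch) >=
           std::tie(min.major, min.minor, min.patch);
}

// Runtime part: only allow io_uring if the build supports it AND the
// running kernel is at or above the chosen cutoff.
bool io_uring_allowed(KernelVersion min_kernel) {
    if (!SKETCH_BUILD_HAS_IO_URING) return false;
    utsname info{};
    if (uname(&info) != 0) return false;
    return at_least(parse_release(info.release), min_kernel);
}
```

With this shape, raising the cutoff after a new kernel advisory is a one-line change at the call site, for example `io_uring_allowed({5, 17, 4})`.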

@markg85
Author

markg85 commented Mar 1, 2026

> Well, disregarding that stringstream is a part of the C++ standard library and doesn't obfuscate what the binary is doing at runtime, stringstream doesn't have a bunch of CVE's attached to it.

The point was the library implementing it, not it being in the standard. It has different implementations, what are the risks there?

Or another thing about libraries. Some people call musl really really bad and refuse to ever use Alpine Linux because of it. That library in particular has made some specific choices that are not great. Its default memory allocator has been a cause of slowdowns in the past too, google it. And it definitely has had high-risk issues too. Not many, thankfully, but it had them until very recently.

So now you're changing stance where it can be a compile time check depending on the kernel version. And I'll have to do my CVE due diligence before you consider it. By that metric it will never get accepted as the latest one is fairly recent, does it have an effect? Probably not.

I'm retracting my PR (closing it). I have no interest in going over CVE's years back to determine if it's still a risk. Or even to qualify what "risk" means. Risk could mean the app crashes, it could mean "potentially open for malicious exploitation" or it could mean absolutely nothing. At some point the risk analysis is just academic nonsense that wastes time. My phone could explode in my hands, mutilating me forever; it's a non-zero risk that is there every time I touch it. But the risk is so incredibly insignificant (even though in CVE terms it would get a high classification) that it's just not relevant. Back to io_uring and the CVE you found: it's an exercise in hypothetical academic issues that I absolutely do not even want to begin to explore. Why would I dive into access permission issues of Linux namespaces? If you have that as an issue, then btop will be the least of your problems. If you put the bar that high for btop then it sure must be an amazingly well secured, Fort Knox-style application, good for you and good for btop.

Closed.

@markg85 closed this Mar 1, 2026
@aristocratos
Owner

> So now you're changing stance where it can be a compile time check depending on the kernel version.

Not really, considering I mentioned it 3 times now, including in my original comment...

> I have no interest in going over CVE's years back to determine if it's still a risk. Or even to qualify what "risk" means.

Nobody asked you to; merely finding the latest "actual" high-risk CVE (if there were any) and putting the cut-off point at that kernel would have been enough.

> If you put the bar that high for btop then it sure must be an amazingly well secured, Fort Knox-style application, good for you and good for btop.

Yeah, sorry if me asking you questions about the security aspects of the PR offended you for some reason...

> I'm retracting my PR (closing it).

That's probably for the best.

