Monitor CPU usage and internally rate limit

## What's wrong?

Especially on the state network, and when onboarding a new node, trin can redline the CPU while accepting offers and re-offering content to peers. (My fans tended to spin up once trin reached about 10 accepted offers per second.)

Just like with storage, we don't want to overuse CPU. We want the client to feel light, like something that can run constantly in the background.

## Possible solution

Monitor `cpu_time::ProcessTime` inside trin, and cap CPU usage at some small percentage by default (2%? 5%? 10%?). Add a CLI flag to configure the cap manually.
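A minimal sketch of turning cumulative process CPU time into a usage percentage per sampling window. Everything named here (`cpu_percent`, `CpuSampler`, the `cores` normalization) is hypothetical and not part of trin. The issue does not say whether the cap is per core or across all cores, so the sketch assumes usage is normalized across all cores. The cumulative CPU reading is passed in as a plain `Duration` so the example depends only on `std`; in trin it would presumably come from `cpu_time::ProcessTime`.

```rust
use std::time::Duration;

/// Share of available CPU used over one sampling window, as a percentage.
/// Assumption: `cores` normalizes the result, so 100.0 means every core was busy.
fn cpu_percent(cpu_delta: Duration, wall_delta: Duration, cores: u32) -> f64 {
    // Guard against an empty window or a bogus core count.
    if wall_delta.is_zero() || cores == 0 {
        return 0.0;
    }
    100.0 * cpu_delta.as_secs_f64() / (wall_delta.as_secs_f64() * f64::from(cores))
}

/// Hypothetical sampler: remembers the previous readings and reports usage since then.
struct CpuSampler {
    last_cpu: Duration,
    last_wall: Duration,
    cores: u32,
}

impl CpuSampler {
    fn new(cpu_now: Duration, wall_now: Duration, cores: u32) -> Self {
        Self { last_cpu: cpu_now, last_wall: wall_now, cores }
    }

    /// `cpu_now` is the cumulative CPU time of the process,
    /// `wall_now` is a monotonic wall-clock reading taken at the same moment.
    fn sample(&mut self, cpu_now: Duration, wall_now: Duration) -> f64 {
        let pct = cpu_percent(
            cpu_now.saturating_sub(self.last_cpu),
            wall_now.saturating_sub(self.last_wall),
            self.cores,
        );
        self.last_cpu = cpu_now;
        self.last_wall = wall_now;
        pct
    }
}
```

Under these assumptions, 2 seconds of CPU time over a 10 second window on a 4-core machine reports 5%, which would sit right at a 5% cap.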

Every 10s that the CPU is above the limit, shrink the data radius of the state network by 10% of its current level. (This assumes, based on current experience, that state is the only offender for high CPU usage.) Every 10s that the CPU is at or below half of the limit, and data storage is under target, grow the radius by 10%. I think we want to be quite responsive, and this is: six consecutive 10% cuts leave 0.9^6 ≈ 53% of the radius, so it can be cut roughly in half in 1 minute.

This approach might actually help state nodes find their natural "true" radius faster, and mitigate the fill & dump behavior when launching a fresh client (which is slower and more painful on state than on history).
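The rule above could be sketched roughly like this. The names (`Adjustment`, `decide`, `next_radius`) are made up for illustration, and the radius is modeled as an `f64` fraction of the maximum radius rather than trin's real radius type, which is an assumption made to keep the example short.

```rust
/// What to do with the state radius after one 10s sample.
#[derive(Debug, PartialEq)]
enum Adjustment {
    Shrink,
    Grow,
    Hold,
}

/// Decide based on one CPU sample (percent), the configured cap (percent),
/// and whether storage is still under its target.
fn decide(cpu_pct: f64, limit_pct: f64, storage_under_target: bool) -> Adjustment {
    if cpu_pct > limit_pct {
        Adjustment::Shrink
    } else if cpu_pct <= limit_pct / 2.0 && storage_under_target {
        Adjustment::Grow
    } else {
        Adjustment::Hold
    }
}

/// Apply the adjustment. Assumption: radius is a fraction in [0.0, 1.0]
/// of the maximum radius, so growth is clamped at 1.0.
fn next_radius(radius: f64, adjustment: &Adjustment) -> f64 {
    match adjustment {
        Adjustment::Shrink => radius * 0.9,
        Adjustment::Grow => (radius * 1.1).min(1.0),
        Adjustment::Hold => radius,
    }
}
```

Calling `next_radius` with `Adjustment::Shrink` six times in a row (one minute of sustained overload) takes a radius of 1.0 down to about 0.53. The gap between the shrink threshold and the grow threshold gives some hysteresis, so the radius should not oscillate when CPU hovers just under the cap.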

## Challenges

trin could use CPU for other reasons, such as user interaction (e.g. when someone is using the RPC API). It is easy to imagine CPU usage spiking then, and it would be wrong to mess with the state radius at that point. We probably cannot punt on this, and will need to include a solution in the first implementation.

Another awkward aspect of this approach is that it's hard to tell which network is using too much CPU. We probably don't want to adjust every network's radius at once if CPU is high. Right now, it's only ever state, so I think we can punt on this challenge. But if multiple networks start using a lot of CPU, we might want a more clever way to measure computation than just checking `cpu_time::ProcessTime`.

## Timeline

I won't try to get this into the imminent stable release, of course; I'm just planning ahead. This is another good reason not to enable state by default.

Monitor CPU usage and internally rate limit #1545
