xdp: move init steps that require caps early in validator process start up #9133
base: master
Conversation
Codecov Report

@@            Coverage Diff            @@
##            master    #9133    +/- ##
=========================================
- Coverage     82.6%    82.6%    -0.1%
=========================================
  Files          852      852
  Lines       318162   318341    +179
=========================================
+ Hits        262963   262991     +28
- Misses       55199    55350    +151
Force-pushed from b5b9f5d to af2baa0
ok this is ready so far as i'm concerned. definitely recommend per-commit review. feel free to suggest a different batching for PRing the commits

done

please stop with the irrelevant pedantic shit. look at the pr title. that's what i'm trying to do here. if you can't stay focused i don't want your review

there is no more pedantic shit to point out so i'll focus on the code now, but it shouldn't hurt to fix typos etc at the final point 👍
gregcusack
left a comment
lgtm from a technical point of view. a few nits. but looks good. alessandro may have some comments on keeping the xdp/ generic enough for aya. thanks for doing this!
this is what we get for merging features without cleaning up our mess first
i'll give @alessandrod 'til eod tomorrow to take a look, then merge |
Pull request overview
This pull request moves XDP initialization steps that require elevated Linux capabilities to happen early in the validator process startup, before any threads are spawned. This prevents capability inheritance by child threads, reducing the attack surface and security risk.
Changes:
- Refactored XDP initialization into a builder pattern that separates capability-requiring operations from runtime operations (see the sketch after this list)
- Moved XDP initialization to happen immediately after node creation and before validator thread spawning
- Added proper capability management including clearing inheritable and permitted capability sets after initialization
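To make that split concrete, here is an illustrative-only sketch of the builder shape described in the first bullet above. The names echo the PR's XdpRetransmitBuilder/XdpRetransmitter, but the fields, methods, and the PrivilegedSocket stand-in are invented for illustration and are not the actual code:

```rust
// Illustrative sketch only: the builder does the work that needs raised
// capabilities and runs early (before any threads exist); the runtime type
// is constructed from it later, after caps have been dropped.
struct PrivilegedSocket; // stand-in for a resource that needs raised caps to create

struct XdpRetransmitBuilder {
    sockets: Vec<PrivilegedSocket>,
}

struct XdpRetransmitter {
    _threads: Vec<std::thread::JoinHandle<()>>,
}

impl XdpRetransmitBuilder {
    /// Caps phase: runs during early startup, before any other threads exist.
    fn new(num_queues: usize) -> Self {
        let sockets = (0..num_queues).map(|_| PrivilegedSocket).collect();
        Self { sockets }
    }

    /// Runtime phase: runs after capabilities have been dropped; only spawns threads.
    fn start(self) -> XdpRetransmitter {
        let _threads = self
            .sockets
            .into_iter()
            .map(|_socket| std::thread::spawn(move || { /* run the tx loop with _socket */ }))
            .collect();
        XdpRetransmitter { _threads }
    }
}
```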
Reviewed changes
Copilot reviewed 11 out of 14 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| xdp/src/umem.rs | Added OwnedUmem type for memory ownership and moved methods to traits for better abstraction |
| xdp/src/tx_loop.rs | Refactored tx_loop into builder pattern (TxLoopBuilder) and runtime execution (TxLoop) |
| xdp/src/device.rs | Added Send trait implementations for thread-safe types with safety documentation |
| xdp/Cargo.toml | Removed caps dependency (moved to turbine) |
| validator/src/commands/run/execute.rs | Moved XDP initialization to early startup before thread spawning |
| validator/src/admin_rpc_service.rs | Updated test to pass new None parameter to Validator::new |
| turbine/src/xdp.rs | Split XdpRetransmitter into XdpRetransmitBuilder (caps phase) and XdpRetransmitter (runtime) |
| turbine/Cargo.toml | Added aya dependency |
| test-validator/src/lib.rs | Updated test to pass new None parameter |
| local-cluster/src/local_cluster.rs | Updated tests to pass new None parameter |
| core/src/validator.rs | Added maybe_partial_xdp_retransmitter parameter to Validator::new |
| Cargo.lock, dev-bins/Cargo.lock, programs/sbf/Cargo.lock | Updated dependency graph reflecting caps movement |
alessandrod
left a comment
Thanks for doing this work!
It was SO EASY to review since the changes are so nicely split into atomic commits. We really need to start doing this more; it saves so much time.
alexpyattaev
left a comment
We probably want to drop the caps whether or not XDP retransmit is enabled. Operators are likely to mark the binary with extra capabilities unconditionally in their build/startup scripts, and if they then don't enable XDP retransmit for whatever reason, the capabilities will never get dropped.
As a secondary layer of protection, we may also set PR_SET_NO_NEW_PRIVS to be certain (should probably be another PR though).
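A minimal sketch of what unconditional dropping could look like, assuming the caps crate and libc; this is not the PR's code, just one way to express the idea (clear every set early, then set PR_SET_NO_NEW_PRIVS as the secondary layer):

```rust
// Sketch only: unconditionally drop all capability sets for the calling
// thread early in startup, then forbid regaining privileges via execve.
use caps::CapSet;

fn drop_all_caps_and_lock() -> Result<(), Box<dyn std::error::Error>> {
    // Ambient first (no privileges required), then inheritable/effective,
    // and finally permitted, which makes the drop irreversible.
    caps::clear(None, CapSet::Ambient)?;
    caps::clear(None, CapSet::Inheritable)?;
    caps::clear(None, CapSet::Effective)?;
    caps::clear(None, CapSet::Permitted)?;

    // Secondary layer: with no_new_privs set, execve can never grant new privileges.
    // SAFETY: prctl with these arguments only touches process state.
    let ret = unsafe { libc::prctl(libc::PR_SET_NO_NEW_PRIVS, 1u64, 0u64, 0u64, 0u64) };
    if ret != 0 {
        return Err(std::io::Error::last_os_error().into());
    }
    Ok(())
}
```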
Force-pushed from 75f8b96 to 0822766
ok addressed all the relevant review comments. ci should be good to go
xdp/src/tx_loop.rs
Outdated
```rust
// since we aren't necessarily allocating from the thread that we intend to run on,
// temporarily switch to the target cpu to ensure that the region is allocated to the
// correct numa node
set_cpu_affinity([cpu_id]).unwrap();
```
we can possibly numa(7) our way out of this later. i only skimmed, but i think you can induce migration after the fact
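For reference, "inducing migration after the fact" would roughly mean mbind(2) with MPOL_MF_MOVE on the already-mapped region. This is a rough, untested sketch via the raw syscall (constants copied from numaif.h), not something this PR does:

```rust
// Rough sketch: ask the kernel to bind an existing mapping to `node` and
// migrate its resident pages there. Multi-word nodemasks are omitted.
use std::io;

const MPOL_BIND: libc::c_long = 2; // from <numaif.h>
const MPOL_MF_MOVE: libc::c_ulong = 1 << 1;

unsafe fn migrate_region_to_node(
    addr: *mut libc::c_void,
    len: usize,
    node: usize,
) -> io::Result<()> {
    let nodemask: libc::c_ulong = 1 << node;
    let ret = libc::syscall(
        libc::SYS_mbind,
        addr,
        len as libc::c_ulong,
        MPOL_BIND,
        &nodemask as *const libc::c_ulong,
        (node + 2) as libc::c_ulong, // maxnode: at least the highest node bit + 1
        MPOL_MF_MOVE,                // move pages that are already resident
    );
    if ret != 0 {
        return Err(io::Error::last_os_error());
    }
    Ok(())
}
```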
alessandrod
left a comment
oops, one last round trip
```rust
    .cloned()
    .collect::<HashSet<_>>();
if !reserved.is_empty() {
    let available = core_affinity::get_core_ids()
```
it turns out... this returns the current affinity mask, so we have to call it before the block above, or this ends up doing set_difference([i], [..., i]) = [] and the validator doesn't start.
I've confirmed that if I move this call above the block, the validator starts and XDP seems to work correctly.
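In other words, the snapshot of available cores has to be taken before the main thread's affinity is narrowed. A hypothetical ordering sketch using the core_affinity crate (the reserved set and function name are illustrative):

```rust
// Hypothetical sketch: snapshot the affinity mask *before* any pinning, since
// core_affinity::get_core_ids() reflects the current mask, not all cores.
use std::collections::HashSet;

fn non_reserved_cores(reserved: &HashSet<usize>) -> Vec<usize> {
    // Must run before any set_cpu_affinity()-style call narrows this thread,
    // otherwise available \ reserved can collapse to the empty set.
    let available: HashSet<usize> = core_affinity::get_core_ids()
        .unwrap_or_default()
        .into_iter()
        .map(|core| core.id)
        .collect();
    available.difference(reserved).copied().collect()
}
```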
t-nelson
left a comment
ok, took care of moving the main thread back to its initial cpu
```rust
// try to allocate huge pages first, then fall back to regular pages
const HUGE_2MB: usize = 2 * 1024 * 1024;
set_cpu_affinity([cpu_id]).unwrap();
```
for the first refactor, we're still constructing the TxLoopBuilder on the target thread. set cpu affinity here, even though we'll do it again below in TxLoop::new
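For context on the hunk above, a minimal sketch of the huge-pages-then-regular-pages allocation it refers to, using a plain anonymous mmap; the real Umem allocation in the PR has additional constraints:

```rust
// Sketch only: try a 2MB-huge-page anonymous mapping first, then fall back to
// regular pages if the system has no huge pages reserved.
use std::{io, ptr};

const HUGE_2MB: usize = 2 * 1024 * 1024;

fn alloc_region(len: usize) -> io::Result<*mut libc::c_void> {
    let prot = libc::PROT_READ | libc::PROT_WRITE;
    let base = libc::MAP_PRIVATE | libc::MAP_ANONYMOUS;

    // Huge-page mappings require the length to be a multiple of the huge page size.
    let huge_len = len.next_multiple_of(HUGE_2MB);
    let ptr = unsafe { libc::mmap(ptr::null_mut(), huge_len, prot, base | libc::MAP_HUGETLB, -1, 0) };
    if ptr != libc::MAP_FAILED {
        return Ok(ptr);
    }

    // Fall back to regular pages.
    let ptr = unsafe { libc::mmap(ptr::null_mut(), len, prot, base, -1, 0) };
    if ptr == libc::MAP_FAILED {
        return Err(io::Error::last_os_error());
    }
    Ok(ptr)
}
```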
turbine/src/xdp.rs
Outdated
```rust
// since we aren't necessarily allocating from the thread that we intend to run on,
// temporarily switch to the target cpu for each TxLoop to ensure that the Umem region
// is allocated to the correct numa node
let this_cpu = get_cpu().expect("linux implements sched_getcpu");
```
next we pull TxLoopBuilder init out to the main thread: stash the initial main-thread cpu and restore it once all of the tx loops have been started
```rust
// since we aren't necessarily allocating from the thread that we intend to run on,
// temporarily switch to the target cpu for each TxLoop to ensure that the Umem region
// is allocated to the correct numa node
let this_cpu = get_cpu().expect("linux implements sched_getcpu");
let tx_loop_builders = cpus
    .into_iter()
    .zip(std::iter::repeat_with(|| tx_loop_config.clone()))
    .enumerate()
    .map(|(i, (cpu_id, config))| {
        TxLoopBuilder::new(cpu_id, QueueId(i as u64), config, &dev)
    })
    .collect::<Vec<_>>();
// migrate main thread back off of the last xdp reserved cpu
set_cpu_affinity([this_cpu]).unwrap();
```
retain the main thread cpu restoration after isolating the logic that requires raised caps
set_cpu_affinity([this_cpu]) seems incorrect? this cpu != all the possible cpus in the original mask
we haven't applied the mask yet. i moved that back
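One way to sidestep the single-cpu-vs-full-mask question would be to stash the complete original affinity mask and restore exactly that once the tx loops are up. A hypothetical libc-level sketch, not what the PR ended up doing:

```rust
// Hypothetical alternative: capture the full affinity mask before any pinning
// and restore that exact mask afterwards, instead of pinning to a single CPU.
use std::{io, mem};

fn current_affinity() -> io::Result<libc::cpu_set_t> {
    // SAFETY: cpu_set_t is a plain bitmask; all-zero is a valid initial value.
    let mut set: libc::cpu_set_t = unsafe { mem::zeroed() };
    // pid 0 == the calling thread
    if unsafe { libc::sched_getaffinity(0, mem::size_of::<libc::cpu_set_t>(), &mut set) } != 0 {
        return Err(io::Error::last_os_error());
    }
    Ok(set)
}

fn restore_affinity(set: &libc::cpu_set_t) -> io::Result<()> {
    if unsafe { libc::sched_setaffinity(0, mem::size_of::<libc::cpu_set_t>(), set) } != 0 {
        return Err(io::Error::last_os_error());
    }
    Ok(())
}
```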
Problem
we do the xdp init steps that require capabilities quite late in validator process startup. by that point we've already spawned many threads, which will have inherited caps they do not need, posing a security risk
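As a quick illustration of the risk (assuming the caps crate; this is not code from the PR): a thread spawned while capabilities are still raised carries the same raised sets for its whole lifetime.

```rust
// Sketch: capability sets are copied from the creating thread at spawn time,
// so anything spawned before the drop keeps the raised caps until it exits.
use caps::CapSet;

fn main() {
    std::thread::spawn(|| {
        // If the main thread still held e.g. CAP_NET_RAW here, so does this thread.
        let permitted = caps::read(None, CapSet::Permitted).expect("read caps");
        println!("spawned thread permitted caps: {permitted:?}");
    })
    .join()
    .unwrap();
}
```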
Summary of Changes
Fixes: #6087
i know the new type names suck. suggestions welcome