Skip to content

Commit 78bf3e4

Browse files
avi-starkwareclaude
andcommitted
serialize concurrent profiling, propagate missing-symbol as error
Two correctness fixes to run_with_libfunc_profile, both inherited from the source branch (lambdaclass/sequencer:libfunc_profiling_support): 1. Race condition. The trace_id global symbol that drives sample routing is process-wide; concurrent calls non-atomically write to the same address, and the second caller's ProfilerGuard clobbers the first's saved old_trace_id. Add a process-wide PROFILE_LOCK (Mutex<()>) and hold it for the entire call. Lock poisoning is recovered — the protected state is the symbol itself, and inner_into() is safe to reuse. 2. Error paths. find_symbol_ptr().unwrap() panicked when the .so was compiled without libfunc-profiling instrumentation; the LIBFUNC_PROFILE slot inserted earlier was leaked. Reorder: look up the symbol first, convert None to Error::UnexpectedValue, then insert the slot. The trailing drain now only invokes on_profile when the inner run succeeded — a partial profile from an aborted run isn't meaningful. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 4926e89 commit 78bf3e4

2 files changed

Lines changed: 72 additions & 34 deletions

File tree

src/executor.rs

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -3,14 +3,14 @@
33
//! This module provides methods to execute the programs, either via JIT or compiled ahead
44
//! of time. It also provides a cache to avoid recompiling previously compiled programs.
55
6+
#[cfg(feature = "with-libfunc-profiling")]
7+
pub use self::contract_executor::AotWithProgram;
8+
#[cfg(feature = "sierra-emu")]
9+
pub use self::contract_executor::EmuContractInfo;
610
pub use self::{
711
aot::AotNativeExecutor, contract::AotContractExecutor, contract_executor::ContractExecutor,
812
jit::JitNativeExecutor,
913
};
10-
#[cfg(feature = "sierra-emu")]
11-
pub use self::contract_executor::EmuContractInfo;
12-
#[cfg(feature = "with-libfunc-profiling")]
13-
pub use self::contract_executor::AotWithProgram;
1414
use crate::{
1515
arch::{AbiArgument, ValueWithInfoWrapper},
1616
error::{panic::ToNativeAssertError, Error},

src/executor/libfunc_profile.rs

Lines changed: 68 additions & 30 deletions
Original file line numberDiff line numberDiff line change
@@ -4,40 +4,48 @@
44
//! declaration in `src/executor.rs`).
55
66
use std::sync::atomic::{AtomicU64, Ordering};
7-
use std::sync::Arc;
7+
use std::sync::{Arc, Mutex};
88

99
use cairo_lang_sierra::program::Program;
1010
use starknet_types_core::felt::Felt;
1111

12-
use crate::error::Result;
12+
use crate::error::{Error, Result};
1313
use crate::execution_result::ContractExecutionResult;
1414
use crate::executor::AotContractExecutor;
1515
use crate::metadata::profiler::{Profile, ProfilerBinding, ProfilerImpl, LIBFUNC_PROFILE};
1616
use crate::starknet::StarknetSyscallHandler;
1717
use crate::utils::BuiltinCosts;
1818

19+
/// Process-wide lock that serializes calls into [`AotContractExecutor::run_with_libfunc_profile`].
20+
/// The profiler hot-swaps a process-global symbol (`cairo_native__profiler__profile_id`);
21+
/// concurrent callers would race on that write and on the [`LIBFUNC_PROFILE`] slot bookkeeping.
22+
static PROFILE_LOCK: Mutex<()> = Mutex::new(());
23+
1924
impl AotContractExecutor {
2025
/// Run the entrypoint with libfunc-level profiling instrumentation.
2126
///
2227
/// Wraps [`AotContractExecutor::run`] with the bookkeeping the
2328
/// `with-libfunc-profiling` runtime needs:
2429
///
25-
/// 1. Allocates a unique trace ID and inserts an empty `ProfilerImpl` slot in
26-
/// [`LIBFUNC_PROFILE`].
27-
/// 2. Points the executor's `cairo_native__profiler__profile_id` symbol at the new
28-
/// trace ID, saving the previous value.
29-
/// 3. Calls `run`. Per-statement samples accumulate in the slot via the runtime
30+
/// 1. Acquires [`PROFILE_LOCK`] so concurrent profile calls serialize on the
31+
/// global trace-id symbol. The lock is recovered if poisoned.
32+
/// 2. Looks up the executor's `cairo_native__profiler__profile_id` symbol. If
33+
/// absent (the .so was compiled without profiling instrumentation) the call
34+
/// returns an error before touching any global state.
35+
/// 3. Allocates a unique trace ID and inserts an empty `ProfilerImpl` slot in
36+
/// [`LIBFUNC_PROFILE`]; points the profile-id symbol at the new ID, saving
37+
/// the previous value.
38+
/// 4. Calls `run`. Per-statement samples accumulate in the slot via the runtime
3039
/// `push_stmt` callback.
31-
/// 4. Drains the slot, calls [`ProfilerImpl::get_profile`] with `program`, and hands
32-
/// the resulting [`Profile`] to `on_profile`.
33-
/// 5. A [`ProfilerGuard`] restores the previous trace ID — and removes the slot if
34-
/// the success path didn't — on both success and unwind paths.
40+
/// 5. Drains the slot. On success (and only on success) hands the resulting
41+
/// [`Profile`] to `on_profile`; on failure the callback is not invoked
42+
/// (partial profiles aren't meaningful).
43+
/// 6. A [`ProfilerGuard`] restores the previous trace ID and clears the slot on
44+
/// both the success and unwind paths.
3545
///
3646
/// `program` must be the Sierra program this executor was compiled from; it's used
3747
/// by `get_profile` to map runtime libfunc IDs back to declarations.
38-
///
39-
/// Profiling is intended to run single-threaded; concurrent calls would race on the
40-
/// global `trace_id` symbol.
48+
#[allow(clippy::too_many_arguments)]
4149
pub fn run_with_libfunc_profile<H, F>(
4250
&self,
4351
program: &Arc<Program>,
@@ -52,28 +60,42 @@ impl AotContractExecutor {
5260
H: StarknetSyscallHandler,
5361
F: FnOnce(Profile),
5462
{
63+
// Serialize against concurrent profile calls. Recover from a poisoned lock —
64+
// we don't have invariants on the protected state itself; the lock only gates
65+
// access to the global trace-id symbol.
66+
let _profile_lock = PROFILE_LOCK.lock().unwrap_or_else(|e| e.into_inner());
67+
68+
// Look up the profile-id symbol before touching any global state. If the
69+
// executor wasn't compiled with libfunc-profiling instrumentation, the
70+
// symbol is absent — return a typed error rather than panicking.
71+
let trace_id_ptr = self
72+
.find_symbol_ptr(ProfilerBinding::ProfileId.symbol())
73+
.ok_or_else(|| {
74+
Error::UnexpectedValue(format!(
75+
"AOT executor missing libfunc-profiling symbol `{}`; \
76+
was the program compiled with libfunc-profiling enabled?",
77+
ProfilerBinding::ProfileId.symbol()
78+
))
79+
})?
80+
.cast::<u64>();
81+
5582
static COUNTER: AtomicU64 = AtomicU64::new(0);
5683
let counter = COUNTER.fetch_add(1, Ordering::Relaxed);
5784

5885
LIBFUNC_PROFILE
5986
.lock()
60-
.unwrap()
87+
.unwrap_or_else(|e| e.into_inner())
6188
.insert(counter, ProfilerImpl::new());
6289

63-
// The pointer targets a global symbol in the executor's shared library; it lives
64-
// for the executor's lifetime. Single-threaded profiling means no concurrent writer.
65-
let trace_id_ptr = self
66-
.find_symbol_ptr(ProfilerBinding::ProfileId.symbol())
67-
.unwrap()
68-
.cast::<u64>();
69-
// SAFETY: see above. Read/write to a non-null, properly-aligned `*mut u64`.
90+
// SAFETY: the pointer targets a memref-global emitted into the executor's
91+
// shared library; the executor outlives the call. `PROFILE_LOCK` serializes
92+
// us against any other writer, and the JIT/AOT code reads through the same
93+
// address. Reads/writes are aligned `u64`s.
7094
let old_trace_id = unsafe { *trace_id_ptr };
7195
unsafe {
7296
*trace_id_ptr = counter;
7397
}
7498

75-
// Restore on the success path AND on unwind. On success the caller drains the
76-
// slot below; the guard's `remove` is then a no-op.
7799
let _guard = ProfilerGuard {
78100
trace_id_ptr,
79101
old_trace_id,
@@ -82,15 +104,27 @@ impl AotContractExecutor {
82104

83105
let result = self.run(selector, args, gas, builtin_costs, syscall_handler);
84106

85-
let profiler = LIBFUNC_PROFILE.lock().unwrap().remove(&counter).unwrap();
86-
on_profile(profiler.get_profile(program));
107+
// Drain the slot. `ProfilerGuard::drop` would also remove it; doing it here
108+
// means we hold the lock for the shortest time and can hand the profile to
109+
// the callback. Tolerate a poisoned mutex (we'd lose the profile, not state).
110+
let drained = LIBFUNC_PROFILE
111+
.lock()
112+
.unwrap_or_else(|e| e.into_inner())
113+
.remove(&counter);
114+
115+
// Only call the user's callback when `run` succeeded — a partial profile
116+
// captured against an aborted execution wouldn't be meaningful.
117+
if let (Some(profiler), Ok(_)) = (drained, &result) {
118+
on_profile(profiler.get_profile(program));
119+
}
87120

88121
result
89122
}
90123
}
91124

92-
/// RAII cleanup for the profiler globals. Restores `*trace_id_ptr` and drops the
93-
/// `LIBFUNC_PROFILE` slot at `counter` if it's still occupied.
125+
/// RAII cleanup for the profiler globals. Restores `*trace_id_ptr` on success or
126+
/// unwind. The [`LIBFUNC_PROFILE`] slot at `counter` is normally drained on the
127+
/// success path; this guard removes it if it's still occupied (panic case).
94128
struct ProfilerGuard {
95129
trace_id_ptr: *mut u64,
96130
old_trace_id: u64,
@@ -99,11 +133,15 @@ struct ProfilerGuard {
99133

100134
impl Drop for ProfilerGuard {
101135
fn drop(&mut self) {
102-
// SAFETY: same provenance as the construction site; single-threaded use.
136+
// SAFETY: same provenance as the construction site. `PROFILE_LOCK` is held
137+
// by the enclosing scope (still in flight while we drop) so no other thread
138+
// races us.
103139
unsafe {
104140
*self.trace_id_ptr = self.old_trace_id;
105141
}
106-
// Tolerate a poisoned mutex silently — Drop must not panic.
142+
// Tolerate a poisoned mutex silently — Drop must not panic. Slot leak on
143+
// poison is intentional and matches the behavior of other Drop impls in
144+
// this crate; the alternative (panic in Drop) is worse.
107145
if let Ok(mut profile) = LIBFUNC_PROFILE.lock() {
108146
profile.remove(&self.counter);
109147
}

0 commit comments

Comments
 (0)