Skip to content

Commit a884600

Browse files
fix(sandbox): prevent state patch loss during block processing (#15534)
## Background `sandbox_patch_state` sometimes permanently loses patches — the patched account never appears, even on retry. This was reported in the [Zulip thread](https://near.zulipchat.com/#narrow/channel/308695-nearone.2Fprivate/topic/Sandbox.20patch_state.20issues) and reproduced with [near-sandbox-rs#51](near/near-sandbox-rs#51). Three interrelated bugs: - **Patch lost on block failure**: `pending_state_patch.take()` drains the queue at the start of block processing. If preprocessing fails (orphan, missing chunks, optimistic deferral), the patch is gone forever. - **RPC returns too early**: `patch_state_in_progress()` checks `is_empty()`, which is true right after `take()` — before the patch is actually committed to state. - **Startup race**: Block 1 after genesis has no new chunks. The runtime's `apply()` early-returns for old chunks and silently drops the `state_patch`. ## What changed Introduces `SandboxPatchTracker` — a type that encapsulates the pending patch, a generation counter, and a committed-generation counter. Like `SandboxStatePatch`, it uses the ZST pattern: real struct on sandbox builds, zero-sized no-op on non-sandbox builds. This replaces the three separate fields on `Chain` (`pending_state_patch`, `sandbox_patch_generation`, `sandbox_patch_committed_gen`) with a single `sandbox_patches: SandboxPatchTracker`. The key design change is **taking the patch late** — at point of use inside the per-shard loop of `apply_chunks_preprocessing`, instead of at the top of `start_process_block_impl`. This eliminates the need for backup/clone/restore on error paths, since any error before the take leaves the patch in the tracker. The generation counter (`in_progress()` checks `generation != committed_gen`, not `is_empty()`) ensures the RPC doesn't return prematurely despite the late take. This removes the `state_patch` parameter from `preprocess_block` and `apply_chunks_preprocessing`, and results in zero `#[cfg(feature = "sandbox")]` blocks in `chain.rs`. ## Related - #15536 — alternative fix by @r-near that addresses the same three bugs with a backup/restore approach and 22 cfg-gates. This PR takes a different approach (take-late + tracker) to achieve the same result with less code deviation. - #14893 — fixed a different patch loss path (`clear()` in `postprocess_ready_block`)
1 parent f0bf399 commit a884600

3 files changed

Lines changed: 114 additions & 34 deletions

File tree

chain/chain/src/block_processing_utils.rs

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -36,6 +36,10 @@ pub(crate) struct BlockPreprocessInfo {
3636
pub(crate) apply_chunks_done_waiter: ApplyChunksDoneWaiter,
3737
/// Used to calculate block processing time metric.
3838
pub(crate) block_start_processing_time: Instant,
39+
/// The sandbox patch generation observed when this block was preprocessed.
40+
/// Used after `chain_update.commit()` to advance the tracker's committed
41+
/// generation. Always 0 on non-sandbox builds.
42+
pub(crate) sandbox_patch_generation: u64,
3943
}
4044

4145
pub(crate) struct OptimisticBlockInfo {

chain/chain/src/chain.rs

Lines changed: 23 additions & 29 deletions
Original file line numberDiff line numberDiff line change
@@ -69,7 +69,7 @@ use near_primitives::optimistic_block::{
6969
BlockToApply, CachedShardUpdateKey, OptimisticBlock, OptimisticBlockKeySource,
7070
};
7171
use near_primitives::receipt::Receipt;
72-
use near_primitives::sandbox::state_patch::SandboxStatePatch;
72+
use near_primitives::sandbox::state_patch::{SandboxPatchTracker, SandboxStatePatch};
7373
use near_primitives::shard_layout::{ShardLayout, ShardUId};
7474
use near_primitives::sharding::{
7575
ChunkHash, ReceiptProof, ShardChunk, ShardChunkHeader, ShardProof, StateSyncInfo,
@@ -302,19 +302,9 @@ pub struct Chain {
302302
/// Prevents re-application of known-to-be-invalid blocks, so that in case of a
303303
/// protocol issue we can recover faster by focusing on correct blocks.
304304
invalid_blocks: LruCache<CryptoHash, ()>,
305-
/// Support for sandbox's patch_state requests.
306-
///
307-
/// Sandbox needs ability to arbitrary modify the state. Blockchains
308-
/// naturally prevent state tampering, so we can't *just* modify data in
309-
/// place in the database. Instead, we will include this "bonus changes" in
310-
/// the next block we'll be processing, keeping them in this field in the
311-
/// meantime.
312-
///
313-
/// Note that without `sandbox` feature enabled, `SandboxStatePatch` is
314-
/// a ZST. All methods of the type are no-ops which behave as if the object
315-
/// was empty and could not hold any records (which it cannot). It's
316-
/// impossible to have non-empty state patch on non-sandbox builds.
317-
pending_state_patch: SandboxStatePatch,
305+
/// Tracks sandbox state patches through the block processing pipeline.
306+
/// On non-sandbox builds this is a ZST and every method is a no-op.
307+
sandbox_patches: SandboxPatchTracker,
318308
/// A callback to initiate state snapshot.
319309
snapshot_callbacks: Option<SnapshotCallbacks>,
320310
/// Manages all tasks related to resharding.
@@ -458,7 +448,7 @@ impl Chain {
458448
last_time_head_updated: clock.now(),
459449
processed_hashes: LruCache::new(NonZeroUsize::new(PROCESSED_HASHES_POOL_SIZE).unwrap()),
460450
invalid_blocks: LruCache::new(NonZeroUsize::new(INVALID_CHUNKS_POOL_SIZE).unwrap()),
461-
pending_state_patch: Default::default(),
451+
sandbox_patches: Default::default(),
462452
snapshot_callbacks: None,
463453
resharding_manager,
464454
validator_signer,
@@ -640,7 +630,7 @@ impl Chain {
640630
memtrie_loading_spawner: memtrie_loading_spawner.into_spawner(),
641631
apply_chunk_results_cache: ApplyChunksResultCache::new(APPLY_CHUNK_RESULTS_CACHE_SIZE),
642632
last_time_head_updated: clock.now(),
643-
pending_state_patch: Default::default(),
633+
sandbox_patches: Default::default(),
644634
snapshot_callbacks,
645635
resharding_manager,
646636
validator_signer,
@@ -1650,14 +1640,12 @@ impl Chain {
16501640

16511641
// 1) preprocess the block where we verify that the block is valid and ready to be processed
16521642
// No chain updates are applied at this step.
1653-
let state_patch = self.pending_state_patch.take();
16541643
let preprocess_timer = metrics::BLOCK_PREPROCESSING_TIME.start_timer();
16551644
let preprocess_res = self.preprocess_block(
16561645
&block,
16571646
&provenance,
16581647
&mut block_processing_artifact.invalid_chunks,
16591648
block_received_time,
1660-
state_patch,
16611649
);
16621650
let preprocess_res = match preprocess_res {
16631651
Ok(preprocess_res) => {
@@ -1835,6 +1823,7 @@ impl Chain {
18351823
let should_save_state_transition_data =
18361824
self.should_produce_state_witness_for_this_or_next_epoch(block.header())?;
18371825
let epoch_to_check = self.protocol_version_check;
1826+
let sandbox_patch_gen = block_preprocess_info.sandbox_patch_generation;
18381827
let mut chain_update = self.chain_update();
18391828
let block_hash = *block.hash();
18401829
let new_head = chain_update.postprocess_block(
@@ -1847,6 +1836,7 @@ impl Chain {
18471836
chain_update.check_protocol_version(&block_hash, epoch_to_check)?;
18481837
}
18491838
chain_update.commit()?;
1839+
self.sandbox_patches.mark_committed(sandbox_patch_gen);
18501840
Ok(new_head)
18511841
}
18521842

@@ -2348,7 +2338,6 @@ impl Chain {
23482338
provenance: &Provenance,
23492339
invalid_chunks: &mut Vec<ShardChunkHeader>,
23502340
block_received_time: Instant,
2351-
state_patch: SandboxStatePatch,
23522341
) -> Result<PreprocessBlockResult, Error> {
23532342
let header = block.header();
23542343

@@ -2535,7 +2524,6 @@ impl Chain {
25352524
// for these states as well
25362525
// otherwise put the block into the permanent storage, waiting for be caught up
25372526
if is_caught_up { ApplyChunksMode::IsCaughtUp } else { ApplyChunksMode::NotCaughtUp },
2538-
state_patch,
25392527
invalid_chunks,
25402528
)?;
25412529

@@ -2550,6 +2538,7 @@ impl Chain {
25502538
provenance: provenance.clone(),
25512539
apply_chunks_done_waiter,
25522540
block_start_processing_time: block_received_time,
2541+
sandbox_patch_generation: self.sandbox_patches.generation_for_block(),
25532542
},
25542543
apply_chunks_still_applying,
25552544
))
@@ -2775,7 +2764,6 @@ impl Chain {
27752764
&prev_block,
27762765
&receipts_by_shard,
27772766
ApplyChunksMode::CatchingUp,
2778-
Default::default(),
27792767
&mut Vec::new(),
27802768
)?;
27812769
metrics::SCHEDULED_CATCHUP_BLOCK.set(block.header().height() as i64);
@@ -3200,7 +3188,6 @@ impl Chain {
32003188
prev_block: &Block,
32013189
incoming_receipts: &HashMap<ShardId, Vec<ReceiptProof>>,
32023190
mode: ApplyChunksMode,
3203-
mut state_patch: SandboxStatePatch,
32043191
invalid_chunks: &mut Vec<ShardChunkHeader>,
32053192
) -> Result<Vec<UpdateShardJob>, Error> {
32063193
let protocol_version =
@@ -3248,12 +3235,21 @@ impl Chain {
32483235
return Err(Error::BlockPendingOptimisticExecution);
32493236
}
32503237

3238+
let block_height = block.header().height();
32513239
for (shard_index, (block_context, cached_shard_update_key)) in
32523240
update_shard_args.into_iter().enumerate()
32533241
{
3254-
// XXX: This is a bit questionable -- sandbox state patching works
3255-
// only for a single shard. This so far has been enough.
3256-
let state_patch = state_patch.take();
3242+
// Take the sandbox patch from the tracker for this shard. Only
3243+
// shards with a new chunk receive the patch — old-chunk shards
3244+
// would silently drop it in the runtime. The CatchingUp mode
3245+
// never consumes sandbox patches.
3246+
let shard_has_new_chunk =
3247+
chunk_headers.get(shard_index).map_or(false, |h| h.is_new_chunk(block_height));
3248+
let state_patch = if mode != ApplyChunksMode::CatchingUp {
3249+
self.sandbox_patches.take_for_shard(shard_has_new_chunk)
3250+
} else {
3251+
SandboxStatePatch::default()
3252+
};
32573253

32583254
let shard_id = shard_layout.get_shard_id(shard_index)?;
32593255
let prev_chunk_header =
@@ -4030,14 +4026,12 @@ impl Chain {
40304026

40314027
/// Sandbox node specific operations
40324028
impl Chain {
4033-
// NB: `SandboxStatePatch` can only be created in `#[cfg(feature =
4034-
// "sandbox")]`, so we don't need extra cfg-gating here.
40354029
pub fn patch_state(&mut self, patch: SandboxStatePatch) {
4036-
self.pending_state_patch.merge(patch);
4030+
self.sandbox_patches.submit(patch);
40374031
}
40384032

40394033
pub fn patch_state_in_progress(&self) -> bool {
4040-
!self.pending_state_patch.is_empty()
4034+
self.sandbox_patches.in_progress()
40414035
}
40424036
}
40434037

core/primitives/src/sandbox.rs

Lines changed: 87 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -5,10 +5,7 @@ pub mod state_patch {
55
/// Changes to the state to be applied via sandbox-only state patching
66
/// feature.
77
///
8-
/// As we only expose this functionality for sandbox, we make sure that the
9-
/// object can be non-empty only if `sandbox` feature is enabled. On
10-
/// non-sandbox build, this struct is ZST and its methods are essentially
11-
/// short-circuited by treating the type as always empty.
8+
/// On non-sandbox builds this struct is a ZST whose methods are no-ops.
129
#[derive(Default, Clone)]
1310
pub struct SandboxStatePatch {
1411
records: Vec<StateRecord>,
@@ -40,6 +37,69 @@ pub mod state_patch {
4037
self.records.into_iter()
4138
}
4239
}
40+
41+
/// Tracks sandbox state patches through the block processing pipeline.
42+
///
43+
/// Wraps `SandboxStatePatch` with a generation counter so the RPC can
44+
/// accurately report when a patch has been committed to the DB (not just
45+
/// consumed from the pending queue).
46+
#[derive(Default)]
47+
pub struct SandboxPatchTracker {
48+
pending: SandboxStatePatch,
49+
/// Incremented each time `submit()` is called with a non-empty patch.
50+
generation: u64,
51+
/// Advanced to match `generation` after the block carrying the patch
52+
/// is committed to the DB.
53+
committed_gen: u64,
54+
}
55+
56+
impl SandboxPatchTracker {
57+
/// Queue a state patch for inclusion in a future block.
58+
pub fn submit(&mut self, patch: SandboxStatePatch) {
59+
if patch.is_empty() {
60+
return;
61+
}
62+
self.pending.merge(patch);
63+
self.generation += 1;
64+
}
65+
66+
/// Whether there is a submitted patch that hasn't been committed yet.
67+
pub fn in_progress(&self) -> bool {
68+
self.generation != self.committed_gen
69+
}
70+
71+
/// Take the pending patch for a shard, but only if the shard has a new
72+
/// chunk. The first shard with a new chunk gets the entire patch;
73+
/// subsequent shards get an empty one.
74+
pub fn take_for_shard(&mut self, shard_has_new_chunk: bool) -> SandboxStatePatch {
75+
if shard_has_new_chunk && !self.pending.is_empty() {
76+
self.pending.take()
77+
} else {
78+
SandboxStatePatch::default()
79+
}
80+
}
81+
82+
/// Returns the generation to stamp on `BlockPreprocessInfo`.
83+
///
84+
/// If the pending queue was drained during this block's preprocessing,
85+
/// returns the current generation so that `mark_committed` will advance
86+
/// `committed_gen`. Otherwise returns a sentinel that won't advance.
87+
pub fn generation_for_block(&self) -> u64 {
88+
if self.pending.is_empty() && self.generation > self.committed_gen {
89+
self.generation
90+
} else {
91+
self.committed_gen
92+
}
93+
}
94+
95+
/// Called after `chain_update.commit()`. Advances `committed_gen` if
96+
/// the block carried a patch whose generation is newer.
97+
pub fn mark_committed(&mut self, generation: u64) {
98+
if generation > self.committed_gen {
99+
self.committed_gen = generation;
100+
}
101+
}
102+
}
43103
}
44104

45105
#[cfg(not(feature = "sandbox"))]
@@ -59,7 +119,7 @@ pub mod state_patch {
59119
Self
60120
}
61121
#[inline(always)]
62-
pub fn merge(&self, _other: SandboxStatePatch) {}
122+
pub fn merge(&mut self, _other: SandboxStatePatch) {}
63123
}
64124

65125
impl IntoIterator for SandboxStatePatch {
@@ -71,4 +131,26 @@ pub mod state_patch {
71131
std::iter::empty()
72132
}
73133
}
134+
135+
#[derive(Default)]
136+
pub struct SandboxPatchTracker;
137+
138+
impl SandboxPatchTracker {
139+
#[inline(always)]
140+
pub fn submit(&mut self, _patch: SandboxStatePatch) {}
141+
#[inline(always)]
142+
pub fn in_progress(&self) -> bool {
143+
false
144+
}
145+
#[inline(always)]
146+
pub fn take_for_shard(&mut self, _shard_has_new_chunk: bool) -> SandboxStatePatch {
147+
SandboxStatePatch
148+
}
149+
#[inline(always)]
150+
pub fn generation_for_block(&self) -> u64 {
151+
0
152+
}
153+
#[inline(always)]
154+
pub fn mark_committed(&mut self, _generation: u64) {}
155+
}
74156
}

0 commit comments

Comments
 (0)