
Commit 40d7357

action rewinding: recover lost CAS blobs

Remote CAS loss could leave deferred builds unable to recover generated inputs or final outputs without a daemon restart. This matters because the build graph still contains producer actions that can recreate those blobs, but Buck was not always turning missing-CAS errors into graph rewinds. The existing retry covered some upload and forced materialization failures, but missed default final materialization, local executor input materialization, and upload passes with several generated inputs missing at once.

Track lost inputs as a typed batch so the build command can dirty all producer BuildKey nodes and the consumer in one rewind. Canonicalize rewind keys through the registered action lookup before dirtying DICE. This makes dynamic_output redirects invalidate the producer action that can recreate the missing blob.

When a rewound action is replayed, bypass both Buck action-cache lookups and the remote executor cache lookup. Remote execution can otherwise return the same cached ActionResult, leaving the missing CAS blob absent and causing the consumer to hit the rewind cap.

When local materialization discovers an expired CAS entry, convert the materializer not-found error into the same typed context. Also treat default final materialization not-found errors as rewindable, since materializations = deferred still materializes requested outputs unless the stricter skip-final mode is selected.

When final materialization and final upload run together, upload can report a missing CAS blob before materialization cleans the shared queue. Because those branches run under try_compute2, the upload error can drop the materialization side before it removes queue_tracker entries. Clear those entries on the upload-side rewind path as well. Also clear the per-transaction materialization queue after committing a rewind, so the retry does not skip outputs that were queued before the DICE transaction was invalidated.

The tests use Buck remote-execution test hooks and a hybrid execution platform instead of external RE configuration. They cover remote generated inputs, directory leaves, worker-side missing input reports, local-only consumers, default final materialization, final upload with materialization, and a missing-input count above the repeated-rewind cap.
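
The batching and canonicalization described above can be illustrated with a minimal sketch. This is not the code from this commit: `NodeKey`, `LostInputs`, `canonicalize`, and `rewind_set` are hypothetical names, and a plain redirect map stands in for the registered action lookup. It only shows the idea that every missing producer is resolved to its canonical key and dirtied together with the consumer in a single rewind.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical stand-in for a graph node key (the commit refers to `BuildKey` nodes).
type NodeKey = String;

/// Hypothetical typed batch: every producer whose blob the consumer found missing in one pass.
#[derive(Debug)]
struct LostInputs {
    consumer: NodeKey,
    producers: Vec<NodeKey>,
}

/// Follow redirects (modelling dynamic_output redirects) until a key has no further redirect.
/// The `seen` set guards against redirect cycles so the loop always terminates.
fn canonicalize(key: &NodeKey, redirects: &HashMap<NodeKey, NodeKey>) -> NodeKey {
    let mut current = key.clone();
    let mut seen = BTreeSet::new();
    while let Some(next) = redirects.get(&current) {
        if !seen.insert(current.clone()) {
            break;
        }
        current = next.clone();
    }
    current
}

/// Nodes to dirty for one rewind: all canonical producers (deduplicated) plus the consumer.
fn rewind_set(lost: &LostInputs, redirects: &HashMap<NodeKey, NodeKey>) -> BTreeSet<NodeKey> {
    let mut dirty: BTreeSet<NodeKey> = lost
        .producers
        .iter()
        .map(|producer| canonicalize(producer, redirects))
        .collect();
    dirty.insert(lost.consumer.clone());
    dirty
}
```

Because the whole batch is dirtied at once, the number of rewinds in this model does not grow with the number of missing inputs, which is the property the commit message relies on when the missing-input count exceeds the repeated-rewind cap.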
1 parent 12cfc6e commit 40d7357

26 files changed

Lines changed: 1964 additions & 153 deletions


app/buck2_action_impl/src/actions/impls/run.rs

Lines changed: 10 additions & 4 deletions
@@ -1076,9 +1076,14 @@ impl RunAction {
         // First, check in the local dep file cache if an identical action can be found there.
         // Do this before checking the action cache as we can avoid a potentially large download.
         // Once the action cache lookup misses, we will do the full dep file cache look up.
-        let (outputs, should_fully_check_dep_file_cache) = dep_file_bundle
-            .check_local_dep_file_cache_for_identical_action(ctx, self.outputs.as_slice())
-            .await?;
+        let should_bypass_action_cache = ctx.should_bypass_action_cache();
+        let (outputs, should_fully_check_dep_file_cache) = if should_bypass_action_cache {
+            (None, false)
+        } else {
+            dep_file_bundle
+                .check_local_dep_file_cache_for_identical_action(ctx, self.outputs.as_slice())
+                .await?
+        };
         if let Some((outputs, metadata)) = outputs {
             return Ok(ExecuteResult::LocalDepFileHit(outputs, metadata));
         }
@@ -1541,7 +1546,8 @@ impl Action for RunAction {
         waiting_data: WaitingData,
     ) -> Result<(ActionOutputs, ActionExecutionMetadata), ExecuteError> {
         // Check offline cache first if parameter enabled
-        if self.inner.allow_offline_output_cache
+        if !ctx.should_bypass_action_cache()
+            && self.inner.allow_offline_output_cache
             && ctx.run_action_knobs().use_network_action_output_cache
         {
             if let Some((outputs, metadata)) = self.execute_for_offline(ctx).await? {
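
As a rough illustration of why the two hunks above gate the cache lookups, the following sketch models a replayed action whose cached result points at a blob that no longer exists. `Outcome`, `Cache`, and `run` are hypothetical names for this example only and do not correspond to Buck types.

```rust
/// Hypothetical outcome type for this sketch (Buck's real type is `ExecuteResult`).
#[derive(Debug, PartialEq)]
enum Outcome {
    CacheHit(String),
    Executed(String),
}

/// Hypothetical cache whose entry may reference a blob that has expired from CAS.
struct Cache {
    entry: Option<String>,
}

/// Skip the cache entirely when `bypass_cache` is set, so a replay re-executes the action
/// instead of returning the same stale result again.
fn run(cache: &Cache, bypass_cache: bool) -> Outcome {
    let cached = if bypass_cache {
        None
    } else {
        cache.entry.clone()
    };
    match cached {
        Some(hit) => Outcome::CacheHit(hit),
        None => Outcome::Executed("fresh".to_owned()),
    }
}
```

In this model, calling `run` with `bypass_cache` set to `false` keeps returning the stale entry, which mirrors the rewind-cap failure described in the commit message; setting it to `true` forces a fresh execution.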

app/buck2_build_api/src/actions.rs

Lines changed: 5 additions & 0 deletions
@@ -97,6 +97,7 @@ pub mod execute;
 pub mod impls;
 pub mod query;
 pub mod registry;
+pub mod rewind;
 
 /// Represents an unregistered 'Action' that will be registered into the 'Actions' module.
 /// The 'UnregisteredAction' is not executable until it is registered, upon which it becomes an
@@ -286,6 +287,10 @@ pub trait ActionExecutionCtx: Send + Sync {
         prepared_action: &PreparedAction,
     ) -> ControlFlow<CommandExecutionResult, CommandExecutionManager>;
 
+    fn should_bypass_action_cache(&self) -> bool {
+        false
+    }
+
     async fn cache_upload(
         &mut self,
         action: &ActionDigestAndBlobs,
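
The new trait method ships with a default body that returns `false`, so existing implementors of `ActionExecutionCtx` keep their behavior and only a rewind-aware context needs to override it. A minimal sketch of that pattern follows; the trait is reduced to the one method, and `NormalCtx`, `RewoundCtx`, and `lookup_plan` are hypothetical names, not types from this commit.

```rust
/// Reduced stand-in for the real trait, keeping only the method added in this diff.
trait ActionExecutionCtx {
    /// Default: use caches as usual.
    fn should_bypass_action_cache(&self) -> bool {
        false
    }
}

/// Hypothetical context that never overrides the default.
struct NormalCtx;
impl ActionExecutionCtx for NormalCtx {}

/// Hypothetical context that remembers whether this action is being replayed after a rewind.
struct RewoundCtx {
    is_replay: bool,
}
impl ActionExecutionCtx for RewoundCtx {
    fn should_bypass_action_cache(&self) -> bool {
        self.is_replay
    }
}

/// Callers branch on the trait method without knowing the concrete context type.
fn lookup_plan(ctx: &dyn ActionExecutionCtx) -> &'static str {
    if ctx.should_bypass_action_cache() {
        "execute without cache lookups"
    } else {
        "check caches first"
    }
}
```

Which concrete context overrides the method in Buck is not shown in this part of the diff, so the override above is an assumption made purely for illustration.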
