Commit 40d7357
action rewinding: recover lost CAS blobs
Remote CAS loss could leave deferred builds unable to recover generated
inputs or final outputs without a daemon restart. This matters because
the build graph still contains producer actions that can recreate those
blobs, but Buck was not always turning missing-CAS errors into graph
rewinds. The existing retry covered some upload and forced
materialization failures, but missed default final materialization,
local executor input materialization, and upload passes with several
generated inputs missing at once.
Track lost inputs as a typed batch so the build command can dirty
all producer BuildKey nodes and the consumer in one rewind. Canonicalize
rewind keys through the registered action lookup before dirtying DICE.
This makes dynamic_output redirects invalidate the producer action that
can recreate the missing blob.
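As a rough illustration of the batching and canonicalization described above, the sketch below collects every lost producer plus the consumer into one deduplicated set of keys to dirty. All names here (`ActionKey`, `LostInputs`, `canonicalize`, `keys_to_dirty`, and the `redirects` map standing in for the registered action lookup) are hypothetical and are not Buck's real types or APIs.

```rust
use std::collections::{BTreeSet, HashMap};

// Hypothetical stand-in for a DICE BuildKey; not Buck's real type.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct ActionKey(pub String);

// Typed batch: every producer whose blob was found missing, plus the consumer.
#[derive(Debug)]
pub struct LostInputs {
    pub producers: Vec<ActionKey>,
    pub consumer: ActionKey,
}

// Follow redirects (for example, a dynamic_output key that points at the
// registered action) until a key with no further redirect is reached.
pub fn canonicalize(key: &ActionKey, redirects: &HashMap<ActionKey, ActionKey>) -> ActionKey {
    let mut current = key.clone();
    let mut seen = BTreeSet::new();
    while let Some(next) = redirects.get(&current) {
        if !seen.insert(current.clone()) {
            break; // defensive: stop if the redirects form a cycle
        }
        current = next.clone();
    }
    current
}

// Compute the deduplicated set of keys to dirty in a single rewind.
pub fn keys_to_dirty(
    lost: &LostInputs,
    redirects: &HashMap<ActionKey, ActionKey>,
) -> BTreeSet<ActionKey> {
    let mut keys: BTreeSet<ActionKey> = lost
        .producers
        .iter()
        .map(|k| canonicalize(k, redirects))
        .collect();
    keys.insert(canonicalize(&lost.consumer, redirects));
    keys
}
```

Because the whole batch is resolved before anything is dirtied, several missing inputs cost one rewind rather than one rewind each, and a redirecting key and its target collapse into a single entry.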
When a rewound action is replayed, bypass both Buck action-cache lookups
and the remote executor cache lookup. Remote execution can otherwise
return the same cached ActionResult, leaving the missing CAS blob absent
and causing the consumer to hit the rewind cap.
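The cache bypass above can be pictured as choosing a shorter list of stages for a rewound action. This is only a sketch: the `Stage` enum, its variant names, and `stages_for` are made up for illustration and do not reflect how Buck structures its executors.

```rust
// Hypothetical execution stages; variant names are illustrative only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Stage {
    BuckActionCacheFirst,
    BuckActionCacheSecond,
    RemoteExecutorCache,
    Execute,
}

// A rewound action must actually run again. Any cache hit could hand back
// the same ActionResult that references the blob the CAS no longer has.
pub fn stages_for(is_rewound: bool) -> Vec<Stage> {
    if is_rewound {
        vec![Stage::Execute]
    } else {
        vec![
            Stage::BuckActionCacheFirst,
            Stage::BuckActionCacheSecond,
            Stage::RemoteExecutorCache,
            Stage::Execute,
        ]
    }
}
```

Skipping only the Buck-side lookups would not be enough in this model, since the remote executor's own cache lookup is a separate stage that can return the stale result.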
When local materialization discovers an expired CAS entry, convert the
materializer not-found error into the same typed context. Also treat
default final materialization not-found errors as rewindable, since
materializations = deferred still materializes requested outputs unless
the stricter skip-final mode is selected.
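A minimal sketch of the two ideas in this paragraph, assuming a simplified error type: a not-found error is converted into a typed lost-blob context, and final materialization is considered active in every mode except skip-final. `MaterializationMode`, `MaterializeError`, `LostBlob`, and both functions are hypothetical names, not Buck's real ones.

```rust
// Hypothetical materialization modes; only the distinction matters here.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MaterializationMode {
    Deferred,
    SkipFinal,
}

// Hypothetical materializer error.
#[derive(Debug, PartialEq, Eq)]
pub enum MaterializeError {
    CasNotFound { digest: String },
    Other(String),
}

// Typed context that a rewind handler can act on.
#[derive(Debug, PartialEq, Eq)]
pub struct LostBlob {
    pub digest: String,
}

// Deferred mode still materializes requested outputs at the end of the
// build, so not-found errors can surface there; skip-final does not.
pub fn materializes_requested_outputs(mode: MaterializationMode) -> bool {
    mode != MaterializationMode::SkipFinal
}

// Convert a not-found error into the typed context; leave other errors alone.
pub fn to_rewind_context(err: &MaterializeError) -> Option<LostBlob> {
    match err {
        MaterializeError::CasNotFound { digest } => Some(LostBlob {
            digest: digest.clone(),
        }),
        MaterializeError::Other(_) => None,
    }
}
```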
When final materialization and final upload run together, upload can
report a missing CAS blob before materialization cleans the shared
queue. Because those branches run under try_compute2, the upload error
can drop the materialization side before it removes queue_tracker
entries. Clear those entries on the upload-side rewind path as well.
Also clear the per-transaction materialization queue after committing a
rewind, so the retry does not skip outputs that were queued before the
DICE transaction was invalidated.
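The two queue cleanups above guard against the same failure: a stale entry makes the retry believe an output is already queued, so it is skipped. The toy model below shows that effect. `QueueTracker` and its methods are hypothetical and only loosely modelled on the queue_tracker mentioned in the message.

```rust
use std::collections::HashSet;

// Hypothetical model of the shared materialization queue.
#[derive(Default, Debug)]
pub struct QueueTracker {
    queued: HashSet<String>,
}

impl QueueTracker {
    // Returns true if the path was newly queued, false if an entry already
    // existed (in which case the caller would skip the output).
    pub fn try_queue(&mut self, path: &str) -> bool {
        self.queued.insert(path.to_owned())
    }

    // Upload-side rewind path: remove the entries the dropped
    // materialization branch never got to clean up.
    pub fn clear_paths(&mut self, paths: &[&str]) {
        for p in paths {
            self.queued.remove(*p);
        }
    }

    // After a rewind is committed: forget everything queued under the
    // invalidated transaction.
    pub fn clear_all(&mut self) {
        self.queued.clear();
    }
}
```

In this model, calling `try_queue` for a path that is still tracked returns `false`, which is the skipped-output case; after `clear_paths` or `clear_all` the same call returns `true` and the retry can queue the output again.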
The tests use Buck remote-execution test hooks and a hybrid execution
platform instead of external RE configuration. They cover remote
generated inputs, directory leaves, worker-side missing input reports,
local-only consumers, default final materialization, final upload with
materialization, and a missing-input count above the repeated-rewind
cap.
1 parent 12cfc6e commit 40d7357
26 files changed
Lines changed: 1964 additions & 153 deletions
File tree
- app
- buck2_action_impl/src/actions/impls
- buck2_build_api/src
- actions
- execute
- buck2_common/src/pattern
- buck2_core/src/pattern
- buck2_execute_impl/src/executors
- buck2_execute/src
- execute
- materialize
- re
- buck2_server_commands/src
- buck2_server/src
- buck2_test/src
- tests/core
- executor
- test_action_rewinding_data
- platforms
- help/test_help_env_data