Conversation

@SpyCheese
Member

No description provided.

@github-actions

Blocking Issues

  • validator/import-db-slice-local.cpp:551-560 – The new pipelined path calls Db::add_handle_to_archive (which writes the handle into the archive index and flips the handle_moved_to_archive flag) before we know that the block will actually be applied (apply_block_async_3 / ValidatorManager::new_block). If new_block later rejects the block (bad state, timeout, etc.), apply_blocks_async propagates the failure and the importer exits, but by then the archive already contains the new handle even though the block was never applied or fully archived. On the next retry the handle is seen as “already moved”, so BlockArchiver skips the add‑handle step, leaving an inconsistent archive (no files, moved_to_archive == false, but the LT index already advanced). Previously the add‑handle happened inside BlockArchiver, i.e. only after new_block triggered archiving and only if that succeeded. We need either to defer add_handle_to_archive until new_block completes, or to roll back the handle/lt‑db entry when a later phase fails; otherwise a transient failure strands half‑archived handles and the import can’t be retried cleanly.

  • validator/import-db-slice-local.cpp:99-104 – process_package now silently swallows any FileReference::create error and simply does return true;. Before this change we set S = F.move_as_error() and aborted the package, so the caller logged which file name was malformed or corrupted. With the new behaviour a broken entry (e.g. a truncated proof filename) is skipped without any warning, the package reports success, and the importer only fails much later with a generic “no block data/proof” error – or worse, if the file was optional, the block is silently missing. At minimum we should keep surfacing the parsing error so operators know the archive on disk is corrupted; silently ignoring unknown file names makes diagnosing archive issues almost impossible.
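The shape of the fix can be sketched in miniature. Note that `parse_filename` and `process_entry` below are illustrative names, not the actual TON API: the point is only that a parse failure must reach the caller instead of being swallowed with `return true;`.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Illustrative stand-in for FileReference::create: on success returns a
// parsed reference, on failure an error string the caller can log and
// turn into a hard failure for the whole package.
struct FileRef {
  std::string name;
};

std::variant<FileRef, std::string> parse_filename(const std::string& raw) {
  if (raw.empty() || raw.find('\0') != std::string::npos) {
    return std::string("malformed archive entry name: '" + raw + "'");
  }
  return FileRef{raw};
}

// Sketch of the caller: surface the parse error instead of silently
// skipping the entry and reporting package success.
bool process_entry(const std::string& raw, std::string& error_out) {
  auto r = parse_filename(raw);
  if (auto* err = std::get_if<std::string>(&r)) {
    error_out = *err;  // propagate instead of `return true;`
    return false;
  }
  return true;
}
```

With this pattern the importer fails at the broken entry with a concrete file name, rather than much later with a generic missing-data error.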

@github-actions

No blocking bugs jumped out from this diff. The refactoring is large but the new coroutine‑based importer, async archiving changes, and option plumbing all look internally consistent, and I didn’t spot a correctness regression in the code paths I inspected.

Residual risks / things to double‑check:

  • validator/import-db-slice-local.cpp:551 – apply_block_async_2 explicitly states that add_handle must be performed in order for the LT index updates to stay monotonic, but the current implementation launches every Db::add_handle_to_archive through td::actor::all with no serialization. If the actor scheduler ever reorders those calls, older seqnos could be treated as duplicates and skipped inside ArchiveSlice::add_handle.
  • validator/import-db-slice-local.cpp – the pipeline still calls ValidatorManager::new_block, which unconditionally runs Db::archive. Because the importer now pre‑archives each handle via Db::add_handle_to_archive, BlockArchiver still replays the entire proof/data copying step, defeating the “Optimize importing archive packages” goal and potentially doubling the I/O for each block. Consider short‑circuiting ValidatorManager::new_block when the handle is already archived to avoid undoing the optimization.
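The skip-on-reorder hazard in the first bullet can be modeled without any actor machinery. `LtIndex` below is a toy stand-in (our names, not TON interfaces) for an index that, like ArchiveSlice::add_handle, silently drops entries whose seqno is not strictly increasing; it shows why the adds must be serialized rather than launched through td::actor::all.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of a monotonic index: entries with a seqno at or below the
// last accepted one are treated as duplicates and skipped.
struct LtIndex {
  std::uint32_t last_seqno = 0;
  std::vector<std::uint32_t> accepted;
  void add_handle(std::uint32_t seqno) {
    if (seqno <= last_seqno) {
      return;  // looks like a duplicate; silently dropped
    }
    last_seqno = seqno;
    accepted.push_back(seqno);
  }
};

// Feeding handles one at a time, in seqno order (the serialized
// equivalent of awaiting each add before starting the next), guarantees
// nothing is dropped; any reordering can lose entries.
std::size_t add_in_order(LtIndex& idx, const std::vector<std::uint32_t>& seqnos) {
  for (auto s : seqnos) {
    idx.add_handle(s);
  }
  return idx.accepted.size();
}
```

Delivering {1, 2, 3} in order accepts all three; a scheduler-style reordering to {2, 1, 3} silently loses seqno 1.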

@SpyCheese marked this pull request as draft on December 15, 2025, 14:58
@github-actions

  • validator/import-db-slice-local.cpp: the new coroutine flow drops every td::Result returned by the async helpers. run_inner() co_awaits process_masterchain_blocks, process_shard_blocks, store_data, and apply_blocks without unwrapping the result, so any error from these steps is ignored and the importer will still report success. Use TRY_STATUS/TRY_RESULT (or co_await ... .wrap() and check) to propagate failures instead of discarding them.
  • validator/import-db-slice-local.cpp: try_advance_shard_client_seqno() now returns td::actor::Task<bool>, but process_shard_blocks() loops with while (co_await try_advance_shard_client_seqno()). Task<T>::await_resume yields td::Result<T>, not T, so the loop condition never inspects the returned bool (and may not even compile). Even if it compiles, it will treat any OK result as true and can spin forever when the function returns false. The result needs to be unwrapped (TRY_RESULT(bool advanced, co_await …)) and the contained bool should drive the loop.
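The second bullet is easiest to see with a stripped-down stand-in for td::Result (no coroutines; the names here are ours): the contained bool, not the Result wrapper, must drive the loop, otherwise any OK result keeps it spinning.

```cpp
#include <cassert>

// Minimal stand-in for td::Result<T>: either a value or an error.
template <class T>
struct Result {
  bool ok;
  T value;
  bool is_ok() const { return ok; }
  T move_as_ok() { return value; }
};

// Correct pattern: unwrap the Result first (TRY_RESULT in the real
// code), then let the contained bool decide whether to continue.
// Using the Result itself as the condition would treat every OK
// result as "keep going", even one that carries `false`.
int drain(int remaining) {
  int steps = 0;
  auto try_advance = [&]() -> Result<bool> {
    if (remaining > 0) {
      --remaining;
      return {true, true};   // OK, made progress
    }
    return {true, false};    // OK, but no progress: loop must stop
  };
  while (true) {
    Result<bool> r = try_advance();
    if (!r.is_ok()) {
      break;  // real code would propagate the error here
    }
    if (!r.move_as_ok()) {
      break;  // the contained bool terminates the loop
    }
    ++steps;
  }
  return steps;
}
```

With the unwrapped bool in control, `drain` terminates after exactly as many steps as there is progress to make.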

@github-actions

No blocking correctness issues jumped out while reviewing the diff.

  • The only behavioral change that feels risky is in validator/import-db-slice-local.cpp: process_package now ignores FileReference::create failures (previously it treated them as fatal), so archives with malformed filenames or corrupted entries will now be accepted and the offending files silently skipped. That could hide broken packages and lead to partial imports without an error. If the intent wasn’t to relax validation, consider keeping a failure path (or at least logging the skipped entries).

If you want, I can add a small guard/log around that spot to keep corrupted archives from being treated as successful imports.

@github-actions

  • validator/import-db-slice-local.cpp:561-577: in the permanent celldb path we now apply blocks via apply_blocks_async_*. store_data() writes block data while the handle isn’t marked as moved-to-archive, so ArchiveManager::add_file stores it in temp packages and never sets moved_to_archive. apply_blocks_async_2 later only calls add_handle_to_archive, which sets the handle-moved flag but still doesn’t set moved_to_archive. By the time apply_block_async_4 runs, moved_to_archive() remains false for freshly imported blocks, so the new CHECK(handle->moved_to_archive()) will trip and abort every import when permanent_celldb is enabled. You likely need to move the data to the final archive (or set the flag) before this check or drop the assertion.

@github-actions

  • validator/import-db-slice-local.cpp:489-519 (permanent celldb path) applies all masterchain/shard blocks in parallel. The tasks_3 batch calls ValidatorManager::new_block for every handle concurrently, so masterchain blocks can reach the manager out of order. The manager expects strict sequencing (CHECK(seqno == last_masterchain_seqno_ + 1) in validator/manager-disk.cpp:830-840) and will crash or corrupt state if a later masterchain block is delivered first. Masterchain blocks need to be fed to new_block sequentially (or otherwise ordered) when importing in permanent mode.
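A minimal model of the sequencing invariant makes the crash mode concrete. `Manager` and `new_block` here are simplified stand-ins for the real manager (not the actual interfaces): the CHECK in manager-disk.cpp corresponds to the throw below, and only strictly ordered delivery satisfies it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Toy model of the manager-side invariant: new_block aborts unless
// masterchain seqnos arrive in strict +1 order, mirroring
// CHECK(seqno == last_masterchain_seqno_ + 1).
struct Manager {
  std::uint32_t last_masterchain_seqno = 0;
  void new_block(std::uint32_t seqno) {
    if (seqno != last_masterchain_seqno + 1) {
      throw std::runtime_error("seqno gap: the CHECK would abort here");
    }
    last_masterchain_seqno = seqno;
  }
};

// Sequential import: restore ascending seqno order before delivery,
// then feed blocks one at a time instead of in a parallel batch.
std::uint32_t import_sequentially(Manager& m, std::vector<std::uint32_t> seqnos) {
  std::sort(seqnos.begin(), seqnos.end());
  for (auto s : seqnos) {
    m.new_block(s);
  }
  return m.last_masterchain_seqno;
}
```

Delivering the same set of blocks concurrently (i.e. in whatever order the batch completes) can hand the manager a later seqno first and trip the check, which is why the masterchain handles need a serialized path.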
