Collator waits for external messages until timeout by SpyCheese · Pull Request #2180 · ton-blockchain/ton

SpyCheese · 2026-03-02T08:42:54Z

No description provided.

github-actions · 2026-03-02T09:17:54Z

High: duplicate callback messages can be incorrectly marked as bad and deleted from mempool.
validator/impl/ext-message-pool.cpp:198 allows reinserting the same hash with higher priority, and validator/impl/ext-message-pool.cpp:212 still invokes collator callbacks for it.
In collator, duplicates are treated as errors in validator/impl/collator.cpp:6614, and got_new_external_message pushes all such errors into bad_ext_msgs_ at validator/impl/collator.cpp:6663. Later, these hashes are sent as to_delete in validator/impl/collator.cpp:6549, so a valid message can be dropped as “bad” just because it was re-announced/upgraded.
Low: do_collate() moves the error out of result and then moves status again, so perf logging loses the real error status.
At validator/impl/collator.cpp:2386 result.move_as_error() is consumed by fatal_error, then validator/impl/collator.cpp:2388 calls result.move_as_status() again on already-moved state. This likely records a synthetic/internal status instead of the original failure reason.

github-actions · 2026-03-03T08:05:41Z

High: first-block error path can get stuck in an infinite retry loop without ever starting a new collation. In validator/consensus/block-producer.cpp:137, any r_candidate.is_error() for is_first_block does continue before clearing block_generation_active. For non-timeout errors, the shared future is already resolved with error, so next iteration immediately hits the same branch again forever.
Suggested fix: in that branch, only continue on timeout, or set block_generation_active = false before continue for non-timeout errors so a fresh collate_block request is started.

github-actions · 2026-03-03T09:54:28Z

High: potential busy-loop on first-slot collation failures in new retry path.
In validator/consensus/block-producer.cpp:144, on is_first_block error you do --slot, reset start_time, and continue without any backoff. If collate_block returns a deterministic non-timeout error quickly, this can spin retries very fast (log spam + manager pressure) until the leader window changes. Consider adding a minimal retry delay / capped retry rate for this branch.
Medium: callback-based external-message waiting is silently disabled in disk/hardfork managers.
validator/manager-disk.cpp:545 and validator/manager-hardfork.cpp:380 accept std::unique_ptr<ExtMsgCallback> callback but never store/use it. That means wait_externals_until behavior differs from main manager implementation and won’t receive post-fetch external messages in these modes. If this divergence is intentional, it should be explicit; otherwise wire callback delivery similarly to manager.cpp.

github-actions · 2026-03-04T08:21:14Z

High: await_with_timeout has two detached tasks racing to complete the same Promise without coordination.
SharedFuture.h:63, SharedFuture.h:65, SharedFuture.h:72, PromiseFuture.h:190
The timeout worker and task worker both call set_* on the same Promise object. There is no winner arbitration or synchronization, so behavior near the deadline is nondeterministic (timeout can win even if result is ready), and on a multi-threaded executor this is a data race on Promise::promise_.
Medium: callback path added to get_external_messages is ignored in manager-disk and manager-hardfork, so collator waiting for new externals cannot be woken in these modes.
manager-disk.cpp:545, manager-hardfork.cpp:380, collator.cpp:4300
When wait_externals_until is enabled, these manager implementations return initial messages but drop the live callback, so wait_for_external_message(...) only exits on timeout. This can add avoidable per-slot delay in those environments.

SpyCheese added 3 commits February 26, 2026 12:47

Changes in block producer

b457a05

Collator waits for external messages until timeout

8de6050

Merge branch 'testnet' into simplex-work

a2289ae

SpyCheese added 2 commits March 3, 2026 10:48

Fix use after move

e022fea

Fix processing duplicate ext msgs

b219670

Fix error processing on first block in session

6a761d1

Wait a bit if collation of the first block fails

1f31ed9

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Collator waits for external messages until timeout#2180

Collator waits for external messages until timeout#2180
SpyCheese wants to merge 7 commits intotestnetfrom
simplex-work

SpyCheese commented Mar 2, 2026

Uh oh!

github-actions bot commented Mar 2, 2026

Uh oh!

github-actions bot commented Mar 3, 2026

Uh oh!

github-actions bot commented Mar 3, 2026

Uh oh!

github-actions bot commented Mar 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

SpyCheese commented Mar 2, 2026

Uh oh!

github-actions bot commented Mar 2, 2026

Uh oh!

github-actions bot commented Mar 3, 2026

Uh oh!

github-actions bot commented Mar 3, 2026

Uh oh!

github-actions bot commented Mar 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant