fix: run_node error response not clear for contract_read node problem #437

will-dz · 2025-12-17T22:29:25Z

This pull request introduces several improvements and bug fixes to the task engine, focusing on concurrency safety, error handling, and developer ergonomics. The most significant changes include a fix for a data race in operator reconnection logic, enhanced error propagation and reporting throughout node execution, improved ABI parameter parsing with clearer error messages, and new tests to ensure thread safety under concurrent access. These updates increase the robustness and maintainability of the codebase.

Concurrency and Safety Improvements:

Fixed a data race in StreamCheckToOperator by ensuring the ticker cancellation is performed outside the lock, preventing unsafe concurrent access to trackSyncedTasks and improving operator reconnection reliability. [1] [2]
Added comprehensive concurrent access tests (engine_concurrent_access_test.go) to verify the thread safety of trackSyncedTasks and specifically test for race conditions during operator reconnection.
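The lock ordering described above can be sketched as follows. This is a minimal illustration, not the engine's actual code: the `engine` and `operatorState` types, the `connect` method, and the `generation` counter are hypothetical stand-ins for the real `StreamCheckToOperator` state. It shows the previous cancel function being captured under the lock and invoked only after a single unlock.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// operatorState is a hypothetical stand-in for per-operator tracking data.
type operatorState struct {
	tickerCancel context.CancelFunc
	generation   int
}

// engine is a hypothetical stand-in for the task engine.
type engine struct {
	mu               sync.Mutex
	trackSyncedTasks map[string]*operatorState
}

// connect registers (or re-registers) an operator and returns a context
// that is cancelled when a newer connection replaces this one.
func (e *engine) connect(addr string) context.Context {
	ctx, cancel := context.WithCancel(context.Background())

	e.mu.Lock()
	var oldCancel context.CancelFunc
	gen := 1
	if prev, ok := e.trackSyncedTasks[addr]; ok {
		oldCancel = prev.tickerCancel // capture while holding the lock
		gen = prev.generation + 1
	}
	// Always reset tracking on (re)connect, still under the lock.
	e.trackSyncedTasks[addr] = &operatorState{tickerCancel: cancel, generation: gen}
	e.mu.Unlock() // single unlock after both branches

	if oldCancel != nil {
		oldCancel() // cancel the previous ticker outside the critical section
	}
	return ctx
}

// generation reads the connection counter under the lock.
func (e *engine) generation(addr string) int {
	e.mu.Lock()
	defer e.mu.Unlock()
	return e.trackSyncedTasks[addr].generation
}

func main() {
	e := &engine{trackSyncedTasks: make(map[string]*operatorState)}
	var wg sync.WaitGroup
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			e.connect("0xoperator")
		}()
	}
	wg.Wait()
	fmt.Println("generation:", e.generation("0xoperator"))
}
```

Running a sketch like this with `go run -race` is one way to check that no access to the map escapes the lock.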

Error Handling and Propagation:

Improved error propagation in step finalization: structured errors with appropriate error codes are now created and propagated when a step fails, ensuring error messages and codes are consistently set in the response. [1] [2]
Updated extractExecutionResult and RunNodeImmediatelyRPC to preserve and propagate error codes from execution steps, ensuring clients receive accurate error information. This includes extracting error codes from results, handling multiple possible types, and ensuring the Success flag is set correctly. [1] [2] [3] [4]
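As an illustration of "handling multiple possible types", here is a hedged sketch of how an error code might be pulled out of a loosely typed result. The `extractErrorCode` helper, the `errorCode` key, and the `int32` return type are assumptions for this example; the PR text does not show the real signature.

```go
package main

import "fmt"

// extractErrorCode is a hypothetical helper: it reads an "errorCode" entry
// from a loosely typed result map and normalizes it to int32.
// The boolean reports whether a usable code was found.
func extractErrorCode(result map[string]interface{}) (int32, bool) {
	raw, ok := result["errorCode"]
	if !ok || raw == nil {
		return 0, false
	}
	switch v := raw.(type) {
	case int32:
		return v, true
	case int:
		return int32(v), true
	case int64:
		return int32(v), true
	case float64: // encoding/json decodes JSON numbers into float64
		return int32(v), true
	default:
		return 0, false
	}
}

func main() {
	code, found := extractErrorCode(map[string]interface{}{"errorCode": float64(3000)})
	fmt.Println(code, found)

	_, found = extractErrorCode(map[string]interface{}{"data": "ok"})
	fmt.Println(found)
}
```

Returning a separate `found` flag lets the caller distinguish "no code present" from a code whose numeric value happens to be zero.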

ABI Parameter Parsing and Developer Ergonomics:

Enhanced ABI parameter parsing to provide more descriptive error messages, including parameter names and tuple field names, and stricter validation for numeric types. This makes debugging input errors much easier for developers and users. [1] [2] [3] [4]
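To make the idea concrete, the sketch below shows strict numeric validation that names the offending parameter in its error. The `parseUintParam` function, its decimal-only rule, and the exact message wording are assumptions for illustration; they are not the PR's `parseABIParameter` implementation.

```go
package main

import (
	"fmt"
	"math/big"
	"strings"
)

// parseUintParam is a hypothetical validator for an unsigned integer ABI
// parameter. It accepts only base-10 input (an assumption of this sketch)
// and reports the parameter name and ABI type when the value is rejected.
func parseUintParam(name, abiType, value string) (*big.Int, error) {
	trimmed := strings.TrimSpace(value)
	n, ok := new(big.Int).SetString(trimmed, 10)
	if !ok {
		return nil, fmt.Errorf("invalid request: parameter %q (%s): %q is not a valid number", name, abiType, value)
	}
	if n.Sign() < 0 {
		return nil, fmt.Errorf("invalid request: parameter %q (%s): %q must not be negative", name, abiType, value)
	}
	return n, nil
}

func main() {
	if _, err := parseUintParam("amount", "uint256", "MAX"); err != nil {
		fmt.Println(err)
	}
	n, _ := parseUintParam("amount", "uint256", " 42 ")
	fmt.Println(n)
}
```

Rejecting a value such as "MAX" up front, with the parameter name attached, gives the caller something actionable instead of a generic encoding failure.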

Contract Read Node Execution:

Improved error handling in contract read node execution by ensuring error messages are not overwritten and are properly extracted and reported, leading to more accurate step results. [1] [2] [3]

Note

Fixes a data race in operator reconnection, propagates structured errors and codes end‑to‑end, strengthens ABI parameter parsing and messaging, and adds targeted concurrency and calldata validation tests.

Engine/Concurrency:
- Resolve reconnection race in StreamCheckToOperator: capture oldTickerCancel under lock, update state, unlock, then cancel; always reset tracking on reconnect; add stabilization window and cleanup tweaks.
- Add concurrency tests (engine_concurrent_access_test.go) covering simultaneous connections and reconnection race.
Error handling/Propagation:
- finalizeStep: create structured errors on failure, set ErrorCode; ensure message/code set when success=false.
- Preserve and surface errorCode via extractExecutionResult and RunNodeImmediatelyRPC; map connectivity failures to RPC_NODE_ERROR, default others to INVALID_REQUEST.
Contract nodes:
- Contract Read: avoid deferred finalization overwriting messages; derive step success from first‑pass results; clearer per‑method error returns.
- Contract Write: return raw error strings for calldata generation failures; rely on step success as single source of truth.
ABI/call data parsing:
- Stricter numeric validation in parseABIParameter; improved GenerateCallData errors including param/tuple field names; better tuple element error context.
Tests:
- Add calldata parsing/response structure tests (utils_calldata_test.go) asserting clear errors and INVALID_REQUEST codes for invalid numerics in reads/writes.

Written by Cursor Bugbot for commit fd3c825.

…e timing

Fixed a critical data race condition in StreamCheckToOperator where the lock was released prematurely inside the if block, leaving the else block to access and modify n.trackSyncedTasks without proper synchronization.

Changes:
- Release lock once after both if/else branches complete
- Store oldTickerCancel while holding lock, then cancel after lock release
- Add comprehensive concurrent access tests with race detector enabled
- Tests verify both concurrent connections and reconnection race scenarios

Copilot

Pull request overview

This PR improves error handling for contract read/write operations, particularly focusing on better error messages when invalid parameters are provided and fixing a concurrency issue in the streaming engine.

Key Changes:

Enhanced error messages for invalid numeric values in calldata generation (e.g., "MAX" instead of a number)
Simplified error formatting by using err.Error() directly instead of wrapping with fmt.Sprintf
Fixed a race condition in StreamCheckToOperator by moving ticker cancellation outside the lock
Added comprehensive tests for invalid parameter handling in contract operations

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.


| File | Description |
| --- | --- |
| core/taskengine/vm_runner_contract_write.go | Simplified error message formatting to use `err.Error()` directly |
| core/taskengine/vm_runner_contract_read.go | Removed defer for finalizeStep and added multiple fallback paths for error extraction |
| core/taskengine/utils_calldata_test.go | Added comprehensive tests for invalid numeric value handling in contract operations |
| core/taskengine/utils.go | Enhanced error messages to include parameter/field names and improved numeric value validation |
| core/taskengine/run_node_immediately.go | Added error code extraction and propagation from execution steps to RPC responses |
| core/taskengine/node_utils.go | Enhanced finalizeStep to handle error message extraction and wrapping with proper error codes |
| core/taskengine/engine_concurrent_access_test.go | Added concurrency tests for StreamCheckToOperator to verify the race condition fix |
| core/taskengine/engine.go | Fixed race condition by moving ticker cancellation outside the critical section |

Comments suppressed due to low confidence (1)

core/taskengine/vm_runner_contract_read.go:408

Removing the defer for finalizeStep creates a critical issue: there are multiple early return paths in this function (lines 408, 415, 419, 428, 434, 444, 448) that now bypass finalizeStep entirely. This means the execution step will be returned without:

EndAt timestamp being set
Success flag being set
Error being properly recorded in the step
Log content being attached

This will cause incomplete execution step data to be returned for validation errors. Either the defer should be restored, or finalizeStep must be called before every return statement in the function.

	var err error
	// Note: finalizeStep is called explicitly at the end with proper error message extraction
	// No need for defer here as it would overwrite the error message

	// Get configuration from node config
	if err = validateNodeConfig(node.Config, "ContractReadNode"); err != nil {
		log.WriteString(fmt.Sprintf("Error: %s\n", err.Error()))
		return s, err
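
One way to satisfy the second option in the comment above (finalize before every return) without repeating boilerplate is a small local helper. The sketch below is hypothetical: the `step` struct, the three-argument `finalizeStep`, and the config keys are invented for illustration and do not reflect the real signatures in core/taskengine.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// step is a hypothetical, trimmed-down execution step.
type step struct {
	Success bool
	Error   string
	EndAt   int64
}

// finalizeStep is a stand-in for the real helper: it stamps the end time
// and records the outcome on the step.
func finalizeStep(s *step, success bool, err error) {
	s.EndAt = time.Now().UnixMilli()
	s.Success = success
	if err != nil {
		s.Error = err.Error()
	}
}

// execute validates its input and finalizes the step on every return path.
func execute(config map[string]string) (*step, error) {
	s := &step{}

	// fail finalizes the step with the given error and returns both, so
	// each early return stays a one-liner and cannot skip finalization.
	fail := func(err error) (*step, error) {
		finalizeStep(s, false, err)
		return s, err
	}

	addr, ok := config["contractAddress"]
	if !ok || addr == "" {
		return fail(errors.New("missing contract address"))
	}
	if _, ok := config["contractAbi"]; !ok {
		return fail(errors.New("missing contract ABI"))
	}

	finalizeStep(s, true, nil)
	return s, nil
}

func main() {
	s, err := execute(map[string]string{})
	fmt.Println(s.Success, s.Error, err)
}
```

Compared with a deferred call, the explicit helper makes it obvious which error message ends up on the step, which is the concern the removed defer was meant to address.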


- Add finalizeStep calls to all early error returns in ContractRead Execute
- Remove redundant safeguards in RunNodeImmediatelyRPC that were setting success value
- Success value is now deterministically set only by finalizeStep, ensuring consistent behavior
- All error handling tests pass, confirming proper error propagation

…dling The error messages are already captured by finalizeStep via err.Error(), so writing them to the log buffer before finalizeStep is redundant.

- Remove overly defensive fallback logic in ContractRead success computation (methodResults, rawResultsForMetadata, and results all contain same error info)
- Clarify comment about abiType.T == 0 check (detects unspecified/zero-value, not incomplete types)
- The 'invalid request:' prefix is intentional and matches test expectations

…#437)

* fix: run_node error response not clear for contract_read node
* fix: run_node error response for contract_write node
* fix: prevent data race in StreamCheckToOperator by fixing lock release timing

  Fixed a critical data race condition in StreamCheckToOperator where the lock was released prematurely inside the if block, leaving the else block to access and modify n.trackSyncedTasks without proper synchronization.

  Changes:
  - Release lock once after both if/else branches complete
  - Store oldTickerCancel while holding lock, then cancel after lock release
  - Add comprehensive concurrent access tests with race detector enabled
  - Tests verify both concurrent connections and reconnection race scenarios
* fix: ensure finalizeStep is single source of truth for step success
  - Add finalizeStep calls to all early error returns in ContractRead Execute
  - Remove redundant safeguards in RunNodeImmediatelyRPC that were setting success value
  - Success value is now deterministically set only by finalizeStep, ensuring consistent behavior
  - All error handling tests pass, confirming proper error propagation
* fix: remove redundant log.WriteString calls in ContractRead error handling

  The error messages are already captured by finalizeStep via err.Error(), so writing them to the log buffer before finalizeStep is redundant.
* fix: simplify fallback logic and clarify comment per Copilot review
  - Remove overly defensive fallback logic in ContractRead success computation (methodResults, rawResultsForMetadata, and results all contain same error info)
  - Clarify comment about abiType.T == 0 check (detects unspecified/zero-value, not incomplete types)
  - The 'invalid request:' prefix is intentional and matches test expectations

will-dz added 3 commits December 17, 2025 14:11

fix: run_node error response not clear for contract_read node

b1e540f

fix: run_node error response for contract_write node

c3c123a

will-dz temporarily deployed to Test December 17, 2025 22:29 — with GitHub Actions Inactive

will-dz had a problem deploying to Test December 17, 2025 22:29 — with GitHub Actions Failure

will-dz temporarily deployed to Test December 17, 2025 22:29 — with GitHub Actions Inactive

chrisli30 requested a review from Copilot December 17, 2025 22:29

chrisli30 changed the title ~~Will fix contract read response~~ fix: run_node error response not clear for contract_read node problem Dec 17, 2025

Copilot started reviewing on behalf of chrisli30 December 17, 2025 22:30 View session

Copilot AI reviewed Dec 17, 2025


chrisli30 temporarily deployed to Test December 17, 2025 22:48 — with GitHub Actions Inactive

fix: remove redundant log.WriteString calls in ContractRead error han…

294f4ec


chrisli30 temporarily deployed to Test December 17, 2025 22:52 — with GitHub Actions Inactive

chrisli30 temporarily deployed to Test December 17, 2025 22:53 — with GitHub Actions Inactive

chrisli30 merged commit 3a66c36 into staging Dec 17, 2025
17 checks passed

chrisli30 deleted the will-fix_contract_read_response branch December 17, 2025 23:02

chrisli30 mentioned this pull request Dec 20, 2025

feat: switch cancel task to active/inactive tasks; add SetTaskActive; #438

Merged

3 participants
