Skip to content

Conversation

@will-dz
Copy link
Collaborator

@will-dz will-dz commented Dec 17, 2025

This pull request introduces several improvements and bug fixes to the task engine, focusing on concurrency safety, error handling, and developer ergonomics. The most significant changes include a fix for a data race in operator reconnection logic, enhanced error propagation and reporting throughout node execution, improved ABI parameter parsing with clearer error messages, and new tests to ensure thread safety under concurrent access. These updates increase the robustness and maintainability of the codebase.

Concurrency and Safety Improvements:

  • Fixed a data race in StreamCheckToOperator by ensuring the ticker cancellation is performed outside the lock, preventing unsafe concurrent access to trackSyncedTasks and improving operator reconnection reliability. [1] [2]
  • Added comprehensive concurrent access tests (engine_concurrent_access_test.go) to verify the thread safety of trackSyncedTasks and specifically test for race conditions during operator reconnection.

Error Handling and Propagation:

  • Improved error propagation in step finalization: structured errors with appropriate error codes are now created and propagated when a step fails, ensuring error messages and codes are consistently set in the response. [1] [2]
  • Updated extractExecutionResult and RunNodeImmediatelyRPC to preserve and propagate error codes from execution steps, ensuring clients receive accurate error information. This includes extracting error codes from results, handling multiple possible types, and ensuring the Success flag is set correctly. [1] [2] [3] [4]

ABI Parameter Parsing and Developer Ergonomics:

  • Enhanced ABI parameter parsing to provide more descriptive error messages, including parameter names and tuple field names, and stricter validation for numeric types. This makes debugging input errors much easier for developers and users. [1] [2] [3] [4]

Contract Read Node Execution:

  • Improved error handling in contract read node execution by ensuring error messages are not overwritten and are properly extracted and reported, leading to more accurate step results. [1] [2] [3]

Note

Fixes a data race in operator reconnection, propagates structured errors and codes end‑to‑end, strengthens ABI parameter parsing and messaging, and adds targeted concurrency and calldata validation tests.

  • Engine/Concurrency:
    • Resolve reconnection race in StreamCheckToOperator: capture oldTickerCancel under lock, update state, unlock, then cancel; always reset tracking on reconnect; add stabilization window and cleanup tweaks.
    • Add concurrency tests (engine_concurrent_access_test.go) covering simultaneous connections and reconnection race.
  • Error handling/Propagation:
    • finalizeStep: create structured errors on failure, set ErrorCode; ensure message/code set when success=false.
    • Preserve and surface errorCode via extractExecutionResult and RunNodeImmediatelyRPC; map connectivity failures to RPC_NODE_ERROR, default others to INVALID_REQUEST.
  • Contract nodes:
    • Contract Read: avoid deferred finalization overwriting messages; derive step success from first‑pass results; clearer per‑method error returns.
    • Contract Write: return raw error strings for calldata generation failures; rely on step success as single source of truth.
  • ABI/call data parsing:
    • Stricter numeric validation in parseABIParameter; improved GenerateCallData errors including param/tuple field names; better tuple element error context.
  • Tests:
    • Add calldata parsing/response structure tests (utils_calldata_test.go) asserting clear errors and INVALID_REQUEST codes for invalid numerics in reads/writes.

Written by Cursor Bugbot for commit fd3c825. Configure here.

…e timing

Fixed a critical data race condition in StreamCheckToOperator where the lock
was released prematurely inside the if block, leaving the else block to access
and modify n.trackSyncedTasks without proper synchronization.

Changes:
- Release lock once after both if/else branches complete
- Store oldTickerCancel while holding lock, then cancel after lock release
- Add comprehensive concurrent access tests with race detector enabled
- Tests verify both concurrent connections and reconnection race scenarios
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves error handling for contract read/write operations, particularly focusing on better error messages when invalid parameters are provided and fixing a concurrency issue in the streaming engine.

Key Changes:

  • Enhanced error messages for invalid numeric values in calldata generation (e.g., "MAX" instead of a number)
  • Simplified error formatting by using err.Error() directly instead of wrapping with fmt.Sprintf
  • Fixed a race condition in StreamCheckToOperator by moving ticker cancellation outside the lock
  • Added comprehensive tests for invalid parameter handling in contract operations

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
core/taskengine/vm_runner_contract_write.go Simplified error message formatting to use err.Error() directly
core/taskengine/vm_runner_contract_read.go Removed defer for finalizeStep and added multiple fallback paths for error extraction
core/taskengine/utils_calldata_test.go Added comprehensive tests for invalid numeric value handling in contract operations
core/taskengine/utils.go Enhanced error messages to include parameter/field names and improved numeric value validation
core/taskengine/run_node_immediately.go Added error code extraction and propagation from execution steps to RPC responses
core/taskengine/node_utils.go Enhanced finalizeStep to handle error message extraction and wrapping with proper error codes
core/taskengine/engine_concurrent_access_test.go Added concurrency tests for StreamCheckToOperator to verify the race condition fix
core/taskengine/engine.go Fixed race condition by moving ticker cancellation outside the critical section
Comments suppressed due to low confidence (1)

core/taskengine/vm_runner_contract_read.go:408

  • Removing the defer for finalizeStep creates a critical issue: there are multiple early return paths in this function (lines 408, 415, 419, 428, 434, 444, 448) that now bypass finalizeStep entirely. This means the execution step will be returned without:
  • EndAt timestamp being set
  • Success flag being set
  • Error being properly recorded in the step
  • Log content being attached

This will cause incomplete execution step data to be returned for validation errors. Either the defer should be restored, or finalizeStep must be called before every return statement in the function.

	var err error
	// Note: finalizeStep is called explicitly at the end with proper error message extraction
	// No need for defer here as it would overwrite the error message

	// Get configuration from node config
	if err = validateNodeConfig(node.Config, "ContractReadNode"); err != nil {
		log.WriteString(fmt.Sprintf("Error: %s\n", err.Error()))
		return s, err

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

- Add finalizeStep calls to all early error returns in ContractRead Execute
- Remove redundant safeguards in RunNodeImmediatelyRPC that were setting success value
- Success value is now deterministically set only by finalizeStep, ensuring consistent behavior
- All error handling tests pass, confirming proper error propagation
…dling

The error messages are already captured by finalizeStep via err.Error(),
so writing them to the log buffer before finalizeStep is redundant.
- Remove overly defensive fallback logic in ContractRead success computation
  (methodResults, rawResultsForMetadata, and results all contain same error info)
- Clarify comment about abiType.T == 0 check (detects unspecified/zero-value, not incomplete types)
- The 'invalid request:' prefix is intentional and matches test expectations
@chrisli30 chrisli30 merged commit 3a66c36 into staging Dec 17, 2025
17 checks passed
@chrisli30 chrisli30 deleted the will-fix_contract_read_response branch December 17, 2025 23:02
chrisli30 pushed a commit that referenced this pull request Dec 20, 2025
…#437)

* fix: run_node error response not clear for contract_read node

* fix: run_node error response for contract_write node

* fix: prevent data race in StreamCheckToOperator by fixing lock release timing

Fixed a critical data race condition in StreamCheckToOperator where the lock
was released prematurely inside the if block, leaving the else block to access
and modify n.trackSyncedTasks without proper synchronization.

Changes:
- Release lock once after both if/else branches complete
- Store oldTickerCancel while holding lock, then cancel after lock release
- Add comprehensive concurrent access tests with race detector enabled
- Tests verify both concurrent connections and reconnection race scenarios

* fix: ensure finalizeStep is single source of truth for step success

- Add finalizeStep calls to all early error returns in ContractRead Execute
- Remove redundant safeguards in RunNodeImmediatelyRPC that were setting success value
- Success value is now deterministically set only by finalizeStep, ensuring consistent behavior
- All error handling tests pass, confirming proper error propagation

* fix: remove redundant log.WriteString calls in ContractRead error handling

The error messages are already captured by finalizeStep via err.Error(),
so writing them to the log buffer before finalizeStep is redundant.

* fix: simplify fallback logic and clarify comment per Copilot review

- Remove overly defensive fallback logic in ContractRead success computation
  (methodResults, rawResultsForMetadata, and results all contain same error info)
- Clarify comment about abiType.T == 0 check (detects unspecified/zero-value, not incomplete types)
- The 'invalid request:' prefix is intentional and matches test expectations
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants