-
Notifications
You must be signed in to change notification settings - Fork 75
fix: run_node error response not clear for contract_read node problem #437
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…e timing Fixed a critical data race condition in StreamCheckToOperator where the lock was released prematurely inside the if block, leaving the else block to access and modify n.trackSyncedTasks without proper synchronization. Changes: - Release lock once after both if/else branches complete - Store oldTickerCancel while holding lock, then cancel after lock release - Add comprehensive concurrent access tests with race detector enabled - Tests verify both concurrent connections and reconnection race scenarios
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR improves error handling for contract read/write operations, particularly focusing on better error messages when invalid parameters are provided and fixing a concurrency issue in the streaming engine.
Key Changes:
- Enhanced error messages for invalid numeric values in calldata generation (e.g., "MAX" instead of a number)
- Simplified error formatting by using
err.Error()directly instead of wrapping withfmt.Sprintf - Fixed a race condition in
StreamCheckToOperatorby moving ticker cancellation outside the lock - Added comprehensive tests for invalid parameter handling in contract operations
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| core/taskengine/vm_runner_contract_write.go | Simplified error message formatting to use err.Error() directly |
| core/taskengine/vm_runner_contract_read.go | Removed defer for finalizeStep and added multiple fallback paths for error extraction |
| core/taskengine/utils_calldata_test.go | Added comprehensive tests for invalid numeric value handling in contract operations |
| core/taskengine/utils.go | Enhanced error messages to include parameter/field names and improved numeric value validation |
| core/taskengine/run_node_immediately.go | Added error code extraction and propagation from execution steps to RPC responses |
| core/taskengine/node_utils.go | Enhanced finalizeStep to handle error message extraction and wrapping with proper error codes |
| core/taskengine/engine_concurrent_access_test.go | Added concurrency tests for StreamCheckToOperator to verify the race condition fix |
| core/taskengine/engine.go | Fixed race condition by moving ticker cancellation outside the critical section |
Comments suppressed due to low confidence (1)
core/taskengine/vm_runner_contract_read.go:408
- Removing the defer for finalizeStep creates a critical issue: there are multiple early return paths in this function (lines 408, 415, 419, 428, 434, 444, 448) that now bypass finalizeStep entirely. This means the execution step will be returned without:
- EndAt timestamp being set
- Success flag being set
- Error being properly recorded in the step
- Log content being attached
This will cause incomplete execution step data to be returned for validation errors. Either the defer should be restored, or finalizeStep must be called before every return statement in the function.
var err error
// Note: finalizeStep is called explicitly at the end with proper error message extraction
// No need for defer here as it would overwrite the error message
// Get configuration from node config
if err = validateNodeConfig(node.Config, "ContractReadNode"); err != nil {
log.WriteString(fmt.Sprintf("Error: %s\n", err.Error()))
return s, err
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Add finalizeStep calls to all early error returns in ContractRead Execute - Remove redundant safeguards in RunNodeImmediatelyRPC that were setting success value - Success value is now deterministically set only by finalizeStep, ensuring consistent behavior - All error handling tests pass, confirming proper error propagation
…dling The error messages are already captured by finalizeStep via err.Error(), so writing them to the log buffer before finalizeStep is redundant.
- Remove overly defensive fallback logic in ContractRead success computation (methodResults, rawResultsForMetadata, and results all contain same error info) - Clarify comment about abiType.T == 0 check (detects unspecified/zero-value, not incomplete types) - The 'invalid request:' prefix is intentional and matches test expectations
…#437) * fix: run_node error response not clear for contract_read node * fix: run_node error response for contract_write node * fix: prevent data race in StreamCheckToOperator by fixing lock release timing Fixed a critical data race condition in StreamCheckToOperator where the lock was released prematurely inside the if block, leaving the else block to access and modify n.trackSyncedTasks without proper synchronization. Changes: - Release lock once after both if/else branches complete - Store oldTickerCancel while holding lock, then cancel after lock release - Add comprehensive concurrent access tests with race detector enabled - Tests verify both concurrent connections and reconnection race scenarios * fix: ensure finalizeStep is single source of truth for step success - Add finalizeStep calls to all early error returns in ContractRead Execute - Remove redundant safeguards in RunNodeImmediatelyRPC that were setting success value - Success value is now deterministically set only by finalizeStep, ensuring consistent behavior - All error handling tests pass, confirming proper error propagation * fix: remove redundant log.WriteString calls in ContractRead error handling The error messages are already captured by finalizeStep via err.Error(), so writing them to the log buffer before finalizeStep is redundant. * fix: simplify fallback logic and clarify comment per Copilot review - Remove overly defensive fallback logic in ContractRead success computation (methodResults, rawResultsForMetadata, and results all contain same error info) - Clarify comment about abiType.T == 0 check (detects unspecified/zero-value, not incomplete types) - The 'invalid request:' prefix is intentional and matches test expectations
This pull request introduces several improvements and bug fixes to the task engine, focusing on concurrency safety, error handling, and developer ergonomics. The most significant changes include a fix for a data race in operator reconnection logic, enhanced error propagation and reporting throughout node execution, improved ABI parameter parsing with clearer error messages, and new tests to ensure thread safety under concurrent access. These updates increase the robustness and maintainability of the codebase.
Concurrency and Safety Improvements:
StreamCheckToOperatorby ensuring the ticker cancellation is performed outside the lock, preventing unsafe concurrent access totrackSyncedTasksand improving operator reconnection reliability. [1] [2]engine_concurrent_access_test.go) to verify the thread safety oftrackSyncedTasksand specifically test for race conditions during operator reconnection.Error Handling and Propagation:
extractExecutionResultandRunNodeImmediatelyRPCto preserve and propagate error codes from execution steps, ensuring clients receive accurate error information. This includes extracting error codes from results, handling multiple possible types, and ensuring theSuccessflag is set correctly. [1] [2] [3] [4]ABI Parameter Parsing and Developer Ergonomics:
Contract Read Node Execution:
Note
Fixes a data race in operator reconnection, propagates structured errors and codes end‑to‑end, strengthens ABI parameter parsing and messaging, and adds targeted concurrency and calldata validation tests.
StreamCheckToOperator: captureoldTickerCancelunder lock, update state, unlock, then cancel; always reset tracking on reconnect; add stabilization window and cleanup tweaks.engine_concurrent_access_test.go) covering simultaneous connections and reconnection race.finalizeStep: create structured errors on failure, setErrorCode; ensure message/code set whensuccess=false.errorCodeviaextractExecutionResultandRunNodeImmediatelyRPC; map connectivity failures toRPC_NODE_ERROR, default others toINVALID_REQUEST.parseABIParameter; improvedGenerateCallDataerrors including param/tuple field names; better tuple element error context.utils_calldata_test.go) asserting clear errors andINVALID_REQUESTcodes for invalid numerics in reads/writes.Written by Cursor Bugbot for commit fd3c825. Configure here.