Improve server-client communication error handling #6578
oliver-sanders merged 7 commits into cylc:8.4.x
client = WorkflowRuntimeClient(one.workflow)
with monkeypatch.context() as mp:
    mp.setattr(client, 'socket', Mock(recv=mock_recv))
    mp.setattr(client, 'poller', Mock())
For future/historical reference, I originally tried mocking the poll method of client.poller, i.e.
mp.setattr(client.poller, 'poll', Mock())
but this resulted in pytest hanging near the end of running all integration tests; some strange interaction due to the ZMQ poller using threading or something, dunno.
Anyway, the reason for this monkeypatching is to get client.poller.poll() to return a truthy value so the socket can receive the mock response:
cylc-flow/cylc/flow/network/client.py
Lines 298 to 300 in dd468d6
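The effect of replacing the whole poller with a Mock can be seen in isolation. This is a minimal sketch, assuming a stand-in `FakeClient` class (hypothetical and synchronous, not the real `WorkflowRuntimeClient`) whose receive step only reads from the socket when `poller.poll()` returns something truthy:

```python
from unittest.mock import Mock


class FakeClient:
    """Hypothetical stand-in for a client that polls before receiving."""

    def __init__(self):
        self.socket = None
        self.poller = None

    def receive(self, timeout_ms=1000):
        # Only read from the socket if the poller reports it is ready.
        if self.poller.poll(timeout_ms):
            return self.socket.recv()
        raise TimeoutError("no response")


client = FakeClient()
# Calling any attribute of a Mock returns another Mock, which is truthy,
# so poll() reports "ready" without touching a real ZMQ poller.
client.poller = Mock()
client.socket = Mock(recv=Mock(return_value=b'{"data": "ok"}'))

response = client.receive()
```

Because the entire poller object is swapped out, no method on a real poller instance is patched, which sidesteps the hang described above.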
wxtim
left a comment
I've made two suggestions, but neither should block this PR. :)
Can you help me understand what bug this fixes?
The changes I can see:
- messages/traceback field removed
- cylc_version field added
- Errors now force exceptions in multi mode (was this the bug?).
- Refactoring.
- Nicer error for the trigger incompatibility issue.
Before
# Workflow running at Cylc 8.3.6, CLI at Cylc 8.4.0
$ cylc trigger wflow//1/foo
Error processing command:
AttributeError: 'list' object has no attribute 'values'
After
# Workflow running at Cylc 8.3.6, CLI at this Cylc <this PR>
$ cylc trigger wflow//1/foo
Error processing command:
Exception: This command is no longer compatible with the version of Cylc running the workflow. Please stop & restart the workflow with Cylc 8.4.1.dev or higher.
[{'error': {'message': 'Unknown argument "onResume" on field "trigger" of type "Mutations".', 'traceback': ['graphql.error.base.GraphQLError: Unknown argument "onResume" on field "trigger" of type "Mutations".\n']}}]
(Added to OP)
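To illustrate how a client could turn the raw GraphQL error list shown above into the friendlier "After" message, here is a rough sketch. The function name `compatibility_hint`, the `min_version` parameter, and the rule of matching on the "Unknown argument" prefix are all assumptions made for illustration; this is not necessarily how the PR implements it.

```python
def compatibility_hint(errors, min_version):
    """Return a hint if any error looks like a schema mismatch, else None.

    `errors` is assumed to be a list of {'error': {'message': ...}} dicts,
    as in the output above; the matching rule is an illustrative guess.
    """
    for item in errors:
        message = item.get('error', {}).get('message', '')
        if message.startswith('Unknown argument'):
            return (
                'This command is no longer compatible with the version of '
                'Cylc running the workflow. Please stop & restart the '
                f'workflow with Cylc {min_version} or higher.'
            )
    return None


errors = [{'error': {'message': (
    'Unknown argument "onResume" on field "trigger" of type "Mutations".'
)}}]
hint = compatibility_hint(errors, '8.4.1.dev')
```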
I think this PR is stripping away exception context? I tried creating an internal error with this diff:
diff --git a/cylc/flow/network/resolvers.py b/cylc/flow/network/resolvers.py
index 79b91f97e..702b3f573 100644
--- a/cylc/flow/network/resolvers.py
+++ b/cylc/flow/network/resolvers.py
@@ -787,6 +787,7 @@ class Resolvers(BaseResolvers):
         cutoff: Any = None
     ):
         """Put or clear broadcasts."""
+        raise ValueError('Unsupported broadcast mode')
         if settings is not None:
             # Convert schema field names to workflow config setting names if
             # applicable:
Results:
Should be sorted in 1415531
    }
    but this is not 100% consistent unfortunately
    """
    error: Union[Exception, str, dict]
When is error an Exception or a str?
I think I tried to change it to Exception | str to begin with, to simplify things, but realised it breaks inter-version compatibility. So presently I don't think it ever is an Exception or str. However, with these changes we should be able to change it to be Exception | str in future without breaking inter-version compatibility with this version onwards.
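As a rough picture of the response shape under discussion, the sketch below types the envelope with `TypedDict`. The class names `ErrorInfo` and `Response` and the helper `make_error_response` are hypothetical; only the `error` annotation and the `cylc_version` field are taken from this PR, and the dict form of `error` is used because, per the reply above, that is the only form believed to occur at present.

```python
from typing import TypedDict, Union


class ErrorInfo(TypedDict, total=False):
    """Hypothetical shape of the dict form of an error."""
    message: str


class Response(TypedDict, total=False):
    """Hypothetical response envelope carrying either data or an error."""
    data: object
    error: Union[Exception, str, ErrorInfo]
    cylc_version: str


def make_error_response(exc: Exception, cylc_version: str) -> Response:
    """Build an error response using the dict form of `error`."""
    return {
        'error': {'message': str(exc)},
        'cylc_version': cylc_version,
    }


resp = make_error_response(ValueError('Unsupported broadcast mode'), '8.4.1')
```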
Using a diff along the lines of this:
diff --git a/cylc/flow/network/resolvers.py b/cylc/flow/network/resolvers.py
index 79b91f97e..702b3f573 100644
--- a/cylc/flow/network/resolvers.py
+++ b/cylc/flow/network/resolvers.py
@@ -787,6 +787,7 @@ class Resolvers(BaseResolvers):
         cutoff: Any = None
     ):
         """Put or clear broadcasts."""
+        raise ValueError('Unsupported broadcast mode')
         if settings is not None:
             # Convert schema field names to workflow config setting names if
             # applicable:
We will now get an error something like this:
This patch changes the string
diff --git a/cylc/flow/exceptions.py b/cylc/flow/exceptions.py
index 802cfaaa9..a3eff7388 100644
--- a/cylc/flow/exceptions.py
+++ b/cylc/flow/exceptions.py
@@ -285,6 +285,10 @@ class ClientError(CylcError):
         return ret
 
 
+class RequestError(ClientError):
+    """Represents an error returned by the server."""
+
+
 class WorkflowStopped(ClientError):
     """The Cylc scheduler you attempted to connect to is stopped."""
diff --git a/cylc/flow/network/client.py b/cylc/flow/network/client.py
index 656aefd5c..89dac4cbb 100644
--- a/cylc/flow/network/client.py
+++ b/cylc/flow/network/client.py
@@ -40,10 +40,10 @@ from cylc.flow import (
     __version__ as CYLC_VERSION,
 )
 from cylc.flow.exceptions import (
-    ClientError,
     ClientTimeout,
     ContactFileExists,
     CylcError,
+    RequestError,
     WorkflowStopped,
 )
 from cylc.flow.hostuserutil import get_fqdn_by_host
@@ -332,17 +332,23 @@ class WorkflowRuntimeClient(  # type: ignore[misc]
             return response['data']
         except KeyError:
             error = response.get('error')
-            if not error:
-                error = (
+            error_mesage: str
+            if isinstance(error, (str, Exception)):
+                error_mesage = str(error)
+            elif error is None:
+                error_mesage = (
                     f"Received invalid response for Cylc {CYLC_VERSION}: "
                     f"{response}"
                 )
                 wflow_cylc_ver = response.get('cylc_version')
                 if wflow_cylc_ver and wflow_cylc_ver != CYLC_VERSION:
-                    error += (
+                    error_mesage += (
                         f"\n(Workflow is running in Cylc {wflow_cylc_ver})"
                     )
-            raise ClientError(str(error)) from None
+            else:
+                error_mesage = error.get('message', 'ERROR')
+
+            raise RequestError(error_mesage) from None
 
     def get_header(self) -> dict:
         """Return "header" data to attach to each request for traceability.
In #6499 we added a new field to the trigger GraphQL mutation, and this unfortunately broke the ability for cylc trigger at 8.4 to work on workflows running in 8.3 (I shall term this "inter-version server-client comms").
This PR does not fix that (we don't think a fix is feasible at this stage).
An additional problem is that the error handling for server-client comms is broken; that is what this PR fixes. It also attempts to ensure we have a defined format for server-client comms, which should reduce the chance of breaking inter-version server-client comms in future.
Before
After
Check List
CONTRIBUTING.md and added my name as a Code Contributor.
?.?.x branch.