Skip to content

fix no cfg xtrig broadcast delta shutdown#6936

Closed
dwsutherland wants to merge 2 commits intocylc:8.5.xfrom
dwsutherland:no-xtrig-cfg-bc-error-fix
Closed

fix no cfg xtrig broadcast delta shutdown#6936
dwsutherland wants to merge 2 commits intocylc:8.5.xfrom
dwsutherland:no-xtrig-cfg-bc-error-fix

Conversation

@dwsutherland
Copy link
Copy Markdown
Member

@dwsutherland dwsutherland commented Aug 25, 2025

So this occurred on workflow reload after xtrigger removal:

2025-08-24T09:54:16Z CRITICAL - An uncaught error caused Cylc to shut down.
    If you think this was an issue in Cylc, please report the following traceback to the developers.
    https://github.com/cylc/cylc-flow/issues/new?assignees=&labels=bug&template=bug.md&title=;
2025-08-24T09:54:16Z ERROR - 'cvt_nzlam_ps38_stats_sync'
    Traceback (most recent call last):
      File "/opt/niwa/share/cylc/cylc-8.5.0/lib/python3.9/site-packages/cylc/flow/scheduler_cli.py", line 710, in cylc_play
        asyncio.get_running_loop()
    RuntimeError: no running event loop
    During handling of the above exception, another exception occurred:
    Traceback (most recent call last):
      File "/opt/niwa/share/cylc/cylc-8.5.0/lib/python3.9/site-packages/cylc/flow/scheduler.py", line 703, in run_scheduler
        await self._main_loop()
      File "/opt/niwa/share/cylc/cylc-8.5.0/lib/python3.9/site-packages/cylc/flow/scheduler.py", line 1636, in _main_loop
        self.xtrigger_mgr.call_xtriggers_async(itask)
      File "/opt/niwa/share/cylc/cylc-8.5.0/lib/python3.9/site-packages/cylc/flow/xtrigger_mgr.py", line 722, in call_xtriggers_async
        self.broadcast_mgr.put_broadcast(
      File "/opt/niwa/share/cylc/cylc-8.5.0/lib/python3.9/site-packages/cylc/flow/broadcast_mgr.py", line 381, in put_broadcast
        self.data_store_mgr.delta_broadcast()
      File "/opt/niwa/share/cylc/cylc-8.5.0/lib/python3.9/site-packages/cylc/flow/data_store_mgr.py", line 2311, in delta_broadcast
        self._generate_broadcast_node_deltas(
      File "/opt/niwa/share/cylc/cylc-8.5.0/lib/python3.9/site-packages/cylc/flow/data_store_mgr.py", line 2339, in _generate_broadcast_node_deltas
        cfg['runtime'][node.name]
      File "/opt/niwa/share/cylc/cylc-8.5.0/lib/python3.9/site-packages/cylc/flow/parsec/OrderedDict.py", line 38, in __getitem__
        return OrderedDict.__getitem__(self, key)
    KeyError: 'cvt_nzlam_ps38_stats_sync'

This could happen with an orphaned running task, to mitigate I've handled this and similar calls in the data-store to carry on with an empty runtime item.

Check List

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Applied any dependency changes to both setup.cfg (and conda-environment.yml if present).
  • Tests are included (or explain why tests are not needed).
  • Changelog entry included if this is a change that can affect users
  • Cylc-Doc pull request opened if required at cylc/cylc-doc/pull/XXXX.
  • If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

@dwsutherland dwsutherland added this to the 8.5.2 milestone Aug 25, 2025
@dwsutherland dwsutherland self-assigned this Aug 25, 2025
@dwsutherland dwsutherland changed the title fix no xtrig cfg broadcast delta shutdown fix no cfg xtrig broadcast delta shutdown Aug 25, 2025
@hjoliver
Copy link
Copy Markdown
Member

@dwsutherland -

  • Can you bang some comments in the code to explain why we're not just using the dict key directly? (To avoid future regression)
  • Are there any consequences to "carrying on with an empty runtime item"?

@dwsutherland
Copy link
Copy Markdown
Member Author

@dwsutherland -

  • Can you bang some comments in the code to explain why we're not just using the dict key directly? (To avoid future regression)
  • Are there any consequences to "carrying on with an empty runtime item"?

Actually yes, my solution was a bit coarse .. I'm pushing another

@dwsutherland dwsutherland force-pushed the no-xtrig-cfg-bc-error-fix branch from c43bedf to d98efe8 Compare August 26, 2025 07:38
@MetRonnie MetRonnie added the bug Something is wrong :( label Aug 26, 2025
j_cfg[key] = val
j_buf.runtime.CopyFrom(runtime_from_config(j_cfg))
else:
j_buf.runtime.CopyFrom(tproxy.runtime)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Toyed about with creating a function that is less rigid and can use existing store objects as a base..

def runtime_from_partial(rtconfig, runtimeold=None):
    """Populate runtime object from partial/full config.

    Potentially slower with all the setattr calls, but less argvar assignments
    than `runtime_from_config`.
    """
    runtime = PbRuntime()
    if runtimeold is not None:
        runtime.CopyFrom(runtimeold)
    for key, val in rtconfig.items():
        if key not in RUNTIME_MAPPING:
            continue
        elif key in RUNTIME_LIST_JOINS:
            setattr(runtime, RUNTIME_MAPPING[key], listjoin(val))
        elif key in RUNTIME_JSON_DUMPS:
            setattr(
                runtime,
                RUNTIME_MAPPING[key],
                json.dumps(
                    {'key': k, 'value': v}
                    for k, v in val.items()
                )
            )
        elif key in RUNTIME_TRY_ITEMS:
            try:
                setattr(
                    runtime,
                    RUNTIME_MAPPING[key],
                    val[RUNTIME_TRY_ITEMS[key]]
                )
            except (KeyError, TypeError):
                setattr(runtime, RUNTIME_MAPPING[key], val)
        elif key in RUNTIME_STRINGIFYS:
            setattr(runtime, RUNTIME_MAPPING[key], str(val or ''))
        else:
            setattr(runtime, RUNTIME_MAPPING[key], val)
    return runtime

but setattr doesn't seem to be playing nice with these objects (will find out why), and maybe it's slower anyway..

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test giving:

            elif key in RUNTIME_STRINGIFYS:
                setattr(runtime, RUNTIME_MAPPING[key], str(val or ''))
            else:
>               setattr(runtime, RUNTIME_MAPPING[key], val)
E               TypeError: bad argument type for built-in operation

protobuf objects can be a little finnicky about how fields are assigned/activated sometimes

@dwsutherland dwsutherland reopened this Aug 29, 2025
@dwsutherland dwsutherland added the duplicate This is a duplicate of something else label Aug 29, 2025
@dwsutherland
Copy link
Copy Markdown
Member Author

Duplicate of #6817

@dwsutherland dwsutherland marked this as a duplicate of #6817 Aug 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something is wrong :( duplicate This is a duplicate of something else

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants