Skip to content

Fix data store missing info for submit-failed jobs#6926

Merged
wxtim merged 4 commits intocylc:8.5.xfrom
MetRonnie:submit-time
Aug 26, 2025
Merged

Fix data store missing info for submit-failed jobs#6926
wxtim merged 4 commits intocylc:8.5.xfrom
MetRonnie:submit-time

Conversation

@MetRonnie
Copy link
Copy Markdown
Member

@MetRonnie MetRonnie commented Aug 15, 2025

  • Submitted time1
  • Job runner
  • Job ID in the job runner (on restart, as it was wiped from DB on submit-failure)

Check List

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • No dependency changes
  • Tests are included (or explain why tests are not needed).
  • Changelog entry included if this is a change that can affect users
  • Docs PR not needed
  • If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

Footnotes

  1. Submitted time in this case is really the time of failed submission, but useful to know. It is recorded in the DB but wasn't showing in the UI.

@MetRonnie MetRonnie added this to the 8.5.2 milestone Aug 15, 2025
@MetRonnie MetRonnie self-assigned this Aug 15, 2025
@MetRonnie MetRonnie added bug Something is wrong :( small labels Aug 15, 2025
@MetRonnie MetRonnie marked this pull request as draft August 18, 2025 11:25
@MetRonnie MetRonnie marked this pull request as ready for review August 18, 2025 13:57
self.bad_hosts -= exc.hosts_consumed
self._set_retry_timers(itask, rtconfig)
# Provide dummy platform otherwise it will show as localhost
# in the data store:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Context easily lost, worth linking this in:

Suggested change
# in the data store:
# in the data store:
# see https://github.com/cylc/cylc-flow/pull/6926

Copy link
Copy Markdown
Member

@oliver-sanders oliver-sanders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@MetRonnie MetRonnie changed the title Fix submitted time & job runner missing from data store for submit-failed jobs Fix data store missing info for submit-failed jobs Aug 20, 2025
Remove spurious submit-failed dummy job in task proxy job list
@MetRonnie MetRonnie force-pushed the submit-time branch 2 times, most recently from 991d7ee to d1a56d4 Compare August 21, 2025 12:30
Comment on lines -797 to -798
# ... but either way update the job ID in the job proxy (it only
# comes in via the submission message).
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it is done by data_store_mgr.insert_job()

"time_submit_exit": event_time,
"submit_status": 1,
})
itask.summary['submit_method_id'] = None
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't know why the job ID was being wiped on submit-failure

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know either, but there may be a reason, if not part of the bugfix, plz bump.

Copy link
Copy Markdown
Member

@oliver-sanders oliver-sanders Aug 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, is this the job ID or the job submit method?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Job ID in the job runner, it's part 3 of the bugfix

self,
name: str,
cycle_point: Union['PointBase', str],
itask: 'TaskProxy',
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please try to avoid passing itask objects to the data store where possible.

We have had to do this in a couple of places, but we don't need to update the remaining interfaces to match.

In theory, we are supposed to be able to populate the data store out of the data base (without the Scheduler or its runtime objects, e.g. TaskProxy) so we can provide offline data.

In truth that isn't possible right now, but we should try to reduce the pain of refactor when the time comes.

execution_time_limit=job_conf.get('execution_time_limit'),
platform=job_conf['platform']['name'],
job_runner_name=job_conf.get('job_runner_name'),
job_id=itask.summary.get('submit_method_id'),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, there is no Job ID for a submission failure caused by a platform lookup error because no job submission was made.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.get() will return None in this case, which is fine

@wxtim wxtim merged commit e770f48 into cylc:8.5.x Aug 26, 2025
28 checks passed
@MetRonnie MetRonnie deleted the submit-time branch August 26, 2025 11:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something is wrong :( small

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants