
@sunset666 sunset666 commented Nov 24, 2025

Starting approach to gather statistics.

It queries the search API for all Published and QA derived datasets, then asks for each dataset's full path in order to access its session.log.
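A minimal sketch of the query step described above. The payload shape, field names (`status`, `is_derived`), and statuses-to-lowercase convention are assumptions for illustration, not the actual search-API schema:

```python
def build_dataset_query(statuses=("Published", "QA")):
    """Build a hypothetical search-API payload selecting derived
    datasets in the given statuses (field names are assumed)."""
    return {
        "query": {
            "bool": {
                "must": [
                    # assumed: status values are indexed lowercase
                    {"terms": {"status": [s.lower() for s in statuses]}},
                    # assumed flag distinguishing derived from primary datasets
                    {"term": {"is_derived": True}},
                ]
            }
        }
    }


payload = build_dataset_query()
```

The returned dict would then be POSTed to the search endpoint; each hit's full path gives the location of its session.log.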

It then parses the log, setting start and end points based on completed jobs. If a job's Docker image requires a GPU, the job is counted as a GPU job.
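The parsing logic above can be sketched as follows. The log-line format, the `Started job` / `Completed job` / `requires GPU` markers, and the job-name extraction are all assumptions standing in for the real session.log schema:

```python
import re
from datetime import datetime

# Assumed line shape: "YYYY-MM-DD HH:MM:SS <message>"
LINE_RE = re.compile(r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<msg>.*)$")


def parse_session_log(lines):
    """Yield (job_name, seconds, is_gpu) for each completed job.

    A job's start line opens an interval, its completion line closes it,
    and a 'requires GPU' line for its image flags it as a GPU job.
    """
    open_jobs = {}  # job name -> [start timestamp, is_gpu flag]
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
        msg = m.group("msg")
        if msg.startswith("Started job "):
            open_jobs[msg.removeprefix("Started job ")] = [ts, False]
        elif msg.startswith("Image for job ") and "requires GPU" in msg:
            name = msg.removeprefix("Image for job ").split(" requires")[0]
            if name in open_jobs:
                open_jobs[name][1] = True
        elif msg.startswith("Completed job "):
            name = msg.removeprefix("Completed job ")
            if name in open_jobs:
                start, is_gpu = open_jobs.pop(name)
                yield name, (ts - start).total_seconds(), is_gpu
```

Note that only jobs with a matching completion line are emitted, which mirrors the "completed jobs" restriction described above; anything still in `open_jobs` at the end of the log is ignored.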

Caveats:

  1. Some sub-tasks of a pipeline step that happen on GPU nodes will be counted as CPU time, because the sub-task itself actually runs on the CPU.
  2. It will count duplicated CPU/GPU jobs for failed attempts. If a step or sub-task completed successfully within a failed run and the run is restarted, the parser has no knowledge of this and treats the repeat as an independent job.

Debugged functions.
