[SARC-395] Adjust the GPU->RGU conversion function to support different versions over time. #155
base: master
@bouthilx Here is a PR to finish the RGU handling! Compared to the reference document, however, I made one small change. In the reference document ("GPU vs RGU"), we had planned the following calculation for RGUs on DRAC:
However, looking at actual jobs, it seems to me that … As an example, I have jobs like this one:
{
"cluster_name": "beluga",
"job_id": 47622739,
"job_state": "CANCELLED",
"exit_code": 0,
"partition": "gpubase_bynode_b1",
"nodes": [
"bg12106",
"bg12107",
"bg12108",
"bg12113"
],
"submit_time": "2024-05-23 19:15:55-04:00",
"start_time": "2024-05-23 19:15:57-04:00",
"end_time": "2024-05-23 19:55:28-04:00",
"elapsed_time": 2371,
"requested": {
"cpu": 160,
"mem": 737280,
"node": 4,
"billing": 35555,
"gres_gpu": 16,
"gpu_type": null
},
"allocated": {
"cpu": 160,
"mem": 737280,
"node": 4,
"billing": 35555,
"gres_gpu": 16,
"gpu_type": "Tesla V100-SXM2-16GB"
}
}
The billing here is … which tells me that …
PS: As a reminder, … So I replaced:
With:
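As a quick check on the job record above, the ratios below are computed only from its `allocated` fields. This is an illustrative sketch: the helper name `billing_ratios` is hypothetical, and it makes no claim about which RGU formula the PR finally adopted.

```python
import json

# Subset of the "allocated" fields from the beluga job record above.
JOB = json.loads("""
{"allocated": {"node": 4, "billing": 35555, "gres_gpu": 16}}
""")


def billing_ratios(job: dict) -> dict[str, float]:
    """Hypothetical helper: ratios derived purely from a job's allocated fields."""
    alloc = job["allocated"]
    return {
        "billing_per_gpu": alloc["billing"] / alloc["gres_gpu"],
        "billing_per_node": alloc["billing"] / alloc["node"],
        "gpus_per_node": alloc["gres_gpu"] / alloc["node"],
    }


for key, value in billing_ratios(JOB).items():
    print(f"{key}: {value}")
```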
Force-pushed from 23a7777 to ab9ad53.
PS: This PR modifies …
Force-pushed from ab9ad53 to 1c387f9.
I've rebased this onto master to pick up the switch from poetry to uv.
…different versions over time.

- Read GPU billing from the database instead of the config
- Add dependency `iguane` to get GPU->RGU values
- Add new client function get_rgus()
- Move series function into client: update_job_series_rgu()
- update_job_series_rgu(): take into account the evolution of GPU billing across time and the type of GPU billing (billing_is_gpu) on each cluster
- load_job_series(): make sure user columns are included only if the job `user` column is included in the data frame
- tests: allow creating entries for all testing clusters: read cluster names from sarc-test.json
- harmonize names of billed GPUs
- get GPU nodes as a list instead of a string, as some nodes may have many GPUs (e.g. MIG GPUs)

Improve RGU function to handle harmonized names of MIG GPUs.

Improve update_allocated_gpu_type():
- check default allocated.gpu_type if a single gpu_type cannot be inferred from nodes
- harmonize GPU name using __DEFAULTS__ if available, even if the job does not have nodes
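A minimal sketch of the time-versioned lookup described for update_job_series_rgu(), assuming each cluster's GPU billing rules are stored as records sorted by the date they take effect. `GpuBilling`, `billing_at`, `job_rgu`, and the meaning given to `billing_is_gpu` (billing counts GPUs vs. billing already in RGU) are assumptions for illustration, not the actual SARC code, and the RGU value in the usage lines is made up.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class GpuBilling:
    """Hypothetical record: GPU billing rules on one cluster, valid from `since` on."""

    since: datetime
    billing_is_gpu: bool
    rgu_per_gpu: dict[str, float] = field(default_factory=dict)  # GPU type -> RGU


def billing_at(history: list[GpuBilling], when: datetime) -> GpuBilling | None:
    """Return the record in effect at `when`; `history` must be sorted by `since`."""
    idx = bisect_right([b.since for b in history], when) - 1
    return history[idx] if idx >= 0 else None


def job_rgu(history: list[GpuBilling], start: datetime, gpu_type: str,
            gres_gpu: int, billing: float) -> float | None:
    record = billing_at(history, start)
    if record is None:
        return None  # job predates every known billing record
    if record.billing_is_gpu:
        # Assumption: billing counts whole GPUs, so convert the allocated
        # GPU count with the per-type RGU value in effect at job start.
        rgu = record.rgu_per_gpu.get(gpu_type)
        return None if rgu is None else gres_gpu * rgu
    # Assumption: otherwise the billing value is already expressed in RGU.
    return billing


utc = timezone.utc
history = [  # made-up values, for illustration only
    GpuBilling(datetime(2024, 1, 1, tzinfo=utc), True, {"V100-SXM2-16GB": 2.0}),
    GpuBilling(datetime(2024, 6, 1, tzinfo=utc), False),
]
print(job_rgu(history, datetime(2024, 3, 1, tzinfo=utc), "V100-SXM2-16GB", 4, 4))
```

Keying the lookup on the job's start time means a job is always converted with the rules that applied when it ran, even after the cluster's billing changes.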
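The fallback order listed for update_allocated_gpu_type() could look roughly like this. It is a hypothetical sketch: `infer_gpu_type`, the `node_gpus` mapping (node name to a list of GPU names), and the shape of the harmonization table (per-node rules plus a `__DEFAULTS__` entry) are assumptions, and the harmonized name `V100-SXM2-16GB` is invented for the example.

```python
from __future__ import annotations

DEFAULTS_KEY = "__DEFAULTS__"


def infer_gpu_type(
    nodes: list[str],
    node_gpus: dict[str, list[str]],
    allocated_gpu_type: str | None,
    harmonize: dict[str, dict[str, str]],
) -> str | None:
    # Each node maps to a *list* of GPU names, since one node may expose
    # several GPUs (e.g. MIG slices).
    types = {gpu for node in nodes for gpu in node_gpus.get(node, [])}
    if len(types) == 1:
        gpu_type = types.pop()
    else:
        # No single type can be inferred from the nodes (none, or several):
        # fall back to the job's own allocated GPU type.
        gpu_type = allocated_gpu_type
    if gpu_type is None:
        return None
    # Per-node harmonization rules first (hypothetical), then __DEFAULTS__,
    # which also applies when the job has no nodes at all.
    for node in nodes:
        rules = harmonize.get(node, {})
        if gpu_type in rules:
            return rules[gpu_type]
    return harmonize.get(DEFAULTS_KEY, {}).get(gpu_type, gpu_type)


table = {DEFAULTS_KEY: {"Tesla V100-SXM2-16GB": "V100-SXM2-16GB"}}
print(infer_gpu_type([], {}, "Tesla V100-SXM2-16GB", table))
```

Falling back to the job's recorded type when nodes disagree avoids guessing between several GPU types on a heterogeneous allocation.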
Force-pushed from fdf4659 to fa2a17a.