Description
- Big LLM
- Sync training
- Weight Update & Checkpoint
- Drop in electricity usage
- Big swing in electricity usage
Bench
- GPU Burn (Fake LLM workload)
- Sleep (Checkpoint / Weight Update)
- GPU Burn (Fake LLM workload)
- Sleep (Checkpoint / Weight Update)
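The alternation above could be driven by a small loop like the one below. This is only a sketch: `run_bench` and `cpu_spin` are hypothetical names, and `cpu_spin` is a CPU busy-loop standing in for GPU Burn so the script runs without a GPU. For a real run, swap in a function that launches the GPU Burn binary for the given duration.

```python
import time
from typing import Callable, List, Tuple


def run_bench(
    burn: Callable[[float], None],
    idle: Callable[[float], None],
    burn_s: float,
    idle_s: float,
    cycles: int = 2,
) -> List[Tuple[str, float, float]]:
    """Alternate burn and sleep phases; return (phase, start, end) timestamps."""
    timeline = []
    for _ in range(cycles):
        for name, phase, duration in (("burn", burn, burn_s), ("sleep", idle, idle_s)):
            start = time.monotonic()
            phase(duration)  # blocks for roughly `duration` seconds
            timeline.append((name, start, time.monotonic()))
    return timeline


def cpu_spin(duration_s: float) -> None:
    """Stand-in for GPU Burn (assumption): busy-loop until the deadline passes."""
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        pass


if __name__ == "__main__":
    for phase, start, end in run_bench(cpu_spin, time.sleep, burn_s=1.0, idle_s=1.0):
        print(f"{phase:5s} {end - start:.3f} s")
```

The returned timeline uses the monotonic clock, so it can be lined up with a power trace recorded in the same process to see which samples fall in burn or sleep phases.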
We want to measure the performance
- MPF: minimum power floor
- Firefly
- 1 ms
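One possible way to put numbers on the swing, given a power trace sampled at a fixed interval. The metric names and the `swing_metrics` helper are assumptions for illustration, not something defined in this issue. In particular, `floor_w` here is just the lowest observed sample, not the MPF setting itself.

```python
from typing import Dict, Sequence


def swing_metrics(power_w: Sequence[float], interval_s: float) -> Dict[str, float]:
    """Summarize a power trace (watts) sampled every `interval_s` seconds."""
    if len(power_w) < 2:
        raise ValueError("need at least two samples")
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    floor = min(power_w)
    peak = max(power_w)
    # Largest change between consecutive samples, converted to watts per second.
    max_step = max(abs(b - a) for a, b in zip(power_w, power_w[1:]))
    return {
        "floor_w": floor,
        "peak_w": peak,
        "swing_w": peak - floor,
        "max_ramp_w_per_s": max_step / interval_s,
    }
```

With a 1 ms trace, `interval_s` would be `0.001`. With NVML readings averaged over 1 s, the ramp figure would be smoothed accordingly.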
NVML
- On Ampere (except GA100) or newer GPUs, the API returns power averaged over a 1-second interval.
- On GA100 and older architectures, instantaneous power is returned.
- Chakra: training logs from the Meta cluster
- See what the highest polling rate we can get is
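A rough way to check the achievable polling rate: call the power reader in a tight loop and compare how often we call it with how often the returned value actually changes. This is a sketch. `poll_power`, `polling_stats` and `make_nvml_reader` are hypothetical helpers, and the NVML part assumes the `pynvml` bindings are installed. `nvmlDeviceGetPowerUsage` returns milliwatts.

```python
import time
from typing import Callable, Dict, List, Tuple

Sample = Tuple[float, int]  # (seconds since start, power in milliwatts)


def poll_power(read_mw: Callable[[], int], duration_s: float,
               interval_s: float = 0.0) -> List[Sample]:
    """Call `read_mw` repeatedly for `duration_s` seconds and timestamp each reading."""
    samples: List[Sample] = []
    start = time.monotonic()
    while True:
        now = time.monotonic() - start
        if now >= duration_s:
            break
        samples.append((now, read_mw()))
        if interval_s > 0:
            time.sleep(interval_s)
    return samples


def polling_stats(samples: List[Sample]) -> Dict[str, float]:
    """Compare the call rate with the rate at which the reading changes."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    span = samples[-1][0] - samples[0][0]
    if span <= 0:
        raise ValueError("samples span no measurable time")
    changes = sum(1 for (_, a), (_, b) in zip(samples, samples[1:]) if a != b)
    return {"calls_per_s": (len(samples) - 1) / span, "changes_per_s": changes / span}


def make_nvml_reader(index: int = 0) -> Callable[[], int]:
    """Build a reader for one GPU. Assumes pynvml is installed and a GPU is present."""
    import pynvml

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(index)
    return lambda: pynvml.nvmlDeviceGetPowerUsage(handle)


if __name__ == "__main__":
    print(polling_stats(poll_power(make_nvml_reader(0), duration_s=5.0)))
```

If `changes_per_s` stays far below `calls_per_s`, polling faster is not buying extra resolution. A change count is only a lower bound on the refresh rate, since a steady load can return identical values.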