Description
What happened?
Data source: EC2 spot instance 5c.metal
This issue describes a significant difference between the power metrics collected in February and those collected in July. While CPU time looks reasonable in both datasets, power consumption in July rises much more steeply from the start, even under a small load. The power of this machine seems to saturate at around 450. These power numbers come directly from Intel RAPL.
Further investigation found that in July the CPU instruction counter is much higher than in February.
previously (around Feb 2024)
current (July 2024)
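To cross-check the RAPL readings mentioned above independently of Kepler, a rough sketch along these lines could be run on the node. It assumes the Linux powercap interface is exposed at `/sys/class/powercap/intel-rapl:0` (package 0) and is readable by the current user, which often requires root. The helper name `avg_watts` and the 5-second default interval are illustrative choices, not part of Kepler.

```shell
# avg_watts START_UJ END_UJ SECONDS [MAX_UJ]
# Converts two cumulative energy readings (microjoules) into average watts.
# MAX_UJ is the counter range, used only if the counter wrapped around.
avg_watts() {
  awk -v s="$1" -v e="$2" -v t="$3" -v m="${4:-0}" 'BEGIN {
    d = e - s
    if (d < 0) d += m          # counter wrapped between the two samples
    printf "%.2f\n", d / 1e6 / t
  }'
}

# Assumed sysfs location of the package-0 RAPL domain.
RAPL_DIR=/sys/class/powercap/intel-rapl:0
INTERVAL=${INTERVAL:-5}

if [ -r "$RAPL_DIR/energy_uj" ]; then
  start=$(cat "$RAPL_DIR/energy_uj")
  sleep "$INTERVAL"
  end=$(cat "$RAPL_DIR/energy_uj")
  max=$(cat "$RAPL_DIR/max_energy_range_uj")
  echo "package-0 average power: $(avg_watts "$start" "$end" "$INTERVAL" "$max") W"
else
  echo "RAPL energy counter not readable at $RAPL_DIR" >&2
fi
```

Taking the same measurement at idle and under the same small load on both releases would show whether the difference is visible in the raw counter itself.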
What did you expect to happen?
The increase in CPU instructions used by Kepler should be explainable.
We should investigate additional metrics, since CPU time alone is not enough for modeling.
How can we reproduce it (as minimally and precisely as possible)?
Run the Kepler release from February and the Kepler release from July separately.
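One possible way to compare the two runs is to save a `/metrics` scrape from each exporter and compute instructions per millisecond of CPU time. The sketch below is only an illustration. The metric names are assumptions that may differ between releases (check the actual scrape output and override `INSTR_METRIC` and `CPU_TIME_METRIC` as needed), and the file names `feb.prom` and `july.prom` are hypothetical.

```shell
# Assumed metric names; override them if the release under test uses others.
INSTR_METRIC=${INSTR_METRIC:-kepler_container_cpu_instructions_total}
CPU_TIME_METRIC=${CPU_TIME_METRIC:-kepler_container_bpf_cpu_time_ms_total}

# sum_metric FILE METRIC_NAME
# Sums the value of every series of METRIC_NAME in a Prometheus text dump
# (assumes samples carry no trailing timestamp).
sum_metric() {
  awk -v name="$2" '
    /^#/ { next }                      # skip HELP/TYPE comment lines
    {
      split($1, parts, "{")            # metric name is the text before "{"
      if (parts[1] == name) total += $NF
    }
    END { printf "%.0f\n", total }
  ' "$1"
}

# instr_per_ms FILE
# Ratio of total instructions to total CPU time for one scrape.
instr_per_ms() {
  instr=$(sum_metric "$1" "$INSTR_METRIC")
  cpu_ms=$(sum_metric "$1" "$CPU_TIME_METRIC")
  awk -v i="$instr" -v t="$cpu_ms" \
    'BEGIN { if (t > 0) printf "%.1f\n", i / t; else print "n/a" }'
}

# Hypothetical usage, after saving one scrape per release:
#   instr_per_ms feb.prom
#   instr_per_ms july.prom
```

Because these counters are cumulative, the ratio is an average over the exporter's lifetime. Running the same workload for the same duration on both releases keeps the comparison fair.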
Anything else we need to know?
No response
Kepler image tag
Deployment
- Model server
- Estimator
- Online trainer
- Offline trainer
- Profiler
Kepler model server image tag if deployed
Kepler estimator image tag if deployed
Kepler online trainer image tag if deployed
Kepler offline trainer image tag if deployed
Kepler profiler image tag if deployed
Kubernetes version
$ kubectl version
# paste output here
Install tools
Kepler deployment config
For Kubernetes:
$ KEPLER_NAMESPACE=kepler
# provide kepler configmap
$ kubectl get configmap kepler-cfm -n ${KEPLER_NAMESPACE}
# paste output here
# provide kepler model server configmap if Kepler Model Server is deployed
$ kubectl get configmap kepler-model-server-cfm -n ${KEPLER_NAMESPACE}
# paste output here
# provide kepler deployment description
$ kubectl describe deployment kepler-exporter -n ${KEPLER_NAMESPACE}
For standalone: