Feature/multi loader logs collection #598
Looks nice. I left some comments.
@leokondrashov as discussed, added the log consolidation logic in 0dc0950.
Looks great, thanks. Please fix a couple of minor comments.
Signed-off-by: Lenson <[email protected]> add node discovery validators Signed-off-by: Lenson <[email protected]> add collect TOP metric functions Signed-off-by: Lenson <[email protected]> add multi-loader metric_manager Signed-off-by: Lenson <[email protected]> add autoscaler log collection Signed-off-by: Lenson <[email protected]> add activator log collection Signed-off-by: Lenson <[email protected]> add prometh log collection Signed-off-by: Lenson <[email protected]> refactor metric manager contants Signed-off-by: Lenson <[email protected]> minor fix for node discovery Signed-off-by: Lenson <[email protected]> fix node discovery Signed-off-by: Lenson <[email protected]> minor fix Signed-off-by: Lenson <[email protected]> minor fix Signed-off-by: Lenson <[email protected]> add logs for prometh Signed-off-by: Lenson <[email protected]> add pause between prometh collection Signed-off-by: Lenson <[email protected]> update wait time Signed-off-by: Lenson <[email protected]> update condition for node discovery Signed-off-by: Lenson <[email protected]> update logging Signed-off-by: Lenson <[email protected]>
Signed-off-by: Lenson <[email protected]> update kind ssh update script Signed-off-by: Lenson <[email protected]> fix setup kind ssh Signed-off-by: Lenson <[email protected]>
Signed-off-by: Lenson <[email protected]> update setup metrics script Signed-off-by: Lenson <[email protected]>
Signed-off-by: Lenson <[email protected]> fix log collection test commit a05990d Author: Lenson <[email protected]> Date: Mon Feb 3 15:39:39 2025 +0800 update test trigger Signed-off-by: Lenson <[email protected]> commit 3edb3b4 Author: Lenson <[email protected]> Date: Mon Feb 3 15:33:06 2025 +0800 update test Signed-off-by: Lenson <[email protected]> commit 56a0f7d Author: Lenson <[email protected]> Date: Mon Feb 3 15:18:40 2025 +0800 fix Signed-off-by: Lenson <[email protected]> commit 67c520d Author: Lenson <[email protected]> Date: Mon Feb 3 15:06:20 2025 +0800 fix Signed-off-by: Lenson <[email protected]> commit 48ff845 Author: Lenson <[email protected]> Date: Mon Feb 3 14:46:29 2025 +0800 test' Signed-off-by: Lenson <[email protected]> commit 295c761 Author: Lenson <[email protected]> Date: Mon Feb 3 14:45:35 2025 +0800 add adv log collection tests Signed-off-by: Lenson <[email protected]> commit 8469bdb Author: Lenson <[email protected]> Date: Mon Feb 3 14:45:05 2025 +0800 update logging Signed-off-by: Lenson <[email protected]> commit 10e295a Author: Lenson <[email protected]> Date: Mon Feb 3 14:44:42 2025 +0800 update kind ssh update script Signed-off-by: Lenson <[email protected]> commit c56a9d8 Author: Lenson <[email protected]> Date: Mon Feb 3 13:19:27 2025 +0800 add KinD ssh setup script Signed-off-by: Lenson <[email protected]> commit bf9a804 Author: Lenson <[email protected]> Date: Mon Feb 3 10:31:55 2025 +0800 update condition for node discovery Signed-off-by: Lenson <[email protected]> commit b3f078b Author: Lenson <[email protected]> Date: Fri Jan 31 18:35:03 2025 +0800 add multi loader log collection Signed-off-by: Lenson <[email protected]> add node discovery validators Signed-off-by: Lenson <[email protected]> add collect TOP metric functions Signed-off-by: Lenson <[email protected]> add multi-loader metric_manager Signed-off-by: Lenson <[email protected]> add autoscaler log collection Signed-off-by: Lenson <[email protected]> add activator log 
collection Signed-off-by: Lenson <[email protected]> add prometh log collection Signed-off-by: Lenson <[email protected]> refactor metric manager contants Signed-off-by: Lenson <[email protected]> minor fix for node discovery Signed-off-by: Lenson <[email protected]> fix node discovery Signed-off-by: Lenson <[email protected]> minor fix Signed-off-by: Lenson <[email protected]> minor fix Signed-off-by: Lenson <[email protected]> add logs for prometh Signed-off-by: Lenson <[email protected]> add pause between prometh collection Signed-off-by: Lenson <[email protected]> update wait time Signed-off-by: Lenson <[email protected]> commit 9bac3c4 Author: Lenson <[email protected]> Date: Tue Jan 21 13:00:50 2025 +0800 update multi loader docs Signed-off-by: Lenson <[email protected]> update multi-loader docs Signed-off-by: Lenson <[email protected]> commit bfd17be Author: Lenson <[email protected]> Date: Mon Jan 20 16:30:13 2025 +0800 minor multi loader fix Signed-off-by: Lenson <[email protected]> fix incorrect retry logging Signed-off-by: Lenson <[email protected]> remove iat and generated cli args Signed-off-by: Lenson <[email protected]> remove make clean from clean up Signed-off-by: Lenson <[email protected]> commit 91042aa Author: Lenson <[email protected]> Date: Thu Jan 16 15:53:19 2025 +0800 update tests Signed-off-by: Lenson <[email protected]> update multi loader e2e tests Signed-off-by: Lenson <[email protected]> revert setup.cfg Signed-off-by: Lenson <[email protected]> chmod script Signed-off-by: Lenson <[email protected]> update unit tests Signed-off-by: Lenson <[email protected]> fix e2e test Signed-off-by: Lenson <[email protected]> update tests Signed-off-by: Lenson <[email protected]> commit 69c3c3a Author: Lenson <[email protected]> Date: Tue Dec 31 11:49:55 2024 +0800 add failfast flag Signed-off-by: Lenson <[email protected]> update failfast flag description Signed-off-by: Lenson <[email protected]> update comments Signed-off-by: Lenson <[email 
protected]> update wordlist with multiloader specific words Signed-off-by: Lenson <[email protected]> simplify run experiment logic Signed-off-by: Lenson <[email protected]> refactor partial experiment naming Signed-off-by: Lenson <[email protected]> fix wrong indexing Signed-off-by: Lenson <[email protected]> add progress in logging Signed-off-by: Lenson <[email protected]> commit fc3ad98 Author: Lenson <[email protected]> Date: Sun Nov 17 14:07:35 2024 +0800 refactor multi loader Signed-off-by: Lenson <[email protected]> add multi-loader tests Signed-off-by: Lenson <[email protected]> update test Signed-off-by: Lenson <[email protected]> refactor multi-loader tests Signed-off-by: Lenson <[email protected]> add loader experiment Signed-off-by: Lenson <[email protected]> update logs Signed-off-by: Lenson <[email protected]> update log verbosity Signed-off-by: Lenson <[email protected]> update logs Signed-off-by: Lenson <[email protected]> update logs Signed-off-by: Lenson <[email protected]> rename multiloader driver to runner Signed-off-by: Lenson <[email protected]> refactor common files to multiloader folder Signed-off-by: Lenson <[email protected]> refactor multiloader functions Signed-off-by: Lenson <[email protected]> rename createNewStudy function name Signed-off-by: Lenson <[email protected]> fix formatting Signed-off-by: Lenson <[email protected]> remove extra features Signed-off-by: Lenson <[email protected]> remove extra features Signed-off-by: Lenson <[email protected]> add validation for platform Signed-off-by: Lenson <[email protected]> commit ca5e2ad Author: Lenson <[email protected]> Date: Sat Nov 16 18:49:35 2024 +0800 add multi loader documentation Signed-off-by: Lenson <[email protected]> update docs Signed-off-by: Lenson <[email protected]> fix docs Signed-off-by: Lenson <[email protected]> update documentation Signed-off-by: Lenson <[email protected]> commit 3c7e6b5 Author: Lenson <[email protected]> Date: Sat Nov 16 12:36:43 2024 +0800 add 
multi-loader Signed-off-by: Lenson <[email protected]> add multi-loader config reader Signed-off-by: Lenson <[email protected]> add multi loader base Signed-off-by: Lenson <[email protected]> add multi loader base Signed-off-by: Lenson <[email protected]> add node group struct Signed-off-by: Lenson <[email protected]> add multi loader runner Signed-off-by: Lenson <[email protected]> refactor multi loader config Signed-off-by: Lenson <[email protected]> add multi loader config validators Signed-off-by: Lenson <[email protected]> add knative specific config enricher Signed-off-by: Lenson <[email protected]> add additional knative platform type Signed-off-by: Lenson <[email protected]> add base runner entry point Signed-off-by: Lenson <[email protected]> refactor multi loader config Signed-off-by: Lenson <[email protected]> update multi loader config struct Signed-off-by: Lenson <[email protected]> update unpack study doc Signed-off-by: Lenson <[email protected]> add unpack study Signed-off-by: Lenson <[email protected]> add prepare experiment Signed-off-by: Lenson <[email protected]> update experiment config temp path Signed-off-by: Lenson <[email protected]> add run loader function Signed-off-by: Lenson <[email protected]> update log parser Signed-off-by: Lenson <[email protected]> update log parser Signed-off-by: Lenson <[email protected]> update log parser Signed-off-by: Lenson <[email protected]> add clean up function Signed-off-by: Lenson <[email protected]> add logs to indicate run status Signed-off-by: Lenson <[email protected]> expose entry points for multi loader runner Signed-off-by: Lenson <[email protected]> add multi loader runner execution Signed-off-by: Lenson <[email protected]> update default multi loader config path Signed-off-by: Lenson <[email protected]> add cpu limit validator Signed-off-by: Lenson <[email protected]> remove extra knative feature Signed-off-by: Lenson <[email protected]> remove knative extra features Signed-off-by: Lenson 
<[email protected]> add multi loader tests Signed-off-by: Lenson <[email protected]> add basic config Signed-off-by: Lenson <[email protected]> update basic config Signed-off-by: Lenson <[email protected]> update basic config Signed-off-by: Lenson <[email protected]> add basic configs Signed-off-by: Lenson <[email protected]> update base config Signed-off-by: Lenson <[email protected]> Signed-off-by: Lenson <[email protected]> update e2e test Signed-off-by: Lenson <[email protected]>
Signed-off-by: Lenson <[email protected]> update metrics description in docs Signed-off-by: Lenson <[email protected]>
Signed-off-by: Lenson <[email protected]> add interval for prometh snapshot collection Signed-off-by: Lenson <[email protected]>
Hi @cvetkovic, this PR extends the previously added multi-loader tool with log collection. Users can now gather logs from the Activator and Autoscaler nodes, retrieve TOP metrics from all cluster nodes, and capture Prometheus snapshots. The newly introduced Metrics field in the multi-loader configuration selects exactly which of these to collect. I would appreciate your review; if everything looks good, I will tidy up the commits and prepare for merging into main. Thank you!
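To make the Metrics field concrete, here is a minimal sketch of how such a field could be validated against the four values listed in the PR summary (top, prometheus, activator, autoscaler). The struct and function names (`MultiLoaderConfig`, `ValidateMetrics`) and the JSON tag are assumptions for illustration and are not taken from the PR's code.

```go
package main

import (
	"fmt"
	"sort"
)

// allowedMetrics holds the values the PR summary lists for the Metrics field.
var allowedMetrics = map[string]bool{
	"top":        true,
	"prometheus": true,
	"activator":  true,
	"autoscaler": true,
}

// MultiLoaderConfig is a hypothetical, trimmed-down stand-in for the real
// multi-loader config struct; only the Metrics field is modelled here.
type MultiLoaderConfig struct {
	Metrics []string `json:"Metrics"`
}

// ValidateMetrics returns an error that names every unsupported entry,
// or nil when all entries are allowed (an empty list is also valid).
func ValidateMetrics(cfg MultiLoaderConfig) error {
	var unknown []string
	for _, m := range cfg.Metrics {
		if !allowedMetrics[m] {
			unknown = append(unknown, m)
		}
	}
	if len(unknown) > 0 {
		sort.Strings(unknown) // stable error text regardless of input order
		return fmt.Errorf("unsupported metrics: %v", unknown)
	}
	return nil
}

func main() {
	cfg := MultiLoaderConfig{Metrics: []string{"top", "activator"}}
	fmt.Println(ValidateMetrics(cfg) == nil)
}
```

Rejecting unknown values up front would let a typo in the config fail before any experiment starts, rather than silently skipping a collector.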
KIND_VERSION: v0.22.0
K8S_VERSION: v1.29
YAML_DIR: workloads/container
runs-on: ubuntu-20.04
Switch to 24.04. The current one will be deprecated.
Resolved in 44e22ad
@@ -48,13 +50,23 @@ func NewMultiLoaderRunner(configPath string, verbosity string, failFast bool) (*
// Determine platform
platform := ml_common.DeterminePlatformFromConfig(multiLoaderConfig)

// Determine Node Group for Knative platform
if strings.HasPrefix(platform, "Knative") && len(multiLoaderConfig.Metrics) > 0 {
"Knative" has been replaced by a constant.
Resolved in 149b908
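For readers following this thread, the change being discussed can be sketched as below: the string literal in the condition is swapped for a named constant. The constant name `PlatformKnativePrefix` and the helper `needsNodeGroup` are hypothetical; the thread does not show the identifier actually used in 149b908.

```go
package main

import (
	"fmt"
	"strings"
)

// PlatformKnativePrefix is a hypothetical name for the constant that
// replaces the "Knative" literal; the real identifier is not shown here.
const PlatformKnativePrefix = "Knative"

// needsNodeGroup mirrors the condition from the diff hunk: node discovery
// is only needed on a Knative platform when at least one metric is requested.
func needsNodeGroup(platform string, metrics []string) bool {
	return strings.HasPrefix(platform, PlatformKnativePrefix) && len(metrics) > 0
}

func main() {
	fmt.Println(needsNodeGroup("Knative", []string{"top"}))       // true
	fmt.Println(needsNodeGroup("SomethingElse", []string{"top"})) // false
	fmt.Println(needsNodeGroup("Knative", nil))                   // false
}
```

Keeping the prefix in one constant means the platform check and any other Knative-specific branches cannot drift apart through a typo.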
Summary
Extends multi-loader by collecting key logs from nodes in the cluster for the Knative platform. Users can optionally collect the following logs:
Implementation Notes ⚒️
- Adds a Metrics field to the multi-loader config, accepting an array with any of the following values: top, prometheus, activator, autoscaler.
- Adds the fields MasterNode, ActivatorNode, AutoscalerNode, and WorkerNodes so users can manually specify IPs instead of relying on multi-loader to determine them (mostly unnecessary in typical scenarios).
- Uses kubectl to automatically determine node IPs and classify them based on their roles.
- Collects Activator and Autoscaler logs from:
  /var/log/pods/knative-serving_activator-*/activator/*
  /var/log/pods/knative-serving_autoscaler-*/autoscaler/*
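The node classification and log paths above can be illustrated with a rough sketch. It assumes the pod list is obtained with `kubectl get pods -n knative-serving --no-headers -o custom-columns=NAME:.metadata.name,HOSTIP:.status.hostIP`; the PR does not show which kubectl invocation multi-loader actually runs. The `NodeGroup` field names come from the notes above, while their types and the helpers `classifyPods` and `logGlob` are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// NodeGroup uses the field names from the PR notes; the types are assumed.
type NodeGroup struct {
	MasterNode     string
	ActivatorNode  string
	AutoscalerNode string
	WorkerNodes    []string
}

// classifyPods parses "podName hostIP" lines (the assumed custom-columns
// output) and records which host runs the activator and the autoscaler.
func classifyPods(out string, ng *NodeGroup) {
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) < 2 {
			continue // skip blank or malformed lines
		}
		switch {
		case strings.HasPrefix(f[0], "autoscaler-hpa-"):
			// Separate HPA autoscaler component; ignored in this sketch.
		case strings.HasPrefix(f[0], "activator-"):
			ng.ActivatorNode = f[1]
		case strings.HasPrefix(f[0], "autoscaler-"):
			ng.AutoscalerNode = f[1]
		}
	}
}

// logGlob builds the log path pattern listed in the PR notes for a
// component name such as "activator" or "autoscaler".
func logGlob(component string) string {
	return fmt.Sprintf("/var/log/pods/knative-serving_%s-*/%s/*", component, component)
}

func main() {
	sample := "activator-abc 10.0.0.2\nautoscaler-def 10.0.0.3\ncontroller-xyz 10.0.0.4\n"
	var ng NodeGroup
	classifyPods(sample, &ng)
	fmt.Println(ng.ActivatorNode, ng.AutoscalerNode)
	fmt.Println(logGlob("activator"))
}
```

Manually set MasterNode, ActivatorNode, AutoscalerNode, or WorkerNodes values would simply take the place of the discovered ones in a flow like this.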
External Dependencies 🍀
Breaking API Changes ⚠️