added metrics for vulnerabilities on a workload level #24
Conversation
c33d76f to 2c1dbf4 (Compare)
Signed-off-by: hebestreit <[email protected]>
2c1dbf4 to 9c469ca (Compare)
it probably works for a small cluster, for bigger ones I would use watch and pagers to avoid overloading the storage component
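A minimal sketch of the pager part of that suggestion, assuming the StorageClientImpl and generated clientset already used in api/api.go; the page size and the handleItem callback are illustrative only:

```go
// Sketch only: list WorkloadConfigurationScanSummaries in pages instead of one
// large List call, to reduce load on the storage component.
import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/client-go/tools/pager"
)

func (sc *StorageClientImpl) listScanSummariesPaged(ctx context.Context, handleItem func(runtime.Object) error) error {
	p := pager.New(func(ctx context.Context, opts metav1.ListOptions) (runtime.Object, error) {
		return sc.clientset.SpdxV1beta1().WorkloadConfigurationScanSummaries("").List(ctx, opts)
	})
	p.PageSize = 500 // illustrative page size
	return p.EachListItem(ctx, metav1.ListOptions{}, handleItem)
}
```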
api/api.go
Outdated
@@ -43,6 +62,26 @@ func (sc *StorageClientImpl) GetVulnerabilitySummaries() (*v1beta1.Vulnerability
}

func (sc *StorageClientImpl) GetWorkloadConfigurationScanSummaries() (*v1beta1.WorkloadConfigurationScanSummaryList, error) {
	workloadConfigurationScanSummaries, err := sc.clientset.SpdxV1beta1().WorkloadConfigurationScanSummaries("").List(context.TODO(), metav1.ListOptions{})
I think rather than getting the full list all the time, it would make sense to use a Watch() and update counters as you get added/removed events for the objects in a go routine - and when requested you just provide the counters values
Makes sense.
Do I understand correctly that the Prometheus Exporter gets the full list on startup using a pager, populates all counters, and then starts a go routine with Watch()? This guarantees that the Prometheus Exporter first synchronizes with the cluster state and doesn't overload the storage component for updates.
Inside this go routine it gets notified about Kubescape resources (WorkloadConfigurationScanSummary and VulnerabilityManifestSummary) being added/removed so it can increase/decrease the counter values.
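A minimal sketch of the watch part of that flow; updateCounters and removeCounters are hypothetical helpers, and the v1beta1 import path and the VulnerabilityManifestSummaries accessor are assumptions based on the Kubescape storage clientset, not code from this PR:

```go
// Sketch only: watch summaries in a goroutine and keep counters in sync as
// objects are added, modified, or deleted.
import (
	"context"

	"github.com/kubescape/storage/pkg/apis/softwarecomposition/v1beta1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/watch"
)

func (sc *StorageClientImpl) watchVulnerabilitySummaries(ctx context.Context) error {
	w, err := sc.clientset.SpdxV1beta1().VulnerabilityManifestSummaries("").Watch(ctx, metav1.ListOptions{})
	if err != nil {
		return err
	}
	go func() {
		defer w.Stop()
		for event := range w.ResultChan() {
			summary, ok := event.Object.(*v1beta1.VulnerabilityManifestSummary)
			if !ok {
				continue
			}
			switch event.Type {
			case watch.Added, watch.Modified:
				updateCounters(summary) // hypothetical helper: set gauges for this item
			case watch.Deleted:
				removeCounters(summary) // hypothetical helper: drop gauges for this item
			}
		}
	}()
	return nil
}
```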
Watch gives you the existing CRDs by default... hmm let me check
Thanks for the link. I implemented the logic and did some quick tests, which seem to work. When an object is received, it simply calls the same function as for the full list to update a single item.
Initial exposed metric during startup:
kubescape_vulnerabilities_total_workload_low{namespace="external-dns",workload="external-dns",workload_kind="deployment"} 1234
After deleting the Vulnerabilitymanifestsummaries resource for external-dns:
kubescape_vulnerabilities_total_workload_low{namespace="external-dns",workload="external-dns",workload_kind="deployment"} 0
Let me know what you think.
I think it's good... does it perform the way you want? Can we merge it?
I just recognized that workloads with multiple containers overwrite the metric value because they share the same workload name and kind but are separate VulnerabilityManifestSummary resources.
My idea is to add the kubescape.io/workload-container-name label as a metric label workload_container_name, which has the benefit of also allowing filtering on a container level but definitely increases the number of exported metrics.
Later today or tomorrow I'll let you know how it impacts the performance.
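A minimal sketch of that label idea; the gauge, the helper name, and the kubescape.io/workload-name and kubescape.io/workload-kind keys are illustrative assumptions, only kubescape.io/workload-container-name is taken from the comment above:

```go
// Sketch only: expose the container name as an extra metric label so that
// summaries of different containers in the same workload no longer overwrite
// each other.
import (
	"github.com/kubescape/storage/pkg/apis/softwarecomposition/v1beta1"
	"github.com/prometheus/client_golang/prometheus"
)

var workloadLowVulnerabilities = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "kubescape_vulnerabilities_total_workload_low",
		Help: "Number of low severity vulnerabilities per workload container",
	},
	[]string{"namespace", "workload", "workload_kind", "workload_container_name"},
)

func setWorkloadVulnerabilityMetric(summary *v1beta1.VulnerabilityManifestSummary, lowCount float64) {
	workloadLowVulnerabilities.With(prometheus.Labels{
		"namespace":               summary.Namespace,
		"workload":                summary.Labels["kubescape.io/workload-name"], // assumed label key
		"workload_kind":           summary.Labels["kubescape.io/workload-kind"], // assumed label key
		"workload_container_name": summary.Labels["kubescape.io/workload-container-name"],
	}).Set(lowCount)
}
```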
The total number of exported metrics increases significantly and depends heavily on the size of the cluster. The option of deactivating this function via an environment variable is a good compromise.
I have also added logic to delete the exported metric from the output when the resource has been deleted in the cluster, so it matches the state after a restart of the Prometheus Exporter, when the deleted resource is no longer available.
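A minimal sketch of that deletion logic, reusing the illustrative gauge and assumed label keys from the earlier sketch:

```go
// Sketch only: remove the exported series when the corresponding resource is
// deleted in the cluster, so /metrics matches the post-restart state.
func deleteWorkloadVulnerabilityMetric(summary *v1beta1.VulnerabilityManifestSummary) {
	workloadLowVulnerabilities.Delete(prometheus.Labels{
		"namespace":               summary.Namespace,
		"workload":                summary.Labels["kubescape.io/workload-name"], // assumed label key
		"workload_kind":           summary.Labels["kubescape.io/workload-kind"], // assumed label key
		"workload_container_name": summary.Labels["kubescape.io/workload-container-name"],
	})
}
```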
I think we're good to merge now 👍
9f9cd14 to 733db8f (Compare)
Signed-off-by: hebestreit <[email protected]>
…f items
added environment variable to enable metrics on workload level
Signed-off-by: hebestreit <[email protected]>
733db8f to 5f6a738 (Compare)
added logic to delete exported metric when resource has been deleted in cluster
Signed-off-by: hebestreit <[email protected]>
thanks @hebestreit !
Overview
Currently the Prometheus Exporter only provides metrics on the cluster and namespace level. We find it useful to also have an overview on the workload level, which makes it possible to know exactly which Deployment has the most vulnerabilities or to define custom alerts.
Following the existing metric name pattern, a new suffix is introduced for Vulnerabilities and ConfigurationScans (a sketch of how such gauges could be registered follows the list):
kubescape_controls_total_workload_<severity>
kubescape_vulnerabilities_total_workload_<severity>
kubescape_vulnerabilities_relevant_workload_<severity>
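A minimal sketch of registering gauges that follow this naming pattern; the severity list, Help texts, and promauto registration are illustrative, not the exporter's actual code:

```go
// Sketch only: register one per-workload gauge per severity, following the
// kubescape_vulnerabilities_total_workload_<severity> naming pattern.
import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

func registerWorkloadVulnerabilityGauges() map[string]*prometheus.GaugeVec {
	severities := []string{"critical", "high", "medium", "low", "negligible", "unknown"} // illustrative list
	gauges := make(map[string]*prometheus.GaugeVec, len(severities))
	for _, severity := range severities {
		gauges[severity] = promauto.NewGaugeVec(
			prometheus.GaugeOpts{
				Name: "kubescape_vulnerabilities_total_workload_" + severity,
				Help: "Number of " + severity + " severity vulnerabilities per workload",
			},
			[]string{"namespace", "workload", "workload_kind"},
		)
	}
	return gauges
}
```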
Additional Information
Initial discussion started here:
https://cloud-native.slack.com/archives/C04GY6H082K/p1733500846063089
How to Test
Examples/Screenshots
This is how the metrics are exported via the /metrics endpoint. Note the value is a dummy.