added metrics for vulnerabilities on a workload level #24

Merged
merged 4 commits into kubescape:main on Dec 18, 2024

Conversation

hebestreit (Contributor)

Overview

Currently the Prometheus Exporter only provides metrics at the cluster and namespace level. We find it useful to also have an overview at the workload level, which makes it possible to see exactly which Deployment has the most vulnerabilities, or to define custom alerts.

Following the existing metric name pattern, new workload-level metrics are introduced for vulnerabilities and configuration scans:

  • kubescape_controls_total_workload_<severity>
  • kubescape_vulnerabilities_total_workload_<severity>
  • kubescape_vulnerabilities_relevant_workload_<severity>

Additional Information

Initial discussion started here:
https://cloud-native.slack.com/archives/C04GY6H082K/p1733500846063089

How to Test

Examples/Screenshots

This is how the metrics are exported via the /metrics endpoint. Note that the values are dummy values.

kubescape_controls_total_workload_medium{namespace="monitoring",workload="promtail",workload_kind="serviceaccount"} 1
kubescape_vulnerabilities_total_workload_critical{namespace="monitoring",workload="promtail",workload_kind="daemonset"} 2
kubescape_vulnerabilities_relevant_workload_medium{namespace="monitoring",workload="promtail",workload_kind="daemonset"} 3
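
As a rough illustration of how such a gauge could be defined with prometheus/client_golang, a minimal sketch is shown below. The metric and label names follow the examples above, while the variable name, help text, and registration details are purely illustrative and not taken from the exporter's actual code.

package metrics

import "github.com/prometheus/client_golang/prometheus"

// Hypothetical sketch: one per-severity workload gauge; label names mirror the examples above.
var vulnerabilitiesCriticalWorkload = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "kubescape_vulnerabilities_total_workload_critical",
		Help: "Number of critical vulnerabilities per workload",
	},
	[]string{"namespace", "workload", "workload_kind"},
)

func init() {
	// Register the gauge so it appears on the /metrics endpoint.
	prometheus.MustRegister(vulnerabilitiesCriticalWorkload)
}

// Example update for a single workload:
//   vulnerabilitiesCriticalWorkload.WithLabelValues("monitoring", "promtail", "daemonset").Set(2)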

matthyx (Contributor) left a comment:

it probably works for a small cluster, for bigger ones I would use watch and pagers to avoid overloading the storage component

api/api.go Outdated
@@ -43,6 +62,26 @@ func (sc *StorageClientImpl) GetVulnerabilitySummaries() (*v1beta1.Vulnerability

}

func (sc *StorageClientImpl) GetWorkloadConfigurationScanSummaries() (*v1beta1.WorkloadConfigurationScanSummaryList, error) {
workloadConfigurationScanSummaries, err := sc.clientset.SpdxV1beta1().WorkloadConfigurationScanSummaries("").List(context.TODO(), metav1.ListOptions{})

matthyx (Contributor):

I think rather than getting the full list all the time, it would make sense to use a Watch() and update counters as you get added/removed events for the objects in a go routine - and when requested you just provide the counters values
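
To make the suggestion concrete, a rough sketch of such a watch loop could look like the following. It assumes the generated Kubescape storage clientset exposes the usual Watch method, and updateCounters/removeCounters are illustrative helpers, not the exporter's real functions; imports would be context, metav1, k8s.io/apimachinery/pkg/watch, and the v1beta1 types already used in api/api.go.

// Hypothetical sketch of a watch loop that keeps the counters up to date.
func (sc *StorageClientImpl) watchWorkloadConfigurationScanSummaries(ctx context.Context) error {
	w, err := sc.clientset.SpdxV1beta1().WorkloadConfigurationScanSummaries("").Watch(ctx, metav1.ListOptions{})
	if err != nil {
		return err
	}
	defer w.Stop()
	for event := range w.ResultChan() {
		summary, ok := event.Object.(*v1beta1.WorkloadConfigurationScanSummary)
		if !ok {
			continue
		}
		switch event.Type {
		case watch.Added, watch.Modified:
			updateCounters(summary) // illustrative helper
		case watch.Deleted:
			removeCounters(summary) // illustrative helper
		}
	}
	return nil
}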

hebestreit (Contributor, Author) commented Dec 12, 2024:

Makes sense.

Do I understand correctly that the Prometheus Exporter gets the full list on startup using a pager, populates all counters, and then starts a goroutine with Watch()? This guarantees that the Prometheus Exporter first synchronizes with the cluster state and doesn't overload the storage component for updates.

Inside this goroutine it gets notified about Kubescape resources (WorkloadConfigurationScanSummary and VulnerabilityManifestSummary) being added/removed and increases/decreases the counter values accordingly.
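
For the initial synchronization, a sketch of a paged list using client-go's pager helper might look like this; the function name, the page size, and the updateCounters helper are illustrative, and the extra imports would be k8s.io/apimachinery/pkg/runtime and k8s.io/client-go/tools/pager.

// Hypothetical sketch: list all summaries in pages on startup, then populate the counters.
func (sc *StorageClientImpl) syncWorkloadConfigurationScanSummaries(ctx context.Context) error {
	p := pager.New(pager.SimplePageFunc(func(opts metav1.ListOptions) (runtime.Object, error) {
		return sc.clientset.SpdxV1beta1().WorkloadConfigurationScanSummaries("").List(ctx, opts)
	}))
	p.PageSize = 100 // arbitrary page size for the sketch
	return p.EachListItem(ctx, metav1.ListOptions{}, func(obj runtime.Object) error {
		if summary, ok := obj.(*v1beta1.WorkloadConfigurationScanSummary); ok {
			updateCounters(summary) // illustrative helper
		}
		return nil
	})
}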

matthyx (Contributor) commented Dec 12, 2024:

Watch gives you the existing CRDs by default... hmm let me check

matthyx (Contributor) replied with a link (the comment body is not captured in this excerpt).

hebestreit (Contributor, Author):

Thanks for the link. I implemented the logic and did some quick tests, which seem to work. When an object is received, it simply calls the same function as for the full list to update a single item.

Initial exposed metric during startup:

kubescape_vulnerabilities_total_workload_low{namespace="external-dns",workload="external-dns",workload_kind="deployment"} 1234

After deleting the VulnerabilityManifestSummary resource for external-dns:

kubescape_vulnerabilities_total_workload_low{namespace="external-dns",workload="external-dns",workload_kind="deployment"} 0

Let me know what you think.

matthyx (Contributor):

I think it's good... does it perform the way you want? Can we merge it?

hebestreit (Contributor, Author):

I just noticed that workloads with multiple containers overwrite each other's metric values, because they share the same workload name and kind but are separate VulnerabilityManifestSummary resources.

My idea is to expose the kubescape.io/workload-container-name label as a metric label workload_container_name, which makes it possible to also filter at the container level, but definitely increases the number of exported metrics.
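
For illustration, an exported series with the proposed extra label could then look something like this (the value and container name are hypothetical):

kubescape_vulnerabilities_total_workload_critical{namespace="monitoring",workload="promtail",workload_kind="daemonset",workload_container_name="promtail"} 2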

Later today or tomorrow I'll let you know how it impacts the performance.

hebestreit (Contributor, Author):

The total number of exported metrics increases significantly and depends heavily on the size of the cluster. Being able to deactivate this feature via an environment variable is a good compromise.

I have also added logic to delete the exported metric from the output when the resource has been deleted in the cluster, so the output matches what you would get if the Prometheus Exporter were restarted and the deleted resource were no longer available.
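
A minimal sketch of such a deletion with prometheus/client_golang might look like the following, reusing the GaugeVec from the earlier sketch; the function name is illustrative, not the exporter's actual code.

// Hypothetical sketch: drop a workload's series when its summary resource is deleted.
// DeleteLabelValues removes the series so it no longer appears on /metrics.
func removeWorkloadSeries(namespace, workload, workloadKind string) {
	vulnerabilitiesCriticalWorkload.DeleteLabelValues(namespace, workload, workloadKind)
}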

I think we're good to merge now 👍

…f items

added environment variable to enable metrics on workload level

Signed-off-by: hebestreit <[email protected]>
added logic to delete exported metric when resource has been deleted in cluster

Signed-off-by: hebestreit <[email protected]>
matthyx merged commit a63a525 into kubescape:main on Dec 18, 2024
2 checks passed
matthyx (Contributor) commented Dec 18, 2024:

thanks @hebestreit !
