GDC Connected Servers Enterprise Observability

Overview

This project contains predefined dashboards and alerts for enterprise workloads running on GDC Connected Servers.

Deployment Quickstart

Dashboards and alerts can be deployed through the following methods:

Option 1 - Scripted deployment

cd alerts
Run ./create-alerts.sh. This deploys the alerts into the project of your current context. Modify the script if notification channels are needed.

Dashboards are stored in the dashboards folder and can be manually deployed.
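
Manual deployment of the dashboards can be scripted with a small loop. The following is a minimal sketch, not part of this repository: it assumes each JSON file in the dashboards folder is a valid Cloud Monitoring dashboard definition, that gcloud is installed and authenticated, and that the target project is the one set in your active gcloud configuration. The function name deploy_dashboards and the DRY_RUN variable are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: create one Cloud Monitoring dashboard per JSON file in a folder.
# Assumptions (not from the README): files are valid dashboard definitions,
# gcloud is authenticated, and the active gcloud project is the target.

# Prints the gcloud command for each dashboard file by default.
# Set DRY_RUN=0 to actually run the commands.
deploy_dashboards() {
  local dir="${1:-dashboards}"
  local file
  for file in "$dir"/*.json; do
    [ -e "$file" ] || continue   # skip when the glob matches nothing
    if [ "${DRY_RUN:-1}" = "0" ]; then
      gcloud monitoring dashboards create --config-from-file="$file"
    else
      echo "gcloud monitoring dashboards create --config-from-file=$file"
    fi
  done
}
```

Calling deploy_dashboards dashboards from the repository root prints the commands for review; DRY_RUN=0 deploy_dashboards dashboards runs them. Defaulting to a dry run is a deliberate choice so that nothing is created in a project by accident.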

Option 2 - Terraform deployment

cd terraform
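Copy backend.tf.sample to backend.tf and modify it to store the Terraform state (tfstate) in the target Cloud Storage bucket.
Run terraform init (required once the backend is configured), then terraform plan and terraform apply.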
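
The Terraform steps above can be sketched as a single shell function. This is an illustrative sketch under stated assumptions, not a script shipped with the project: it assumes backend.tf.sample lives in the terraform folder, and the names terraform_deploy and TF_BIN are hypothetical. If backend.tf does not exist yet, the function copies the sample and stops so that you can edit the bucket settings before any state is initialized.

```shell
#!/usr/bin/env bash
# Sketch of the Terraform flow: copy the backend sample, then init/plan/apply.
# Assumptions (not from the README): backend.tf.sample is in the given folder;
# TF_BIN is a hypothetical override for the terraform binary.

terraform_deploy() {
  local dir="${1:-terraform}"
  local tf="${TF_BIN:-terraform}"
  (
    cd "$dir" || exit 1
    if [ ! -f backend.tf ]; then
      # First run: create backend.tf and stop so it can be edited by hand.
      cp backend.tf.sample backend.tf || exit 1
      echo "Created backend.tf; edit it to point at your state bucket, then rerun." >&2
      exit 1
    fi
    # Apply exactly the plan that was reviewed by saving it to a file.
    "$tf" init && "$tf" plan -out=tfplan && "$tf" apply tfplan
  )
}
```

Run terraform_deploy terraform from the repository root. Saving the plan with -out and applying that file keeps the applied changes identical to the ones shown by the plan step.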

Dashboards

Dashboard Name	Description	JSON
GDC Daily Report	Dashboard showing node/VM availability and utilization-based metrics	json
GDC Node View	Dashboard showing GDC node information	json
GDC VM Status	Dashboard showing GDC VM information	json
GDC Robin Status	Dashboard for deep-diving into Robin metrics. Note: this dashboard requires the robin-health application	json
GDC External Secrets	Dashboard showing External Secrets operational information	json
GDC VM Distribution	Dashboard showing VM distribution by node	json

Alerts

Alert	Category	Description	Link
node-cpu-usage-high	Node	Alert when CPU usage of any node exceeds 80%	config
node-memory-usage-high	Node	Alert when memory usage of any node exceeds 80%	config
node-not-ready-30m	Node	Alert if any node is not ready for more than 30 minutes	config
multiple-nodes-not-ready-realtime	Node	Alert if multiple nodes are not ready at any time	config
api-server-error-ratio-5-percent	Control-plane	Alert if the API server has an error ratio exceeding 5%	config
apiserver-down	Control-plane	Alert if the API server is down	config
controller-manager-down	Control-plane	Alert if the controller manager is down	config
scheduler-down	Control-plane	Alert if the scheduler is down	config
pod-crash-looping	Pods	Alert if a pod is crash looping	config
pod-not-ready-1h	Pods	Alert if a pod is not ready for more than an hour	config
coredns-down	System	Alert if CoreDNS is down	config
coredns-servfail-ratio-1-percent	System	Alert if more than 1% of DNS requests return SERVFAIL	config
robin-master-down-10m	Storage	Alert if the Robin master is down for more than 10 minutes	config
robin-node-offline-30m	Storage	Alert if a Robin node is offline for more than 30 minutes	config
robin-disk-inactive-10m	Storage	Alert if a Robin disk is inactive for more than 10 minutes	config
vmruntime-heartbeats-active-realtime	VMRuntime	Alert if VMRuntime heartbeats are missing	config
vmruntime-heartbeats-realtime	VMRuntime	Alert if VMRuntime heartbeats are 0	config
vmruntime-vm-down-5m	VMRuntime	Alert if any VM is not active for more than 5 minutes	config
vmruntime-vm-missing-5m	VMRuntime	Alert if CPU activity for a VM is absent for more than 5 minutes	config
vmruntime-vm-no-network-traffic-5m	VMRuntime	Alert if there is no network activity from a VM for more than 5 minutes	config
externalsecrets-down-30m	ExternalSecrets	Alert if External Secrets is down for more than 30 minutes	config
externalsecrets-sync-error	ExternalSecrets	Alert if any ExternalSecret resources have sync errors	config

Disclaimer

This project is not an official Google project. It is not supported by Google and Google specifically disclaims all warranties as to its quality, merchantability, or fitness for a particular purpose.
