
Cloudberry Monitor

About:

Cloudberry Monitor is a set of scripts & schema to permit observability of one or many Apache Cloudberry (incubating) clusters in the absence of Greenplum Command Center (GPCC). It can very likely also monitor Greenplum (Broadcom), though compatibility issues are just as likely since Greenplum is now closed source.

Unlike GPCC, this system will catch up even when the database is down. It leverages sar logs, specifically those created by sysstat 11.7 found in RHEL 8 and variants. As long as those files are being written, host metrics will eventually appear.

Included is a sample Grafana dashboard. Grafana is not required; however, due to its ease of installation, configuration, flexibility and alerting, it is recommended.

Why not Prometheus? It's a fine product designed to do many things. The goal here was easy extensibility without having to learn other technologies or languages, maintainability, and live data pulled directly from the Cloudberry cluster. Tradeoffs.

Working Configurations:

  • RHEL 8 hosts
  • pre-Apache Cloudberry 1.6.0 & Apache Cloudberry (incubating) main branch
  • RHEL 8 based CBmon PostgreSQL

Untested & needs work:

  • RHEL 9 hosts
  • Greenplum 6 & 7

Features:

  • Provides a database on which any dashboard can be built (Grafana preferred)

  • Ability to house data from one or many clusters

  • Remote access to Cloudberry catalogs

  • Method to collect host performance metrics stored in sar files

  • Remote execution of functions in Cloudberry

  • Easily extensible to pull additional information from remote Cloudberry cluster

  • Post data load functionality to create summaries or specific data points

  • Data kept as long as necessary or wanted

  • Requires only 2 scheduled jobs: data gathering, historical data management

Host Configuration:

More frequent sysstat collection intervals are recommended; otherwise metrics get watered down, since many system metrics are averages over the collection period.

  1. Install sysstat everywhere
gpssh -f allhosts sudo dnf -y install sysstat
  2. Modify all cluster hosts to override sysstat-collect.timer to once/minute. This creates /etc/systemd/system/sysstat-collect.timer.d/override.conf.
sudo systemctl edit sysstat-collect.timer

Paste in editor:

[Timer]
OnCalendar=*:00/1
  3. Reload, enable and start
sudo systemctl daemon-reload
sudo systemctl enable --now sysstat

Alternative useful for larger clusters:

cat << EOL > override.conf
[Timer]
OnCalendar=*:00/1
EOL

gpsync -f allhosts override.conf =:/tmp
gpssh -f allhosts sudo mkdir -p /etc/systemd/system/sysstat-collect.timer.d
gpssh -f allhosts sudo install -o root -g root -m 0644 /tmp/override.conf /etc/systemd/system/sysstat-collect.timer.d
gpssh -f allhosts rm -f /tmp/override.conf
gpssh -f allhosts sudo systemctl daemon-reload
gpssh -f allhosts sudo systemctl start sysstat-collect.timer
gpssh -f allhosts sudo systemctl enable sysstat-collect.timer

NOTE for RHEL 9 variants:

gpssh -f allhosts sudo systemctl start sysstat
gpssh -f allhosts sudo systemctl enable sysstat

RHEL 8/9 verify: Files should grow once/minute in /var/log/sa

ls -l /var/log/sa
sar -f /var/log/sa/saDD -b
  4. Create cbmon RPM
cd cloudberry-mon/pkg
./build_rpm -r X -v Y.Z -d el8
  5. Copy RPM to all hosts
gpsync -f allhosts cbmon-Y.Z-X.el8.noarch.rpm =:/tmp
  6. Install on all cluster hosts
gpssh -f allhosts sudo rpm -ivh /tmp/cbmon-Y.Z-X.el8.noarch.rpm
  7. Verify & change ownership
gpssh -f allhosts sudo chown -R gpadmin:gpadmin /usr/local/cbmon
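
The step above only shows the ownership change; one quick way to confirm the package actually landed on every host (assuming the RPM package name is cbmon, per the filename built above):

gpssh -f allhosts rpm -q cbmon
gpssh -f allhosts ls -ld /usr/local/cbmon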

Cloudberry Configuration:

On the mdw host only, perform the following steps:

  1. Set MASTER_DATA_DIRECTORY in /usr/local/cbmon/etc/config

  2. Load each alter in numerical order

/usr/local/cbmon/bin/load_cbalters -d MYDB -p PORT -U gpadmin
  3. Configure pg_hba.conf to allow remote connections from PostgreSQL host & reload
gpstop -u
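
A hedged sketch of what the pg_hba.conf entry itself might look like, added on the mdw host before the gpstop -u reload above (the IP and auth method are assumptions; substitute the address of your cbmon PostgreSQL host and the MYDB database used with load_cbalters):

echo "host  MYDB  gpadmin  10.0.0.50/32  md5" >> $MASTER_DATA_DIRECTORY/pg_hba.conf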

PostgreSQL Metrics Database Host Only

  1. Install the same RPM created above on the cbmon host
sudo rpm -ivh cloudberry-mon/pkg/cbmon-Y.Z-X.el8.noarch.rpm
  2. Change ownership to postgres
sudo chown -R postgres:postgres /usr/local/cbmon
  3. Install prerequisites for PostgreSQL, pg_partman and grafana packages. Any PostgreSQL version >= 16 should work. Prerequisites:
sudo rpm -ivh https://download.postgresql.org/pub/repos/yum/reporpms/EL-8-x86_64/pgdg-redhat-repo-latest.noarch.rpm
sudo dnf install perl libxslt libicu-devel clang-devel llvm-devel
sudo dnf install python3-psycopg2
sudo dnf --disablerepo=* --enablerepo=powertools install perl-IPC-Run
sudo dnf -y update
sudo dnf --nogpgcheck --disablerepo=* --enablerepo=pgdg16 -y install postgresql16 postgresql16-contrib postgresql16-server grafana pg_partman_16
  4. Initialize PostgreSQL

  5. Create cbmon role with SUPERUSER privs

  6. Create cbmon database owned by role cbmon
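
Steps 4 through 6 do not prescribe exact commands; a minimal sketch, assuming the PGDG 16 packages installed above and a hypothetical password:

sudo /usr/pgsql-16/bin/postgresql-16-setup initdb
sudo systemctl start postgresql-16
sudo -u postgres psql -c "CREATE ROLE cbmon LOGIN SUPERUSER PASSWORD 'changeme'"
sudo -u postgres psql -c "CREATE DATABASE cbmon OWNER cbmon"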

  7. Load alters in order

/usr/local/cbmon/bin/load_pgalters -d cbmon -p PORT -U cbmon
  8. Configure postgresql.conf per alter output permitting pg_partman usage
shared_preload_libraries = 'pg_partman_bgw'
pg_partman_bgw.interval = '3600'
pg_partman_bgw.role = 'partman'
pg_partman_bgw.dbname = 'cbmon'
pg_partman_bgw.analyze = 'on'
  9. Configure pg_hba.conf to permit grafana & cbmon user access & restart
systemctl restart postgresql-16
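
A hedged example of pg_hba.conf entries on the cbmon PostgreSQL host, added before the restart above (the data directory path, CIDRs, auth method and a separate grafana role are assumptions; adjust to your environment):

cat << EOF | sudo tee -a /var/lib/pgsql/16/data/pg_hba.conf
host  cbmon  cbmon    10.0.0.0/24   scram-sha-256
host  cbmon  grafana  127.0.0.1/32  scram-sha-256
EOF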
  10. Configure grafana for your environment & start (a sample data source provisioning sketch follows this list)
systemctl start grafana-server
  11. Enable PostgreSQL & grafana to start on reboot
systemctl enable postgresql-16
systemctl enable grafana-server
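
Grafana configuration is environment-specific; as one hedged sketch, a provisioned PostgreSQL data source pointing at the cbmon database might look like the following (the file name, grafana database role and password are assumptions):

sudo tee /etc/grafana/provisioning/datasources/cbmon.yaml << EOF
apiVersion: 1
datasources:
  - name: cbmon
    type: postgres
    url: localhost:5432
    database: cbmon
    user: grafana
    secureJsonData:
      password: 'changeme'
    jsonData:
      sslmode: 'disable'
EOF
sudo systemctl restart grafana-server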

Creating clusters

Tenancy in the cbmon PostgreSQL database is handled via schemas: each cluster's schema is named metrics_CLUSTER_ID, where CLUSTER_ID comes from public.clusters.

To create a cluster, perform the following in the cbmon database:

SELECT public.create_cluster(
        v_name        varchar(256),  -- Cluster name
        v_mdwname     varchar(256),  -- Cluster mdw hostname
        v_mdwip       varchar(16),   -- Cluster mdw IP
        v_port        int,           -- Cluster mdw port
        v_cbmondb     varchar(256),  -- Database where cbmon schema has been loaded
        v_cbmonschema varchar(256),  -- Name of schema in database, typically cbmon
        v_user        varchar(256),  -- Superuser to connect as
        v_pass        varchar(256)   -- Superuser password
);
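
A hedged example call with hypothetical values, run against the cbmon database:

psql -d cbmon -U cbmon -c "SELECT public.create_cluster('prod1', 'mdw.example.com', '10.0.0.10', 5432, 'MYDB', 'cbmon', 'gpadmin', 'secret')"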

This function will create and populate the metrics schema as well as several other tables.

IMPORTANT: Review public.cluster_hosts. Values in the hostname column must match each host's actual hostname, that is, the output of the hostname command. For display purposes, populate public.cluster_hosts.display_name with a preferred name such as sdw1, sdw2, and so on.
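
For example, one way to compare actual hostnames and set a display name (the hostname value below is hypothetical):

gpssh -f allhosts hostname
psql -d cbmon -c "UPDATE public.cluster_hosts SET display_name = 'sdw1' WHERE hostname = 'host-003.example.com'"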

When changes are made to public.cluster_hosts, the cluster's corresponding metrics_X.data_storage_summary_mv materialized view must be refreshed.

REFRESH MATERIALIZED VIEW metrics_X.data_storage_summary_mv WITH DATA;

Equally important, the cluster's metrics_X schema will not yet contain tables with historical data. These will appear once the loader process has run for the first time.

Option 1 - Enabling old systemd loader process

Initially, all loading was done via a loader script. Early on it was noticed that this script would fall behind due to its serial execution and certain long-running load functions. It is kept here simply to document it as an option. The parallel loader process below is recommended, as the systemd loader will no longer be maintained.

  1. Edit etc/config and cbmon_loader.service to reflect cluster ID

  2. Install service file replacing <<CLUSTER_ID>> with cluster ID

sudo install -o root -g root -m 0644 \
     /usr/local/cbmon/etc/cbmon_loader.service \
     /etc/systemd/system/cbmon_loader-c<<CLUSTER_ID>>.service
  3. Reload
sudo systemctl daemon-reload
  4. Start, verify and enable replacing <<CLUSTER_ID>> with cluster ID
sudo systemctl start cbmon_loader-c<<CLUSTER_ID>>
sudo systemctl status cbmon_loader-c<<CLUSTER_ID>>
sudo systemctl enable cbmon_loader-c<<CLUSTER_ID>>

Option 2 - Enabling Message Driven Loader

Cloudberry clusters can be large with high core counts, and with sysstat-collect.timer tuned to run once per minute, files in /var/log/sa can become large. This can delay delivering metrics to the cbmon database and ultimately grafana or whatever visualization tech is used.

A loader process capable of performing metrics load work in parallel is therefore necessary, versus the original systemd loader running serially.

Unlike the systemd method, parallel_loader is not configured for a specific cluster and therefore services requests for any cluster.

  1. Install RabbitMQ & configure adding users, vhost and queues. Example:
sudo -s

dnf install epel-release curl -y

curl -s https://packagecloud.io/install/repositories/rabbitmq/rabbitmq-server/script.rpm.sh | bash
curl -s https://packagecloud.io/install/repositories/rabbitmq/erlang/script.rpm.sh | bash

dnf -y install erlang rabbitmq-server

systemctl start rabbitmq-server
systemctl enable rabbitmq-server

rabbitmqctl add_user admin
rabbitmqctl set_user_tags admin administrator
rabbitmqctl list_users

# Enable web console
rabbitmq-plugins enable rabbitmq_management
systemctl restart rabbitmq-server
rabbitmqctl status

rabbitmqctl add_user cbmon
rabbitmqctl add_vhost /cbmon_ploader
rabbitmqctl set_permissions -p /cbmon_ploader cbmon ".*" ".*" ".*"
  2. Install python3 modules
sudo pip3 install pika psycopg2
  3. Configure etc/config.ini for RabbitMQ instance

  4. Install parallel_loader.service in /etc/systemd/system

sudo install -o root -g root -m 0644 /usr/local/cbmon/etc/parallel_loader.service /etc/systemd/system
  5. Reload systemd, start and enable new service
sudo systemctl daemon-reload
sudo systemctl start parallel_loader.service
sudo systemctl enable parallel_loader.service
  6. Create cron jobs for each cluster sending load messages for all enabled metric types
* * * * * PYTHONPATH=/usr/local/cbmon/bin/pylib /usr/local/cbmon/bin/send_loader_request --config /usr/local/cbmon/etc/config.ini -C <<CLUSTER_ID>> --load-all

Recommendations:

In etc/config.ini, cbmon_load.max_workers should not exceed the number of connections you want consumed in the Cloudberry database. Also keep in mind that Grafana, or whatever dashboard & alerting is implemented, querying the PostgreSQL cbmon database will also consume connections.

To alleviate connection consumption, it is recommended to create a pgbouncer instance or a pool in an existing pgbouncer. Once set up, use ALTER SERVER to set host, port and any other relevant options so connections go through pgbouncer.
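
A hedged sketch of pointing the foreign server at pgbouncer (the server name and host below are hypothetical; use the actual server created for your cluster and pgbouncer's listen address and port):

psql -d cbmon -c "ALTER SERVER cluster_1_server OPTIONS (SET host 'pgbouncer.example.com', SET port '6432')"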

When using any connection pooler, it is important for the pool size to equal config.ini cbmon_load.max_workers plus adequate spares for dashboard and alerting.

Enabling summaries process

NOTE: If alter-1064.sql is loaded, this process should not be used. The summaries process will eventually be deprecated.

The summaries service executes functions responsible for generating periodic summaries of gathered performance metrics. It is useful for faster dashboard loads and for permitting drill-down capabilities.

  1. Edit etc/config and cbmon_summaries.service to reflect cluster ID

  2. Install service file replacing <<CLUSTER_ID>> with cluster ID

sudo install -o root -g root -m 0644 \
     /usr/local/cbmon/etc/cbmon_summaries.service \
     /etc/systemd/system/cbmon_summaries-c<<CLUSTER_ID>>.service
  3. Reload
sudo systemctl daemon-reload
  4. Start, verify and enable replacing <<CLUSTER_ID>> with cluster ID
sudo systemctl start cbmon_summaries-c<<CLUSTER_ID>>
sudo systemctl status cbmon_summaries-c<<CLUSTER_ID>>
sudo systemctl enable cbmon_summaries-c<<CLUSTER_ID>>
