BUG REPORT INFORMATION
may be related to #797
Version
rpm -qa | egrep 'centreon-(broker|web|gorgone)'
centreon-web-24.10.16-1.el8.noarch
centreon-broker-24.10.14-1.el8.x86_64
centreon-broker-cbd-24.10.14-1.el8.x86_64
centreon-broker-core-24.10.14-1.el8.x86_64
centreon-broker-cbmod-24.10.14-1.el8.x86_64
centreon-gorgone-24.10.9-1.el8.noarch
centreon-gorgone-centreon-config-24.10.9-1.el8.noarch
Operating System
AlmaLinux 8.10
Browser used
Version: 143.0.7499.170
Additional environment details (AWS, VirtualBox, physical, etc.):
VMware VM with an external database server (also AlmaLinux 8)
Description
RRD files do not retain valid data points for services with check intervals longer than ~10 minutes, despite performance data being correctly stored in centreon_storage.data_bin and broker logs confirming successful RRD update attempts. This affects passive checks (e.g., Greenbone vulnerability scans, external data feeds) and any active checks with extended intervals.
Key finding: Manually modifying an RRD file's minimal_heartbeat from 600 seconds to a value matching 2-3× the actual check interval resolves the issue completely, confirming this parameter as the root cause.
Steps to Reproduce
- Create a passive or active service with a check interval of 60 minutes that returns performance data
- Wait for multiple check periods (≥3 hours) without manually triggering checks
- Verify that the database contains the perfdata:
SELECT * FROM centreon_storage.data_bin WHERE id_metric = <metric_id>;
- Examine RRD file:
rrdtool info /var/lib/centreon/metrics/<metric_id>.rrd
- Check for valid data:
rrdtool dump /var/lib/centreon/metrics/<metric_id>.rrd | grep -v NaN
- View service graph in Centreon web interface
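`grep -v NaN` hides unknown rows, but it does not show how many rows each RRA holds or which archive the surviving rows belong to. The following is a minimal Python sketch that summarises the XML produced by `rrdtool dump` for each RRA. It assumes a single data source per file, like the `value` DS shown below. `count_valid_rows` is a hypothetical helper and is not part of rrdtool or Centreon:

```python
import math
import xml.etree.ElementTree as ET


def count_valid_rows(dump_xml):
    """Summarise an `rrdtool dump` document that has a single data source.

    Returns one (cf, pdp_per_row, valid_rows, total_rows) tuple per RRA,
    where a row counts as valid when its value is not NaN.
    """
    root = ET.fromstring(dump_xml)
    summary = []
    for rra in root.findall("rra"):
        cf = rra.findtext("cf").strip()
        pdp_per_row = int(rra.findtext("pdp_per_row"))
        rows = rra.findall("database/row")
        # float() accepts the "NaN" text that rrdtool dump writes for unknown rows
        valid = sum(1 for row in rows if not math.isnan(float(row.findtext("v"))))
        summary.append((cf, pdp_per_row, valid, len(rows)))
    return summary
```

To use it, save the output of `rrdtool dump /var/lib/centreon/metrics/<metric_id>.rrd` to a file and pass its contents to `count_valid_rows`. A file affected by this issue would be expected to show very few valid rows in every archive.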
Describe the received result
RRD File Analysis (60-min check interval)
Metric ID 761038 (default broker configuration):
[root@monitoring ~]# rrdtool info /var/lib/centreon/metrics/761038.rrd
filename = "/var/lib/centreon/metrics/761038.rrd"
rrd_version = "0003"
step = 1
last_update = 1767717430
ds[value].type = "GAUGE"
ds[value].minimal_heartbeat = 600 # Only 10 minutes tolerance
ds[value].last_ds = "2.000000"
ds[value].value = NaN # No valid current value
rra[0].cf = "AVERAGE"
rra[0].pdp_per_row = 60
rra[0].cdp_prep[0].unknown_datapoints = 10
rra[1].cf = "AVERAGE"
rra[1].pdp_per_row = 3600
rra[1].cdp_prep[0].unknown_datapoints = 2230 # Every PDP of the current, still-open hourly row is unknown
Data dump showing minimal valid data:
[root@monitoring ~]# rrdtool dump /var/lib/centreon/metrics/761038.rrd | grep -v NaN
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE rrd SYSTEM "http://oss.oetiker.ch/rrdtool/rrdtool.dtd">
<rrd>
<version>0003</version>
<step>1</step>
<lastupdate>1767717430</lastupdate> <!-- 2026-01-06 17:37:10 CET -->
<ds>
<name> value </name>
<type> GAUGE </type>
<minimal_heartbeat>600</minimal_heartbeat>
<last_ds>2.000000</last_ds>
<unknown_sec> 0 </unknown_sec>
</ds>
<rra>
<cf>AVERAGE</cf>
<pdp_per_row>60</pdp_per_row>
<params><xff>5.0000000000e-01</xff></params>
<cdp_prep>
<ds>
<value>0.0000000000e+00</value>
<unknown_datapoints>10</unknown_datapoints>
</ds>
</cdp_prep>
<database>
<!-- Only 6 consecutive minutes have data -->
<!-- 2026-01-06 13:57:00 --> <row><v>2.0000000000e+00</v></row>
<!-- 2026-01-06 13:58:00 --> <row><v>2.0000000000e+00</v></row>
<!-- 2026-01-06 13:59:00 --> <row><v>2.0000000000e+00</v></row>
<!-- 2026-01-06 14:00:00 --> <row><v>2.0000000000e+00</v></row>
<!-- 2026-01-06 14:01:00 --> <row><v>2.0000000000e+00</v></row>
<!-- 2026-01-06 14:02:00 --> <row><v>2.0000000000e+00</v></row>
</database>
</rra>
</rrd>
Broker Log
The broker log (metric IDs translated to names afterwards) confirms the update attempts:
[2026-01-06T16:47:14.743+01:00] [rrd] [debug] RRD: new pb data for LR44AP006::4_Misc:_Greenbone-Security-Status::medium(761007) (time 1767713834)
[2026-01-06T16:47:14.743+01:00] [rrd] [debug] RRD: updating file '/var/lib/centreon/metrics/761007.rrd' (1767713834:0.000000) [LR44AP006::4_Misc:_Greenbone-Security-Status::medium]
Updates are logged successfully but data is not retained in the RRD.
Describe the expected result
RRD files should contain valid consolidated data points for each check interval, displaying continuous historical graphs matching the service's check frequency (e.g., hourly data points for 60-minute checks).
Workaround / Verification
Manually recreating the RRD file with increased minimal_heartbeat resolves the issue:
Metric ID 761039 (manually created with rrdtool create using heartbeat=36000):
[root@monitoring ~]# rrdtool info /var/lib/centreon/metrics/761039.rrd
filename = "/var/lib/centreon/metrics/761039.rrd"
rrd_version = "0003"
step = 1
last_update = 1767717430
ds[value].type = "GAUGE"
ds[value].minimal_heartbeat = 36000 # 10 hours tolerance
ds[value].last_ds = "9.800000"
ds[value].value = 0.0000000000e+00 # Valid current value present
rra[0].cf = "AVERAGE"
rra[0].pdp_per_row = 3600
rra[0].cdp_prep[0].unknown_datapoints = 0 # No unknown PDPs in the current hourly row
Data dump shows continuous hourly values:
[root@monitoring ~]# rrdtool dump /var/lib/centreon/metrics/761039.rrd | grep -v NaN
<?xml version="1.0" encoding="utf-8"?>
<rrd>
<version>0003</version>
<step>1</step>
<lastupdate>1767717430</lastupdate>
<ds>
<name> value </name>
<type> GAUGE </type>
<minimal_heartbeat>36000</minimal_heartbeat>
<last_ds>9.800000</last_ds>
<value>0.0000000000e+00</value>
<unknown_sec> 0 </unknown_sec>
</ds>
<rra>
<cf>AVERAGE</cf>
<pdp_per_row>3600</pdp_per_row>
<cdp_prep>
<ds>
<primary_value>9.8000000000e+00</primary_value>
<secondary_value>9.8000000000e+00</secondary_value>
<value>2.1854000000e+04</value>
<unknown_datapoints>0</unknown_datapoints>
</ds>
</cdp_prep>
<database>
<!-- Continuous hourly data, no NaN gaps -->
<!-- 2026-01-06 15:00:00 --> <row><v>9.8000000000e+00</v></row>
<!-- 2026-01-06 16:00:00 --> <row><v>9.8000000000e+00</v></row>
<!-- 2026-01-06 17:00:00 --> <row><v>9.8000000000e+00</v></row>
</database>
</rra>
</rrd>
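In both outputs, the `cdp_prep` counters describe only the row that each RRA is still consolidating. They do not describe the whole archive. RRA rows are aligned to multiples of `step × pdp_per_row` seconds, so the elapsed part of the open row can be derived from `last_update` alone. Here is a short Python check of that arithmetic (`elapsed_in_open_row` is an illustrative name):

```python
LAST_UPDATE = 1767717430  # last_update reported by rrdtool info for 761038 and 761039


def elapsed_in_open_row(last_update, step, pdp_per_row):
    """PDPs already consolidated into the RRA row that is still open.

    Rows are aligned to multiples of step * pdp_per_row seconds since the epoch.
    """
    return (last_update % (step * pdp_per_row)) // step


hourly = elapsed_in_open_row(LAST_UPDATE, step=1, pdp_per_row=3600)
minutely = elapsed_in_open_row(LAST_UPDATE, step=1, pdp_per_row=60)
print(hourly, minutely)        # 2230 10
print(round(9.8 * hourly, 6))  # 21854.0
```

This means that in 761038, all 2230 PDPs since the last hour boundary (and all 10 since the last minute boundary) are unknown. In contrast, 761039 has no unknown PDPs. Its accumulated `cdp_prep` value of 2.1854e+04 appears consistent with the AVERAGE row summing 2230 known PDPs of 9.8.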
Result: Graphs display correctly, with proper hourly resolution, when the heartbeat accommodates the check interval.
Additional relevant information
Affected Scope
- Use cases: Passive vulnerability scans (Greenbone, Nessus), custom monitoring scripts, external passive data feeds
- Frequency: Hundreds of affected metrics in production
- Workaround: Manual RRD recreation required per metric (impractical at scale)
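Recreating the files is not the only option. `rrdtool tune <file> --heartbeat <ds-name>:<seconds>` changes the heartbeat of an existing data source in place and keeps the stored rows. The Python sketch below only builds these commands for a list of metric IDs; it does not run them. It assumes the `/var/lib/centreon/metrics/<metric_id>.rrd` layout and the `value` DS name shown above. `tune_commands` is hypothetical, and how the metric IDs are selected is left open. Rows that are already NaN stay NaN. It is also unknown whether Centreon Broker would later recreate the files with the 600 s default, for example during a graph rebuild.

```python
import shlex

METRICS_DIR = "/var/lib/centreon/metrics"  # RRD directory used in this report


def tune_commands(metric_ids, heartbeat, ds_name="value", metrics_dir=METRICS_DIR):
    """Build one `rrdtool tune --heartbeat` argv per metric; nothing is executed here."""
    return [
        ["rrdtool", "tune", f"{metrics_dir}/{metric_id}.rrd",
         "--heartbeat", f"{ds_name}:{heartbeat}"]
        for metric_id in metric_ids
    ]


# 10800 s = 3x a 60-minute check interval
for argv in tune_commands([761038], heartbeat=10800):
    print(shlex.join(argv))
# rrdtool tune /var/lib/centreon/metrics/761038.rrd --heartbeat value:10800
```

Building argv lists instead of shell strings lets the commands be reviewed first and then passed directly to `subprocess.run` without shell quoting issues.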
Configuration
- Monitoring Engine Interval Length: 60 seconds (Administration > Parameters > Monitoring)
- Service check intervals: 60-120 minutes (passive checks)
- Both broker logs and database confirm correct data flow
Technical Context
The issue appears related to how the broker calculates minimal_heartbeat during RRD creation. Currently, RRDs are created with heartbeat=600 seconds regardless of the service's actual check interval. When checks run every 60+ minutes, rrdtool marks the gap between updates as "unknown," producing NaN values.
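The mechanism can be reproduced with a toy Python model of rrdtool's rules. When the time between two updates exceeds `minimal_heartbeat`, the whole gap is unknown. Otherwise, the new GAUGE value covers every second since the previous update. A consolidated row is unknown when more than `xff` (0.5 here, as in the dump) of its PDPs are unknown. This is a simplified sketch, not rrdtool's implementation. It assumes step = 1 s and a single hourly AVERAGE RRA, and `simulate_rrd` is a hypothetical name:

```python
import math


def simulate_rrd(updates, heartbeat, row_seconds=3600, xff=0.5):
    """Toy model of one GAUGE data source (step = 1 s) feeding one AVERAGE RRA.

    updates: (timestamp, value) pairs in increasing time order.
    Returns (row_end, average) pairs; NaN marks an unknown row.
    """
    known = {}  # second -> value; seconds absent from the dict are unknown
    for (t_prev, _), (t, value) in zip(updates, updates[1:]):
        # A gap longer than the heartbeat leaves the whole gap unknown;
        # otherwise the new GAUGE value covers every second since t_prev.
        if t - t_prev <= heartbeat:
            for second in range(t_prev + 1, t + 1):
                known[second] = value

    first_end = (updates[0][0] // row_seconds + 1) * row_seconds
    last_end = (updates[-1][0] // row_seconds) * row_seconds
    rows = []
    for row_end in range(first_end, last_end + 1, row_seconds):
        samples = [known[s] for s in range(row_end - row_seconds + 1, row_end + 1) if s in known]
        unknown = row_seconds - len(samples)
        # xff: the row is unknown when more than row_seconds * xff PDPs are unknown
        if not samples or unknown > row_seconds * xff:
            rows.append((row_end, math.nan))
        else:
            rows.append((row_end, sum(samples) / len(samples)))
    return rows


hourly = [(t, 9.8) for t in range(0, 4 * 3600, 3600)]  # checks at 0, 3600, 7200, 10800 s
print(simulate_rrd(hourly, heartbeat=600))    # every row is nan
print(simulate_rrd(hourly, heartbeat=36000))  # every row averages 9.8
```

With hourly updates, every 3600 s gap exceeds the 600 s heartbeat, so every hourly row is NaN. With the 36000 s heartbeat used for metric 761039, the same updates produce a 9.8 average in every row.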
Possible locations requiring investigation:
- RRD creation logic in broker (determining heartbeat calculation)
- Metric event structure (whether check interval information is available)
- Configuration option for a heartbeat multiplier
Important note: The global "Interval Length" (60s) should remain unchanged to preserve high-resolution graphs for frequent checks (e.g., 1-minute ping checks). The issue specifically affects services whose check interval exceeds the current heartbeat (600 s). Keeping the heartbeat at roughly twice the check interval or more also absorbs late or jittered checks.
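The following sketch shows one possible shape for a check-interval-aware heartbeat. It is written in Python only to make the arithmetic concrete. The heartbeat becomes a multiple of the service's check period, which is the check interval times the interval length, because Centreon expresses check intervals in units of the interval length. It never drops below the current 600 s. `proposed_heartbeat`, `multiplier` and the floor are hypothetical and are not existing Centreon Broker settings:

```python
DEFAULT_HEARTBEAT = 600  # heartbeat observed in the broker-created RRD files


def proposed_heartbeat(check_interval, interval_length=60, multiplier=3, floor=DEFAULT_HEARTBEAT):
    """Hypothetical rule: multiplier x check period in seconds, never below the current default."""
    check_period = check_interval * interval_length  # seconds between expected checks
    return max(floor, multiplier * check_period)


for minutes in (1, 5, 60, 120):
    print(minutes, proposed_heartbeat(minutes))
# 1 600
# 5 900
# 60 10800
# 120 21600
```

Because of the floor, 1-minute services keep the 600 s heartbeat, while 60- and 120-minute passive services get 10800 s and 21600 s, within the 2-3× range from the key finding. The heartbeat changes neither the step nor the RRA layout, so the 60 s interval length and graph resolution would be unaffected.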
Is the current minimal_heartbeat behavior (which appears to be fixed at 600 s) intended to support only high-frequency checks?
Should the heartbeat calculation consider the service's actual check interval, or is there a configuration option we're missing?
Are there performance implications of using larger heartbeat values (e.g., 2-6 hours) that should be considered?