Hello,
Thank you for creating this module — it's been a great help as we migrated our monitoring to the Redfish API. I'd like to raise a compatibility issue we've hit in production and propose a path forward.
Summary
On Dell PowerEdge R7615 servers running the latest iDRAC firmware, 7.30.10.50, the /redfish/v1/Chassis/System.Embedded.1/Thermal endpoint intermittently returns Base.1.12.GeneralError. Around 10 of our 23 R7615s on this firmware are affected, which generates a significant volume of false unknown-state alerts in Icinga. During these failures all other endpoints respond normally and the iDRAC web UI continues to display correct readings, so the problem is specific to the /Thermal endpoint itself. We have a Dell support case open, but since /Thermal has been deprecated in favor of /ThermalSubsystem since Redfish standard version 1.7 (2020), migrating check_redfish to the newer endpoints seems like the right long-term fix regardless.
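To illustrate the failure mode and the migration direction, here is a minimal Python sketch (not taken from check_redfish or from our patch). It assumes the standard Redfish error body shape (`error.code` plus `error["@Message.ExtendedInfo"][].MessageId`) and a Chassis resource that links to `Thermal` and/or `ThermalSubsystem` via `@odata.id`. The function names `is_general_error` and `thermal_endpoint` are hypothetical.

```python
from typing import List, Optional


def redfish_message_ids(body: dict) -> List[str]:
    """Collect MessageIds from a Redfish error response body."""
    error = body.get("error") or {}
    ids = [error["code"]] if "code" in error else []
    for info in error.get("@Message.ExtendedInfo", []):
        if "MessageId" in info:
            ids.append(info["MessageId"])
    return ids


def is_general_error(body: dict) -> bool:
    """True if the body reports Base.<version>.GeneralError (any registry version)."""
    return any(
        mid.startswith("Base.") and mid.endswith(".GeneralError")
        for mid in redfish_message_ids(body)
    )


def thermal_endpoint(chassis: dict) -> Optional[str]:
    """Prefer the ThermalSubsystem link; fall back to the deprecated Thermal link."""
    for key in ("ThermalSubsystem", "Thermal"):
        link = chassis.get(key)
        if isinstance(link, dict) and "@odata.id" in link:
            return link["@odata.id"]
    return None
```

Preferring `ThermalSubsystem` only when the Chassis resource actually advertises it keeps older firmware that exposes only `/Thermal` working.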
What we tried
We run three PowerEdge generations (R7525, R7615, R7715) and built a local patch on top of v2.1.2 that replaces /Thermal with the /ThermalSubsystem family of endpoints. It has been validated across all three platforms:
- Fans — collected from /ThermalSubsystem/Fans/ using SpeedPercent.SpeedRPM, with SecondarySpeedPercent handled for dual-rotor fans on newer generations
- Temperatures — collected by following DataSourceUri links from /ThermalSubsystem/ThermalMetrics to individual /Sensors/ resources, which expose full threshold and health data (Thresholds.UpperCritical, UpperCaution, LowerCritical, LowerCaution, and Upper/LowerFatal)
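As a rough illustration of the two collection paths above, here is a minimal Python sketch, not the actual patch. It assumes a hypothetical `fetch(path) -> dict` helper that performs an authenticated GET and returns the decoded JSON, and it assumes the member and `DataSourceUri` links can be fetched as-is. Property names (`SpeedPercent.SpeedRPM`, `SecondarySpeedPercent`, `TemperatureReadingsCelsius`, `Thresholds.<Name>.Reading`) follow the Redfish Fan, ThermalMetrics and Sensor schemas.

```python
from typing import Callable, List

# Hypothetical helper type: takes a Redfish path, returns the decoded JSON body.
Fetch = Callable[[str], dict]

THRESHOLD_NAMES = ("LowerFatal", "LowerCritical", "LowerCaution",
                   "UpperCaution", "UpperCritical", "UpperFatal")


def collect_fans(fetch: Fetch, chassis: str) -> List[dict]:
    """Read every fan listed in <chassis>/ThermalSubsystem/Fans."""
    fans = []
    collection = fetch(f"{chassis}/ThermalSubsystem/Fans")
    for member in collection.get("Members", []):
        fan = fetch(member["@odata.id"])
        rpms = []
        # Primary rotor first, then the second rotor on dual-rotor fans (if reported).
        for key in ("SpeedPercent", "SecondarySpeedPercent"):
            rpm = (fan.get(key) or {}).get("SpeedRPM")
            if rpm is not None:
                rpms.append(rpm)
        fans.append({
            "name": fan.get("Name", member["@odata.id"]),
            "rpm": rpms,
            "health": (fan.get("Status") or {}).get("Health"),
        })
    return fans


def collect_temperatures(fetch: Fetch, chassis: str) -> List[dict]:
    """Follow DataSourceUri links from ThermalMetrics to individual Sensor resources."""
    temps = []
    metrics = fetch(f"{chassis}/ThermalSubsystem/ThermalMetrics")
    for entry in metrics.get("TemperatureReadingsCelsius", []):
        uri = entry.get("DataSourceUri")
        if not uri:
            continue  # no sensor link, so no thresholds to read
        sensor = fetch(uri)
        thresholds = sensor.get("Thresholds") or {}
        temps.append({
            "name": sensor.get("Name", uri),
            "reading": sensor.get("Reading"),
            # Keep only the thresholds this sensor actually reports.
            "thresholds": {
                name: thresholds[name]["Reading"]
                for name in THRESHOLD_NAMES
                if (thresholds.get(name) or {}).get("Reading") is not None
            },
            "health": (sensor.get("Status") or {}).get("Health"),
        })
    return temps
```

Injecting `fetch` keeps the parsing testable against saved JSON from each PowerEdge generation without a live iDRAC.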
The patch is internal and not ready for upstream as-is, but I'm happy to share it if it would be a useful reference.
0001-Thermal-subsystem-endpoints-switch.patch
Best regards
Jiří