-
-
Notifications
You must be signed in to change notification settings - Fork 236
feat: Add SMART attribute override functionality #835
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
This feature allows users to override SMART attribute behavior at the host level. Users can now: - Ignore specific attributes entirely. - Force a specific status (passed, warn, failed) for an attribute. - Set custom thresholds (warn_above, fail_above) for attribute values. This is configured via a new section in the scrutiny.yaml file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR introduces SMART attribute override functionality to allow users to customize how specific SMART attributes are evaluated. The feature enables ignoring problematic attributes, forcing specific statuses, and setting custom thresholds to address false positives from vendor-specific attributes.
Key changes:
- New
AttributeOverridemodel with support for ignore, force_status, and threshold-based overrides - Integration of override logic into the SMART data processing pipeline with device status recalculation
- Comprehensive unit test coverage for override scenarios
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| webapp/backend/pkg/models/attribute_override.go | Introduces the AttributeOverride model defining the structure for user-configured attribute overrides |
| webapp/backend/pkg/database/scrutiny_repository_device_smart_attributes.go | Implements core override logic including loading, matching, and applying overrides to SMART attributes |
| webapp/backend/pkg/database/scrutiny_repository_device_smart_attributes_test.go | Adds comprehensive unit tests covering various override scenarios (ignore, force_status, thresholds, mismatches) |
| webapp/backend/pkg/database/scrutiny_repository_device.go | Updates device status persistence to always overwrite (not merge) to properly clear old failure flags |
| webapp/backend/pkg/web/handler/upload_device_metrics.go | Changes status update logic to always persist evaluated status, ensuring old failures are cleared when resolved |
| webapp/backend/pkg/config/config.go | Adds default configuration for smart.attribute_overrides |
| webapp/backend/pkg/web/server_test.go | Adds mock expectations for new InfluxDB initialization config keys to fix test failures |
| example.scrutiny.yaml | Adds documentation and examples for the new attribute override configuration |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| //////////////////////////////////////////////////////////////////////////////////////////////////////////////////// | ||
| // ////////////////////////////////////////////////////////////////////////////////////////////////////////////////// | ||
| func (sr *scrutinyRepository) SaveSmartAttributes(ctx context.Context, wwn string, collectorSmartData collector.SmartInfo) (measurements.Smart, error) { | ||
| sr.logger.Infof("SaveSmartAttributes called for wwn=%s", wwn) |
Copilot
AI
Dec 7, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The Infof log at this line will log on every SMART attribute save operation, which could be quite frequent in production. Consider using Debugf instead for this type of operational logging to avoid log verbosity.
| sr.logger.Infof("SaveSmartAttributes called for wwn=%s", wwn) | |
| sr.logger.Debugf("SaveSmartAttributes called for wwn=%s", wwn) |
| overrides = append(overrides, ao) | ||
| } | ||
|
|
||
| sr.logger.Infof("Loaded %d attribute overrides from config", len(overrides)) |
Copilot
AI
Dec 7, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] This Infof log is called on every SMART data upload and will execute the loop iteration even when there are no overrides configured (most common case). The logging should be moved inside the if len(overrides) == 0 check or reduced to Debugf level to avoid excessive logging in production.
| sr.logger.Infof("Loaded %d attribute overrides from config", len(overrides)) | |
| sr.logger.Debugf("Loaded %d attribute overrides from config", len(overrides)) |
| for attrKey, attrData := range smart.Attributes { | ||
| override := sr.matchingOverride(smart.DeviceProtocol, wwn, attrKey, overrides) | ||
| if override != nil { | ||
| sr.logger.Infof("Applying override to attribute %s: action=%s", attrKey, override.Action) |
Copilot
AI
Dec 7, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] This Infof log is inside a loop over all attributes and will generate a log entry for every matched override on every SMART upload. Consider using Debugf instead to reduce log verbosity in production.
| sr.logger.Infof("Applying override to attribute %s: action=%s", attrKey, override.Action) | |
| sr.logger.Debugf("Applying override to attribute %s: action=%s", attrKey, override.Action) |
| // apply host-level attribute overrides before persisting or notifying | ||
| attributeOverrides := sr.loadAttributeOverrides() | ||
| sr.applyAttributeOverrides(&deviceSmartData, wwn, attributeOverrides) |
Copilot
AI
Dec 7, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The loadAttributeOverrides() function is called on every SMART data save operation, causing repeated configuration parsing and map allocation. Consider caching the parsed overrides (e.g., loading once during repository initialization or using a cached value with a reload mechanism) to improve performance, especially for systems with many devices reporting frequently.
| } | ||
|
|
||
| sr.logger.Debugf(" MATCH!") | ||
| return &o |
Copilot
AI
Dec 7, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning a pointer to the loop variable o can lead to unexpected behavior. The loop variable is reused in each iteration, so all returned pointers could point to the same memory location with the last iteration's value. Copy the value or return by index: return &overrides[ndx] instead of return &o.
| return &o | |
| return &overrides[ndx] |
| case "ignore": | ||
| return setAttributeStatus(attr, pkg.AttributeStatusPassed, "Ignored by attribute override") |
Copilot
AI
Dec 7, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "ignore" action is handled twice: once at the beginning (lines 305-307) and again in the switch statement (lines 314-315). The first check makes the switch case unreachable. Remove the duplicate case from the switch statement.
| case "ignore": | |
| return setAttributeStatus(attr, pkg.AttributeStatusPassed, "Ignored by attribute override") |
This PR introduces a new 'attribute_overrides' section to scrutiny.yaml. It allows users to:
This addresses issues like #522, #332, and #369 where vendor-specific attributes or specific scenarios cause incorrect failure reports or prevent custom thresholding. Partially I was tired of an error from early age of a disk showing the entire disk as FAILED despite perfect repeated badblocks testing. Note, I did code portions of this and used help from an LLM for other portions. I did also test this fairly significantly - but would appreciate testing from other users if possible.
This also includes unit tests for the new logic.