feat: add ability to configure tes error cache expire time #1751
base: main
Binary incompatibility detected for commit 4698da7. com.aws.greengrass.tes.CredentialRequestHandler is binary incompatible and is source incompatible because of FIELD_REMOVED. Produced by binaryCompatability.py
Unit Tests Coverage Report
Generated by cobertura-action against 4698da7
Integration Tests Coverage Report
Generated by cobertura-action against 4698da7
src/main/java/com/aws/greengrass/tes/CredentialRequestHandler.java
… avoid multiple restarts
7b3fd9b to 732599d
    if (node != null && (node.childOf(PORT_TOPIC)
            || node.childOf(CLOUD_4XX_ERROR_CACHE_TOPIC)
            || node.childOf(CLOUD_5XX_ERROR_CACHE_TOPIC)
            || node.childOf(UNKNOWN_ERROR_CACHE_TOPIC))) {
Nit: We don't need to execute this callback when the WhatHappened event is irrelevant to it. For example, see how we take no action when the WhatHappened event doesn't indicate a change in value.
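The filtering suggested in this nit could look roughly like the sketch below. It is self-contained and does not use the real nucleus types: the nested `WhatHappened` enum, its values, and the `ConfigChangeFilter` class are hypothetical stand-ins chosen for illustration.

```java
import java.util.EnumSet;
import java.util.Set;

/** Sketch only: decides whether a config callback has any work to do. */
final class ConfigChangeFilter {

    /** Hypothetical stand-in for the event type handed to a config subscriber. */
    enum WhatHappened { INITIALIZED, CHILD_CHANGED, CHILD_REMOVED, TIMESTAMP_UPDATED, INTERIOR_ADDED }

    // Assumption: only these events can carry a new value for a watched topic.
    private static final Set<WhatHappened> RELEVANT =
            EnumSet.of(WhatHappened.CHILD_CHANGED, WhatHappened.CHILD_REMOVED);

    private ConfigChangeFilter() {
    }

    /** Returns true only for a value-changing event on a topic the handler watches. */
    static boolean shouldHandle(WhatHappened why, boolean nodeIsWatchedTopic) {
        return nodeIsWatchedTopic && RELEVANT.contains(why);
    }

    public static void main(String[] args) {
        System.out.println(shouldHandle(WhatHappened.CHILD_CHANGED, true));
        System.out.println(shouldHandle(WhatHappened.TIMESTAMP_UPDATED, true));
    }
}
```

Checking the event type before checking the topic keeps the callback cheap for events such as timestamp updates that never change a value.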
We had an offline discussion about whether the component should enter the errored state when the configuration is not valid, instead of printing an error and then proceeding with the minimum value in place of the configured value. We agreed that entering the errored state is preferable: the component tells Greengrass that the configuration is not valid, lets Greengrass decide how to proceed, and gives the customer immediate feedback about the invalid configuration.

Currently there is only one way for TES config to be updated, which is through deployments. In that case Greengrass can fail the deployment and roll back to a previously working state if configured to do so.

If we later introduce the ability to update TES config outside of a deployment (i.e. at runtime, e.g. by some custom component calling UpdateConfiguration for TES), then it seems that entering the errored state would cause the device to become unhealthy and not auto-recover at that time. This problem is left to be addressed in the future.
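A minimal sketch of the agreed behavior, rejecting an invalid value instead of silently falling back to the minimum. The class name, the `MIN_CACHE_SECONDS` bound, and the use of `IllegalArgumentException` are assumptions for illustration; how the exception is translated into the component's errored state is outside this sketch.

```java
/** Sketch only: fail fast on an invalid cache setting rather than clamping it. */
final class ErrorCacheConfigValidator {

    // Hypothetical lower bound; the real minimum is not stated in this thread.
    static final int MIN_CACHE_SECONDS = 10;

    private ErrorCacheConfigValidator() {
    }

    /**
     * Returns the configured value unchanged when it is valid.
     * Throws instead of substituting the minimum, so the caller can surface
     * the problem (for example by moving the component to an errored state).
     */
    static int validate(String topicName, int configuredSeconds) {
        if (configuredSeconds < MIN_CACHE_SECONDS) {
            throw new IllegalArgumentException(String.format(
                    "%s must be at least %d seconds but was %d",
                    topicName, MIN_CACHE_SECONDS, configuredSeconds));
        }
        return configuredSeconds;
    }

    public static void main(String[] args) {
        System.out.println(validate("unknownErrorCacheInSec", 300));
        try {
            validate("unknownErrorCacheInSec", 1);
        } catch (IllegalArgumentException e) {
            // In the real component this is where the errored state would be reported.
            System.out.println("Invalid configuration: " + e.getMessage());
        }
    }
}
```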
  private Instant getExpiryPolicyForErr(int statusCode) {
-     int expiryTime = UNKNOWN_ERROR_CACHE_IN_MIN; // In case of unrecognized cloud errors, back off
+     int expiryTime = unknownErrorCacheInSec; // In case of unrecognized cloud errors, back off
nit: it was already like this, but expiryDuration is more appropriate than expiryTime
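One way to act on this naming nit is to hold the value as a `java.time.Duration` named `expiryDuration` and add it to the current instant. The sketch below is an illustration, not the code in this PR: the `ErrorExpiryPolicy` class, its constructor, and the mapping of 4xx and 5xx status codes to the respective cache settings are assumptions inferred from the topic names.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;

/** Sketch only: picks a cache duration per error class and returns the expiry instant. */
final class ErrorExpiryPolicy {
    private final Duration cloud4xxErrorCache;
    private final Duration cloud5xxErrorCache;
    private final Duration unknownErrorCache;
    private final Clock clock;

    ErrorExpiryPolicy(Duration cloud4xx, Duration cloud5xx, Duration unknown, Clock clock) {
        this.cloud4xxErrorCache = cloud4xx;
        this.cloud5xxErrorCache = cloud5xx;
        this.unknownErrorCache = unknown;
        this.clock = clock;
    }

    Instant getExpiryPolicyForErr(int statusCode) {
        // In case of unrecognized cloud errors, back off for the "unknown" duration.
        Duration expiryDuration = unknownErrorCache;
        if (statusCode >= 400 && statusCode < 500) {
            expiryDuration = cloud4xxErrorCache;
        } else if (statusCode >= 500 && statusCode < 600) {
            expiryDuration = cloud5xxErrorCache;
        }
        return clock.instant().plus(expiryDuration);
    }

    public static void main(String[] args) {
        // Durations here are arbitrary example values.
        ErrorExpiryPolicy policy = new ErrorExpiryPolicy(
                Duration.ofSeconds(120), Duration.ofSeconds(60), Duration.ofSeconds(300),
                Clock.systemUTC());
        System.out.println(policy.getExpiryPolicyForErr(403));
    }
}
```

Passing a `Clock` into the constructor is a design choice of this sketch; it makes the computed expiry deterministic in tests.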
    logger.atInfo("tes-config-change")
            .kv("node", node).kv("why", why)
            .log("Restarting TES server due to config change");
    requestRestart();
I believe you should be able to change the error cache time at runtime, without a restart; I don't see why not. The subscription handler would then request a restart only if the port changes. This would be useful because a TES restart causes many dependent components to restart, which we want to avoid if possible.
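A rough sketch of the handler logic proposed in this comment: cache durations are swapped in place, and only a port change asks for a restart. Everything here is a hypothetical stand-in, including the `TesConfigHandler` class, the topic key strings, the default values, and the `restartRequests` counter that takes the place of `requestRestart()`.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch only: apply cache-time changes live, restart only when the port changes. */
final class TesConfigHandler {
    // Hypothetical key strings; the real topic names are not shown in this thread.
    static final String PORT_TOPIC = "port";
    static final String UNKNOWN_ERROR_CACHE_TOPIC = "unknownErrorCacheInSec";

    // Atomic so request-handling threads see an updated value without a restart.
    private final AtomicInteger unknownErrorCacheInSec = new AtomicInteger(300);
    private int port;
    private int restartRequests; // stands in for calls to requestRestart()

    synchronized void onConfigChange(String topic, int newValue) {
        switch (topic) {
            case PORT_TOPIC:
                if (newValue != port) {
                    port = newValue;
                    restartRequests++; // the server must rebind, so a restart is needed
                }
                break;
            case UNKNOWN_ERROR_CACHE_TOPIC:
                unknownErrorCacheInSec.set(newValue); // takes effect on the next cached error
                break;
            default:
                break; // unrelated topic: nothing to do
        }
    }

    int getUnknownErrorCacheInSec() {
        return unknownErrorCacheInSec.get();
    }

    synchronized int getRestartRequests() {
        return restartRequests;
    }

    public static void main(String[] args) {
        TesConfigHandler handler = new TesConfigHandler();
        handler.onConfigChange(UNKNOWN_ERROR_CACHE_TOPIC, 120);
        handler.onConfigChange(PORT_TOPIC, 8080);
        System.out.println(handler.getUnknownErrorCacheInSec() + " " + handler.getRestartRequests());
    }
}
```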
Issue #, if available:
Description of changes:
Add ability for users to configure TES error cache expire time
Why is this change necessary:
How was this change tested:
Any additional information or context required to review the change:
Documentation Checklist:
Compatibility Checklist:
any deprecated method or type.
Refer to Compatibility Guidelines for more information.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.