-
Notifications
You must be signed in to change notification settings - Fork 14
Create TelemetryWdmTrait definition #15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Create TelemetryWdmTrait definition #15
Conversation
@pidarped adding to this review ( unable to add as reviewer since I am not owner) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would very much request we fix the clerical issues -- copyright etc.
I have questions about the design and suggestions for how to improve it. If the design proposed makes sense to you, I would very much like to see it changed. If this was something that was already considered and rejected, please post a rebuttal as to why the "last failure" is a preferred way to log the event.
@@ -0,0 +1,94 @@ | |||
// | |||
// Copyright (c) 2016 Nest Labs, Inc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please fix the copyright header: 2020, Google LLC, Apache license.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
// This field holds the count of successful updates since the | ||
// last restart. | ||
uint32 successful_updates_count = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these metrics aggregated across different subscriptions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this is accumulated across multiple subscriptions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, this looks like an aggregate of all Updates(presumably unconditional and conditional). It might be good to separate conditional vs unconditional updates also?
|
||
|
||
// This field holds the error code of the last update failure. | ||
int32 last_update_failure_reason = 7; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the last failure update, it would also make some sense to note the address of the subscriber.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
}; | ||
|
||
// Events associated with WDM updates | ||
message TelemetryWdmUpdateStatsEvent { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to log the update and subscription failures as first class events rather than remembering the last event in a different event? The semantics of most of the field in this trait make quite a lot of sense as properties. As events I would suggest we expose 3 different events: subscription failure (fields: reason), update failure (fields: reason, profile, status_code), and counts (fields: successful_updates, unsuccessful_updates, successful_subscriptions, subscription_attempts). If desired, the last subscription failure and last update failure could also be exposed as properties, so they are available for inspection on connectivity recovery
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think the last failures would better serve as trait properties. That makes sense especially if these events are not uploaded periodically and visibility of the last failures might fade unless preserved as properties.
If we break these into multiple events, then the subscription and update failures could be triggered on the edges while the counter is a periodic one with a relatively long cadence(~20-30 mins)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified to what was discussed offline , update on edge events for failure with properties in case the events are dropped lost due to lack of buffer space.
Updated latest version based on review |
WDM telemetry trait