Replies: 2 comments
-
I wonder whether we should publish our Sentry information (connection info, etc.).
-
If the goal is to get backtraces from failed API requests, I think it should happen on the server side of the API, not on the client side. Reasons:
So really, tl;dr: if the goal is getting good traces on failed API requests, that is better done on the server side, where requests are actually processed, than in client libraries or applications.
-
To improve stability of our services, we need to identify issues and bottlenecks first.
In the past, when we encountered API reliability issues that were temporary in nature, we implemented retry mechanisms in gsclient-go and the gridscale-terraform-plugin where applicable to improve the stability of the overall process. Underlying issues have also been identified and, where possible, solved.
These retry mechanisms have the side effect that, once in place, they mask underlying issues. To improve quality for everyone, we need to make those issues visible again.
This can be tackled from either the client side or the server side. For the latter, the APIs would provide metrics of request success. This does not make all issues visible, though. In front of the actual APIs, there is an application server. In front of that, a web server. In front of that, a reverse proxy. In front of that, a firewall. In front of that, a router. And so on. Issues in these layers, or in the rolling deployment of the APIs themselves, would not be visible in API-level metrics. Instead, we want our SDKs to report errors to us, with the ability to opt out.
gsclient-go has been selected as the first prototype, mainly because it is used by Terraform as well. Terraform, in turn, uses higher concurrency than many other clients do and tends to trigger deficiencies in provisioning more readily.

The idea is to send these reports to a publicly available Sentry instance: networking errors (e.g. connection reset) and provisioning errors (e.g. storage creation error) alike. To investigate issues, I presume we need

- `X-Request-Id`, if available
- the request path (e.g. `/objects/servers`) for aggregation
- the HTTP method (`GET`, `POST`, `PATCH`, `DELETE`, etc.) for aggregation

These reports shall also be sent when a request is retried. They shall not block the SDK and, as such, shall be sent asynchronously. Users must be able to opt out.
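To make the requirements above concrete, here is a minimal sketch of a non-blocking reporter with an opt-out. Everything in it is an assumption for illustration: the `Report` and `Reporter` types, the `GSCLIENT_DISABLE_ERROR_REPORTS` environment variable, and the `sink` callback are hypothetical and are not part of gsclient-go. The `sink` stands in for whatever would actually deliver the report to Sentry.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"os"
	"sync"
)

// Report carries the fields proposed above (hypothetical type).
type Report struct {
	RequestID string // value of X-Request-Id; empty if not available
	Method    string // HTTP method, used for aggregation
	Path      string // request path, e.g. /objects/servers, used for aggregation
	Err       string // error message
	Attempt   int    // 1 for the first try, >1 for retries
}

// Reporter queues reports and delivers them from a background goroutine.
type Reporter struct {
	enabled bool
	queue   chan Report
	wg      sync.WaitGroup
}

// NewReporter starts a worker that hands each queued report to sink.
// If enabled is false (the user opted out), no worker is started.
func NewReporter(enabled bool, buffer int, sink func(Report)) *Reporter {
	r := &Reporter{enabled: enabled, queue: make(chan Report, buffer)}
	if !enabled {
		return r
	}
	r.wg.Add(1)
	go func() {
		defer r.wg.Done()
		for rep := range r.queue {
			sink(rep)
		}
	}()
	return r
}

// Capture enqueues a report without ever blocking the caller.
// It returns false if the report was dropped (opted out or queue full).
// It must not be called after Close.
func (r *Reporter) Capture(rep Report) bool {
	if !r.enabled {
		return false
	}
	select {
	case r.queue <- rep:
		return true
	default:
		return false
	}
}

// Close delivers the reports still queued and stops the worker.
func (r *Reporter) Close() {
	close(r.queue)
	r.wg.Wait()
}

func main() {
	// Hypothetical opt-out switch: any non-empty value disables reporting.
	optOut := os.Getenv("GSCLIENT_DISABLE_ERROR_REPORTS") != ""
	rep := NewReporter(!optOut, 16, func(r Report) {
		fmt.Printf("report: %s %s attempt=%d request_id=%q err=%s\n",
			r.Method, r.Path, r.Attempt, r.RequestID, r.Err)
	})
	defer rep.Close()

	// A retry loop would call Capture on every failed attempt, not only the last.
	err := errors.New("connection reset by peer")
	rep.Capture(Report{Method: http.MethodPost, Path: "/objects/servers", Err: err.Error(), Attempt: 1})
}
```

Dropping reports when the queue is full is a deliberate trade-off in this sketch: losing a report is preferable to slowing down the SDK.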
What are your thoughts?