-
Notifications
You must be signed in to change notification settings - Fork 14
proposal: Support secret providers #47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Henrique Spanoudis Matulis <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great progress thanks!
Signed-off-by: Henrique Spanoudis Matulis <[email protected]>
Signed-off-by: Henrique Spanoudis Matulis <[email protected]>
Signed-off-by: Henrique Spanoudis Matulis <[email protected]>
Signed-off-by: Henrique Spanoudis Matulis <[email protected]>
Signed-off-by: Henrique Spanoudis Matulis <[email protected]>
LGTM. can you just update the description of the pull request for posteriority. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM generally! Just last nit noted by @rajagopalanand and good to go IMO!
Approving from my side, but I will ask around for a second Prometheus maintainer to have another look
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty solid :) I just had some small questions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing feedback and discussion in https://github.com/prometheus/proposals/pull/47/files#r2041335383
Should we update the proposal with the mentioned alternatives considered?
What do you think is the reasonable solution from this discussion? I see the potential for changing proposal slightly to support the following (for consistency with SD):
password: "<inlined secret>"
password:
kubernetes:
namespace: "<ns>"
name: "<secret name>"
key: "<data's key for secret name>"
password:
file:
path: "<path to secret file>"
I vote for changing the proposal to be consistent with SD |
Signed-off-by: Henrique Spanoudis Matulis <[email protected]>
535a36e
to
9891e9c
Compare
d726925
to
b918d59
Compare
Signed-off-by: Henrique Spanoudis Matulis <[email protected]>
b918d59
to
ec70387
Compare
Signed-off-by: Henrique Spanoudis Matulis <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Excited to see this in action. :)
I changed the config style to be more consistent with SD in the proposal. Thanks everyone for all the feedback, and feel free to reach out on the prometheus slack if you have any feedback or ideas. I am working on an implementation for the proposal:) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! @dgl @machine424 -- are you ok to merge? 🤗
## Remote Secrets (Secret Providers) | ||
|
||
* **Owners:** | ||
* Henrique Matulis (@hsmatulisgoogle) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: username doesn't match (anymore? I think it was changed)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, one last question :)
|
||
## Action Plan | ||
|
||
* [ ] Create action plan after doc is stable! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to work on this before or after merging? :)
For example when specifying a password fetched from the kubernetes provider with an id of `pass2` in namespace `ns1` for the HTTP passsword field it would look like this: | ||
|
||
``` | ||
password: | ||
kubernetes: | ||
namespace: <ns> | ||
name: <secret name> | ||
key: <data's key for secret name> | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the example is not matching the statement.
Did you mean:
For example when specifying a password fetched from the kubernetes provider with an id of `pass2` in namespace `ns1` for the HTTP passsword field it would look like this: | |
``` | |
password: | |
kubernetes: | |
namespace: <ns> | |
name: <secret name> | |
key: <data's key for secret name> | |
``` | |
For example when specifying a password fetched from the kubernetes provider with an id of `pass2` in namespace `ns1` for the HTTP passsword field it would look like this: | |
password:
kubernetes:
namespace: ns1
name: pass2
key: http_password
|
||
The second case is that we already have a secret value, but refreshing it has resulted in an error. In this case we should keep the component that uses this secret running with the potentially stale secret, and schedule a retry. | ||
|
||
If we are starting up prometheus and do not get any secret values, startup will continue and prometheus will continue to run and retry finding secrets. If there are components that are fully specified, they will run during this time. The idea is that it is better to send partial metrics than no metrics, and the emitted metrics for secrets can alert users if there is a problem with their service providers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are starting up prometheus and do not get any secret values, startup will continue and prometheus will continue to run and retry finding secrets. If there are components that are fully specified, they will run during this time. The idea is that it is better to send partial metrics than no metrics, and the emitted metrics for secrets can alert users if there is a problem with their service providers. | |
If we are starting up Prometheus and do not get any secret values, startup will continue and Prometheus will continue to run and retry finding secrets. If there are components that are fully specified, they will run during this time. The idea is that it is better to send partial metrics than no metrics, and the emitted metrics for secrets can alert users if there is a problem with their service providers. |
# ... other config ... | ||
``` | ||
* **Caching Behavior on Failure:** If a scheduled background refresh attempt fails (e.g., due to network issues, temporary provider unavailability, invalid credentials after rotation), Prometheus will continue to use the last successfully fetched secret value. This ensures components relying on the secret can continue operating with the last known good credential, preventing outages due to transient refresh problems. Failed refresh attempts will be logged, and the `prometheus_remote_secret_state` metric will reflect the 'stale' state. | ||
* **Permission Errors:** If a component using a cached secret reports a specific permission or authentication error, this might indicate the cached secret is no longer valid. Hence, an immediate refresh attempt for that specific secret is triggered, bypassing the regular refresh interval. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit concerned about the implications of this statement. Isn't this requirement for secrets providers leaking into error handling strategy of all components that use a secrets provider?
We only know that a credential is expired/revoked after we try to use it. My point is that all components that use secrets provider will need to implement error handling in a specific way or will the secret provider provide some functionality that components should use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we take PagerDuty or Slack notifier as an example, is the expectation those integrations/receiver implementations should interpret AuthN errors returned and somehow notify the secret provider to refresh before retrying?
prometheus_remote_secret_last_successful_fetch_seconds{provider="kubernetes", secret_id="pass1"} 15 | ||
``` | ||
|
||
A state enum describing in which error condition the secret is in: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition to the state, maybe it would be useful to also expose some counters to show events. Why? because that allow us to identify precisely what is happening across multiple scrapes, network failures etc. Something like:
prometheus_remote_secret_fetch_failures_total
prometheus_remote_secret_fetch_success_total
* **Not Querying Per-Scrape:** It is explicitly **not** the default behavior to query the secret provider on every use. | ||
|
||
|
||
### Secret rotation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any example of backend that would benefit from the behaviours defined in this section.
Any example of backend that does not support secret overlapping?
This PR adds a proposal to support specifying secrets through secret providers.
TL;DR from doc