
[Alerting][TaskManager] Skip UIAM conversion retries for org-membership failures #271929

Merged
ersin-erdal merged 2 commits into elastic:main from ersin-erdal:alerting/skip-non-org-member-uiam-conversion-errors on May 29, 2026

ersin-erdal commented on May 29, 2026

Summary

Extends the UIAM API key provisioning task (rules and tasks) so it stops
retrying rules/tasks that hit a permanent UIAM conversion failure with
error code 0xBE2B58:

Internal server error: API key with ID [...] is not valid for conversion. ES API key creator [...] is not a member of organization [...]

It joins the existing permanent failure code
0x357391 ("ES API key creator is not a Cloud user").

What changed

  • A new constant list, PERMANENT_UIAM_CONVERSION_ERROR_CODES, defined in both alerting/server/provisioning/constants.ts and task_manager/server/uiam_api_key_provisioning/constants.ts, that groups NON_CLOUD_USER_API_KEY_CREATOR_ERROR_CODE (0x357391) with the new API_KEY_CREATOR_NOT_ORG_MEMBER_ERROR_CODE (0xBE2B58).
  • getExcludeRulesFilter and getExcludeTasksFilter now OR over that list inside the FAILED branch, so any rule/task whose persisted uiam_api_keys_provisioning_status doc carries one of these codes is excluded from future provisioning attempts.
  • Unit tests are parameterised over the list, and a new assertion verifies that the FAILED branch contains an OR of every known permanent code (so a code added to the list in future fails the test until it is covered).
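The FAILED-branch change can be sketched as a small KQL builder. This is a minimal illustration, not the code from the PR: the `buildPermanentFailureClause` and `clauseCoversAllCodes` helpers and the `uiam_api_keys_provisioning_status.attributes.*` field paths are assumptions, and the real getExcludeRulesFilter / getExcludeTasksFilter may compose the clause differently.

```typescript
// Hypothetical field paths; the real saved-object mappings may differ.
const STATUS_FIELD = 'uiam_api_keys_provisioning_status.attributes.status';
const ERROR_CODE_FIELD = 'uiam_api_keys_provisioning_status.attributes.errorCode';

const PERMANENT_UIAM_CONVERSION_ERROR_CODES: readonly string[] = ['0x357391', '0xBE2B58'];

// Builds a KQL clause matching status docs that failed with any permanent code.
// Each code becomes one `or` disjunct inside the failed branch.
function buildPermanentFailureClause(codes: readonly string[]): string {
  if (codes.length === 0) {
    throw new Error('at least one permanent error code is required');
  }
  const anyCode = codes.map((code) => `${ERROR_CODE_FIELD}: "${code}"`).join(' or ');
  return `${STATUS_FIELD}: "failed" and (${anyCode})`;
}

// Test-style check in the spirit of the PR: every known code must appear in the clause.
function clauseCoversAllCodes(clause: string, codes: readonly string[]): boolean {
  return codes.every((code) => clause.includes(`${ERROR_CODE_FIELD}: "${code}"`));
}

const clause = buildPermanentFailureClause(PERMANENT_UIAM_CONVERSION_ERROR_CODES);
console.log(clause);
```

Because each code becomes one `or` disjunct, adding a code to the list widens the clause without changing the builder.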

The persistence side already writes the UIAM code onto the status doc via map_convert_response_to_result.ts, so no additional plumbing was needed.
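A minimal sketch of the failure path, assuming a conversion failure that carries the UIAM `code` and `message`. The `UiamConvertFailure` and `ProvisioningStatusAttributes` shapes and the `toFailedStatus` helper are hypothetical stand-ins for whatever map_convert_response_to_result.ts actually does; the point is only that the code is persisted verbatim, which is what lets the exclusion filter match on it later.

```typescript
// Hypothetical shapes; the real types behind map_convert_response_to_result.ts may differ.
interface UiamConvertFailure {
  apiKeyId: string; // ES API key that could not be converted
  code: string; // UIAM error code, e.g. '0xBE2B58'
  message: string; // UIAM error message
}

interface ProvisioningStatusAttributes {
  entityId: string; // rule or task ID
  status: 'failed';
  errorCode: string;
  message: string;
}

// Copies the UIAM error code onto the status attributes unchanged, so a later
// exclusion filter can compare it against the permanent-code list.
function toFailedStatus(entityId: string, failure: UiamConvertFailure): ProvisioningStatusAttributes {
  return {
    entityId,
    status: 'failed',
    errorCode: failure.code,
    message: failure.message,
  };
}

// Placeholder IDs for illustration only.
const attrs = toFailedStatus('rule-1', {
  apiKeyId: 'key-1',
  code: '0xBE2B58',
  message: 'API key creator is not a member of organization',
});
console.log(attrs.errorCode);
```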

To verify

  1. Start an unenrolled stack:
    yarn es serverless --projectType observability --uiam
    yarn start --serverless oblt --run-examples --no-uiam
    
    With UIAM disabled, example rules are created with ES-only API keys.
  2. In config/kibana.dev.yml set feature_flags.overrides.alerting.rules.provisionUiamApiKeys: true and restart Kibana with yarn start --serverless oblt --run-examples.
  3. Drive at least one rule into a permanent failure with each code (e.g. create the rule with an ES API key whose creator is not a member of the UIAM organization for 0xBE2B58; with a non-Cloud user for 0x357391).
  4. Inspect the provisioning status docs:
    GET /.kibana_alerting_cases_*/_search
    {
      "query": { "match": { "type": "uiam_api_keys_provisioning_status" } }
    }
    
    Confirm the failing rules have attributes.status: "failed" and attributes.errorCode set to 0x357391 or 0xBE2B58.
  5. Trigger another provisioning run and verify the failing rules are no longer attempted (no new attempts logged, status doc unchanged) while other failed rules without these codes continue to be retried.
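Steps 4 and 5 amount to partitioning the failed status docs. A rough sketch of that check, assuming each search hit exposes `attributes.status` and `attributes.errorCode` as described in step 4 (the raw document layout in the index may nest these differently); `StatusHit` and `partitionFailedDocs` are hypothetical helpers for illustration.

```typescript
// Hypothetical hit shape mirroring the attributes.status / attributes.errorCode
// fields named in step 4; the raw saved-object layout may differ.
interface StatusHit {
  _id: string;
  _source: { attributes: { status: string; errorCode?: string } };
}

const PERMANENT_UIAM_CONVERSION_ERROR_CODES: readonly string[] = ['0x357391', '0xBE2B58'];

interface Partition {
  skipped: string[]; // failed with a permanent code: should not be attempted again
  retried: string[]; // failed with any other (or no) code: still eligible for retries
}

function partitionFailedDocs(hits: readonly StatusHit[]): Partition {
  const result: Partition = { skipped: [], retried: [] };
  for (const hit of hits) {
    const { status, errorCode } = hit._source.attributes;
    if (status !== 'failed') continue; // only failed docs are relevant to step 5
    if (errorCode !== undefined && PERMANENT_UIAM_CONVERSION_ERROR_CODES.includes(errorCode)) {
      result.skipped.push(hit._id);
    } else {
      result.retried.push(hit._id);
    }
  }
  return result;
}
```

After another provisioning run, IDs in `skipped` should show no new attempts and an unchanged status doc, while IDs in `retried` should still be picked up.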

Made with Cursor

…ip failures

Adds the UIAM convert error code `0xBE2B58` ("API key creator is not a
member of organization") to the list of permanent UIAM conversion
failures, alongside the existing `0x357391` ("API key creator is not a
Cloud user"). Rules and tasks whose `uiam_api_keys_provisioning_status`
SO carries any of these codes are excluded from future provisioning
attempts.

Introduces a shared `PERMANENT_UIAM_CONVERSION_ERROR_CODES` list so
new codes can be added in a single place. The exclusion KQL filter now
ORs over the list inside the `FAILED` branch.

Source for error codes:
https://github.com/elastic/uiam/blob/main/modules/domain/src/main/java/co/elastic/cloud/uiam/domain/errors/ErrorCode.java

Co-authored-by: Cursor <cursoragent@cursor.com>
ersin-erdal marked this pull request as ready for review on May 29, 2026 14:16
ersin-erdal requested a review from a team as a code owner on May 29, 2026 14:16
ersin-erdal added the release_note:skip, backport:skip, Team:ResponseOps and t// labels on May 29, 2026
@infra-vault-gh-plugin-prod

Pinging @elastic/response-ops (Team:ResponseOps)

darnautov self-requested a review on May 29, 2026 14:46
Comment on lines +42 to +46
export const API_KEY_CREATOR_NOT_ORG_MEMBER_ERROR_CODE = '0xBE2B58';
export const PERMANENT_UIAM_CONVERSION_ERROR_CODES: readonly string[] = [
  NON_CLOUD_USER_API_KEY_CREATOR_ERROR_CODE,
  API_KEY_CREATOR_NOT_ORG_MEMBER_ERROR_CODE,
];
Contributor

We already have a package, @kbn/uiam-api-keys-provisioning-status; shall we consolidate these codes there?

Contributor Author

Good call — done in ba5d286. The two codes plus PERMANENT_UIAM_CONVERSION_ERROR_CODES now live in @kbn/uiam-api-keys-provisioning-status alongside the status/entity enums; the alerting and task_manager plugins import them from there.
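One way the consolidated package could expose these values, sketched under assumptions: deriving a literal union type with `as const` and the `isPermanentUiamConversionErrorCode` type guard are illustrative additions, not part of @kbn/uiam-api-keys-provisioning-status as described here, and a real package module would `export` these declarations.

```typescript
// Sketch only: the package would export these; the type and guard are hypothetical.
const NON_CLOUD_USER_API_KEY_CREATOR_ERROR_CODE = '0x357391';
const API_KEY_CREATOR_NOT_ORG_MEMBER_ERROR_CODE = '0xBE2B58';

const PERMANENT_UIAM_CONVERSION_ERROR_CODES = [
  NON_CLOUD_USER_API_KEY_CREATOR_ERROR_CODE,
  API_KEY_CREATOR_NOT_ORG_MEMBER_ERROR_CODE,
] as const;

// Union of the literal codes: '0x357391' | '0xBE2B58'.
type PermanentUiamConversionErrorCode = (typeof PERMANENT_UIAM_CONVERSION_ERROR_CODES)[number];

// Narrows an arbitrary (possibly missing) error code to one of the permanent codes.
function isPermanentUiamConversionErrorCode(
  code: string | undefined
): code is PermanentUiamConversionErrorCode {
  return (
    code !== undefined &&
    (PERMANENT_UIAM_CONVERSION_ERROR_CODES as readonly string[]).includes(code)
  );
}
```

The `as const` tuple lets call sites type an `errorCode` field as the literal union; the cost is the widening cast needed before calling `includes` with an arbitrary string.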

Moves NON_CLOUD_USER_API_KEY_CREATOR_ERROR_CODE,
API_KEY_CREATOR_NOT_ORG_MEMBER_ERROR_CODE and the
PERMANENT_UIAM_CONVERSION_ERROR_CODES list from per-plugin constants
into the shared @kbn/uiam-api-keys-provisioning-status package, so
alerting and task_manager consume them from a single source of truth.

Addresses review feedback on elastic#271929.

Co-authored-by: Cursor <cursoragent@cursor.com>
@kibanamachine
Contributor

💛 Build succeeded, but was flaky

Failed CI Steps

Metrics [docs]

✅ unchanged

History

ersin-erdal merged commit 7769304 into elastic:main on May 29, 2026
32 checks passed

Labels

backport:skip, release_note:skip, Team:ResponseOps, t//, v9.5.0

3 participants