Open
Labels
- Dapr-Agents-1.0
- P2
- documentation: Improvements or additions to documentation
- enhancement: New feature or request
Description
Extract and isolate resiliency logic (retry/backoff, error handling, etc.) from the linked issue: dapr/dapr-agents#167
Evaluate where resiliency should be applied in our DurableAgent and orchestrator workflows, and define a smart resiliency policy, i.e. one that decides whether to retry based on the error type rather than applying blanket backoff.
This applies to:
- DurableAgent workflow activities
- Orchestrator workflow activities
- External calls, especially LLM provider integrations
- Other potential areas where resiliency might be required
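One way to make this evaluation concrete is to give each call site its own retry budget. The sketch below is hypothetical: `ResiliencyPolicy`, `POLICIES`, and the call-site keys are illustrative names rather than existing dapr-agents APIs, and the numbers are placeholders. It uses capped exponential backoff, where the delay before retry n (0-based) is `min(initial_delay * multiplier**n, max_delay)`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResiliencyPolicy:
    """Hypothetical per-call-site retry budget (illustrative, not a dapr-agents API)."""

    max_attempts: int = 3        # total attempts, including the first call
    initial_delay: float = 1.0   # seconds to wait before the first retry
    multiplier: float = 2.0      # exponential growth factor between retries
    max_delay: float = 30.0      # upper bound on any single wait

    def delay_for(self, retry_number: int) -> float:
        """Seconds to wait before retry `retry_number` (0 = first retry), capped at max_delay."""
        return min(self.initial_delay * self.multiplier ** retry_number, self.max_delay)


# Placeholder budgets per call site; choosing the real values is part of this issue.
POLICIES = {
    "durable_agent_activity": ResiliencyPolicy(max_attempts=3),
    "orchestrator_activity": ResiliencyPolicy(max_attempts=3),
    "llm_call": ResiliencyPolicy(max_attempts=5, initial_delay=2.0, max_delay=60.0),
}

if __name__ == "__main__":
    llm = POLICIES["llm_call"]
    # max_attempts=5 means at most 4 retries after the first call.
    print([llm.delay_for(n) for n in range(llm.max_attempts - 1)])  # prints [2.0, 4.0, 8.0, 16.0]
```

Keeping the budget per call site would let LLM calls, which may need longer waits (e.g. under provider rate limits), back off differently from workflow activities without changing the retry code itself.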
We need a "smart" resiliency policy that:
- Applies retry/backoff logic for transient failures (e.g. network timeouts, intermittent service outages)
- Does not retry for non-transient failures (e.g. invalid credentials, “out of credits”, malformed request)
- Logs or surfaces the classification of each error, so it is clear when resiliency kicked in vs. when we aborted due to a non-recoverable error
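A minimal sketch of such a policy, assuming network-level failures surface as the built-in `TimeoutError` / `ConnectionError` and treating everything else as non-recoverable. `ErrorClass`, `classify`, and `call_with_resiliency` are hypothetical names for illustration; a real implementation would need to classify provider- and SDK-specific exceptions. The classification is logged on every failed attempt, so the logs show whether a retry happened or the call aborted.

```python
import logging
import time
from enum import Enum
from typing import Callable, TypeVar

logger = logging.getLogger("resiliency")
T = TypeVar("T")


class ErrorClass(Enum):
    TRANSIENT = "transient"          # e.g. timeouts, dropped connections
    NON_TRANSIENT = "non_transient"  # e.g. invalid credentials, out of credits, malformed request


def classify(exc: BaseException) -> ErrorClass:
    """Hypothetical classifier: only network-level failures are considered retryable."""
    if isinstance(exc, (TimeoutError, ConnectionError)):
        return ErrorClass.TRANSIENT
    # Unknown errors default to non-transient so we fail fast instead of retrying blindly.
    return ErrorClass.NON_TRANSIENT


def call_with_resiliency(
    fn: Callable[[], T],
    max_attempts: int = 3,
    initial_delay: float = 1.0,
    multiplier: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying transient errors with exponential backoff and aborting on the rest."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            kind = classify(exc)
            if kind is ErrorClass.NON_TRANSIENT:
                logger.error("attempt %d: %s error, aborting: %r", attempt, kind.value, exc)
                raise
            if attempt == max_attempts:
                logger.error("attempt %d: %s error, retries exhausted: %r", attempt, kind.value, exc)
                raise
            logger.warning("attempt %d: %s error, retrying in %.1fs: %r", attempt, kind.value, delay, exc)
            sleep(delay)
            delay *= multiplier
    raise RuntimeError("unreachable")  # the loop always returns or raises


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    calls = {"n": 0}

    def flaky() -> str:
        # Simulated LLM call that times out twice, then succeeds.
        calls["n"] += 1
        if calls["n"] < 3:
            raise TimeoutError("LLM provider timed out")
        return "ok"

    print(call_with_resiliency(flaky, sleep=lambda s: None))  # prints ok after two logged retries
```

Defaulting unknown errors to non-transient is a deliberate trade-off: it fails fast on cases like invalid credentials instead of consuming the retry budget, at the cost of possibly aborting on a transient error the classifier does not yet recognize.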
Acceptance Criteria
- Identify and specify where resiliency should be added in the DurableAgent workflow activities.
- Identify and specify where resiliency should be added in the orchestrator workflow activities.
- Define and introduce the concept of smart resiliency for external calls such as LLM providers, i.e. classify errors and decide when to retry vs. abort.
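For LLM providers reached over HTTP, a first-pass classification could key off the status code, with an override for provider error codes that signal a permanent failure behind a retryable-looking status (for example, some providers report an exhausted quota as 429). The status set and the error-code strings below are assumptions to verify against each provider's documentation, and `should_retry` is a hypothetical helper.

```python
from typing import Optional

# Statuses where a later attempt can plausibly succeed:
# request timeout, rate limited, and server-side / gateway failures.
TRANSIENT_STATUS = {408, 429, 500, 502, 503, 504}

# Assumption: provider error codes meaning "permanent" even when the HTTP status
# looks retryable (e.g. an exhausted quota reported as 429). Adjust per provider.
NON_TRANSIENT_ERROR_CODES = {"insufficient_quota", "invalid_api_key"}


def should_retry(status: int, error_code: Optional[str] = None) -> bool:
    """Return True if an LLM provider HTTP error looks transient and worth retrying."""
    if error_code in NON_TRANSIENT_ERROR_CODES:
        return False  # e.g. out of credits: backing off will not restore the quota
    return status in TRANSIENT_STATUS  # 400/401/403/404 and similar fall through to False


if __name__ == "__main__":
    cases = [(503, None), (429, None), (429, "insufficient_quota"), (401, None), (400, None)]
    for status, code in cases:
        print(status, code, should_retry(status, code))
```

Checking the provider error code before the status code keeps the "out of credits" case from being retried just because it shares a status with ordinary rate limiting.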
Metadata
Projects
Status
No status