Description
The Python SDK can currently surface generic or low-context errors when encountering
common failure scenarios such as server unavailability, timeouts, or partial workflow
registration failures. In both local development and production environments, this
can make it difficult for users to quickly determine whether an issue is caused by
connectivity, configuration problems, or transient infrastructure issues.

I’d like to improve the reliability and debuggability of the Python SDK by making
these failure modes more explicit and easier to reason about. This would include
centralizing request error handling, introducing clearer and more actionable
exceptions for common scenarios, adding configurable timeout and retry behavior
where appropriate, and improving healthcheck robustness while preserving backward
compatibility with existing SDK usage.
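
To make the proposal concrete, here is a minimal sketch of what centralized error handling with configurable retries could look like. Every name in it (`SDKError`, `ServerUnavailableError`, `RequestTimeoutError`, `RetryConfig`, `call_with_retry`) is hypothetical and not part of the current SDK, and it uses Python's built-in `ConnectionError` and `TimeoutError` as stand-ins for whatever transport errors the SDK actually sees. It is meant as a starting point for discussion, not a proposed final design.

```python
import time
from dataclasses import dataclass
from typing import Callable, TypeVar

T = TypeVar("T")


class SDKError(Exception):
    """Hypothetical base class so callers can catch all SDK errors at once."""


class ServerUnavailableError(SDKError):
    """The server could not be reached (connectivity or configuration)."""


class RequestTimeoutError(SDKError):
    """The request did not complete in time (possibly transient)."""


@dataclass(frozen=True)
class RetryConfig:
    """Hypothetical retry settings; the defaults are illustrative only."""
    max_attempts: int = 3
    base_delay_s: float = 0.5
    backoff_factor: float = 2.0


def translate(exc: Exception) -> SDKError:
    """Map a low-level transport error to an actionable SDK exception."""
    if isinstance(exc, TimeoutError):
        return RequestTimeoutError(
            f"request timed out: {exc}; consider raising the timeout"
        )
    if isinstance(exc, ConnectionError):
        return ServerUnavailableError(
            f"could not reach server: {exc}; check host, port, and TLS settings"
        )
    return SDKError(str(exc))


def call_with_retry(
    request: Callable[[], T],
    config: RetryConfig = RetryConfig(),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `request`, retrying transient failures with exponential backoff."""
    if config.max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = config.base_delay_s
    for attempt in range(1, config.max_attempts + 1):
        try:
            return request()
        except (TimeoutError, ConnectionError) as exc:
            if attempt == config.max_attempts:
                # Out of attempts: raise the clearer error, keeping the cause.
                raise translate(exc) from exc
            sleep(delay)
            delay *= config.backoff_factor
    raise AssertionError("unreachable")
```

Because the new exceptions share one base class and the original error is preserved as `__cause__`, existing code that catches broad exceptions would keep working, which is one way to stay backward compatible. The injectable `sleep` parameter is only there to make the retry path easy to test.
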

While documenting common failure cases is an alternative, improving behavior directly
in the SDK provides better defaults and faster feedback for users running Hatchet
workflows in production. I’m happy to approach this incrementally and align on scope
before implementation, and wanted to confirm that this is an area where contributions
and potential longer-term ownership would be welcome.