Description
What happened?
Consider a pipeline in which two steps A and B are fused, with A -> B. Runners such as Dataflow or the Fnapi harness wire output(...) calls from A directly to processElement(...) calls on B. If B throws an exception during processing, the exception simply propagates up the stack and into A. If A doesn't catch it, it is caught at the runner level and results in a failed bundle.
However, propagating the exception into A can cause problems:
- The exception may interrupt A's processing unexpectedly, leaving it in an invalid state if its error handling is poor. An example would be failing to release a static semaphore that should have been released in a finally block.
- The exception may be caught within A if A has its own try/catch block for error handling. The user may not realize that this handling also catches and suppresses errors from B, which can lead to data loss because processing of the input to B did not complete.
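The second case can be reproduced with a minimal sketch. It is a simplified model of a fused stage in plain Java, not Beam code: `stepA`, `stepB`, and the `Consumer` standing in for the output receiver are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Simplified model of two fused steps; these are NOT Beam classes.
class FusedSwallowDemo {
    /** Hypothetical step A: parses each input and emits it downstream. */
    static int stepA(List<String> inputs, Consumer<Integer> output) {
        int suppressed = 0;
        for (String in : inputs) {
            try {
                int parsed = Integer.parseInt(in); // A's own work, may throw
                output.accept(parsed);             // in a fused stage this runs B inline
            } catch (RuntimeException e) {
                // Intended for A's parse errors, but it also catches B's failures.
                suppressed++;
            }
        }
        return suppressed;
    }

    /** Hypothetical step B: rejects negative values, otherwise writes to the sink. */
    static Consumer<Integer> stepB(List<Integer> sink) {
        return value -> {
            if (value < 0) {
                throw new IllegalStateException("B failed on " + value);
            }
            sink.add(value);
        };
    }

    public static void main(String[] args) {
        List<Integer> sink = new ArrayList<>();
        int suppressed = stepA(List.of("1", "-2", "3"), stepB(sink));
        // The element -2 never reaches the sink, yet nothing fails.
        System.out.println("sink=" + sink + " suppressed=" + suppressed);
    }
}
```

In this model the bundle appears to succeed even though B never finished processing one of its inputs, which is the data-loss scenario described above.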
Similar issues can occur if B itself outputs to a sink (such as the Fnapi data output or a Dataflow commit) that encounters an encoding error.
It seems preferable to catch such errors at DoFn boundaries instead of propagating them directly. Once an error is detected, processing of the bundle could be stopped cleanly and the exception surfaced as a failure to process the bundle.
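One possible shape for such a boundary is sketched below, again in plain Java rather than against the actual runner code. `BoundaryGuard`, `processBundle`, and `throwIfFailed` are hypothetical names, and the sketch assumes the runner can check for a recorded failure after each element that A processes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

// Illustrative only: a guard placed between two fused steps. Not Beam code.
class BoundaryGuardDemo {
    /** Wraps the downstream step and records its failure instead of throwing into A. */
    static final class BoundaryGuard<T> implements Consumer<T> {
        private final Consumer<T> downstream;
        private RuntimeException failure;

        BoundaryGuard(Consumer<T> downstream) {
            this.downstream = downstream;
        }

        @Override
        public void accept(T value) {
            if (failure != null) {
                return; // bundle already failed; skip further downstream work
            }
            try {
                downstream.accept(value);
            } catch (RuntimeException e) {
                failure = e; // remembered here, so A's catch blocks never see it
            }
        }

        /** Called by the (hypothetical) runner loop to surface the failure. */
        void throwIfFailed() {
            if (failure != null) {
                throw new IllegalStateException("Bundle failed in downstream step", failure);
            }
        }
    }

    /** Hypothetical runner loop: feeds each input to A and checks the guard afterwards. */
    static <I, O> void processBundle(
            List<I> inputs, BiConsumer<I, Consumer<O>> stepA, Consumer<O> stepB) {
        BoundaryGuard<O> guard = new BoundaryGuard<>(stepB);
        for (I input : inputs) {
            stepA.accept(input, guard);
            guard.throwIfFailed(); // stop the bundle as soon as B has failed
        }
    }

    public static void main(String[] args) {
        List<Integer> sink = new ArrayList<>();
        // A suppresses every RuntimeException raised inside its try block.
        BiConsumer<String, Consumer<Integer>> stepA = (in, out) -> {
            try {
                out.accept(Integer.parseInt(in));
            } catch (RuntimeException e) {
                // swallowed
            }
        };
        Consumer<Integer> stepB = v -> {
            if (v < 0) {
                throw new IllegalArgumentException("B failed on " + v);
            }
            sink.add(v);
        };
        try {
            processBundle(List.of("1", "-2", "3"), stepA, stepB);
        } catch (IllegalStateException e) {
            System.out.println("bundle failed: " + e.getCause().getMessage() + ", sink=" + sink);
        }
    }
}
```

With this arrangement A's try/catch can no longer suppress B's exception, and the bundle fails after the offending element. A real implementation would also have to decide how to interrupt A when it emits several outputs per element, which this sketch does not address.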
This issue has been observed in Java execution on Dataflow, but it appears to be a general problem for the Java SDK and likely for other SDKs in which exceptions are thrown.
Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
Issue Components
- Component: Python SDK
- Component: Java SDK
- Component: Go SDK
- Component: Typescript SDK
- Component: IO connector
- Component: Beam YAML
- Component: Beam examples
- Component: Beam playground
- Component: Beam katas
- Component: Website
- Component: Infrastructure
- Component: Spark Runner
- Component: Flink Runner
- Component: Samza Runner
- Component: Twister2 Runner
- Component: Hazelcast Jet Runner
- Component: Google Cloud Dataflow Runner