Error handling: partial processing and error interceptors #1733
Description
Hi,
as mentioned once on a call with @vladoschreiner and @jerrinot, it would be great to have some kind of processing interceptors.
Currently I can wrap all vertices with a ProcessorWrapper, which can catch an exception and, e.g., not rethrow it, so that the job does not fail. However, it's hard to retry the processor without reprocessing the item that caused the error.
For example: my inbox has 10 elements. 4 of them are processed successfully and the 5th fails. I want to process the 5 remaining elements despite that one error.
In ProcessorWrapper I can execute `getWrapped().process(ordinal, inbox)`, but:
- some processors just drain the whole inbox into a collection, such as Write Map,
- some processors use `poll()` to get data, so after an exception is thrown the element has already been removed from the inbox, while others use a combination of `peek()` with `remove()`, so after an exception the failed element is still in the inbox. As a result, it's hard to tell which elements were processed and which were not.
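To illustrate the difference between the two draining styles, here is a minimal sketch. It does not use Jet at all: a plain `java.util.Queue` stands in for the inbox (an assumption made so the example is self-contained), and `DrainStyleDemo`, `failOnFive` and the other names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

/** Stand-in demo: a plain Queue plays the role of the inbox (assumption, not Jet code). */
final class DrainStyleDemo {

    /** Style 1: poll() removes the item before it is handled. */
    static <T> void drainWithPoll(Queue<T> inbox, Consumer<T> handler) {
        for (T item; (item = inbox.poll()) != null; ) {
            handler.accept(item);
        }
    }

    /** Style 2: peek() first, remove() only after the item was handled. */
    static <T> void drainWithPeekRemove(Queue<T> inbox, Consumer<T> handler) {
        for (T item; (item = inbox.peek()) != null; ) {
            handler.accept(item);
            inbox.remove();
        }
    }

    /** Hypothetical handler that treats the value 5 as a broken record. */
    static void failOnFive(Integer item) {
        if (item == 5) {
            throw new IllegalStateException("broken record: " + item);
        }
    }

    /** Builds an inbox holding the elements 1..10. */
    static Queue<Integer> inboxOfTen() {
        Queue<Integer> inbox = new ArrayDeque<>();
        for (int i = 1; i <= 10; i++) {
            inbox.add(i);
        }
        return inbox;
    }

    /** Drains with the chosen style and returns the inbox head left after the failure. */
    static Integer headAfterFailure(boolean usePoll) {
        Queue<Integer> inbox = inboxOfTen();
        try {
            if (usePoll) {
                drainWithPoll(inbox, DrainStyleDemo::failOnFive);
            } else {
                drainWithPeekRemove(inbox, DrainStyleDemo::failOnFive);
            }
        } catch (IllegalStateException expected) {
            // A wrapper catching the exception here cannot tell which style was used.
        }
        return inbox.peek();
    }
}
```

With the `poll()` style the broken element is gone after the exception and the head of the inbox is 6; with the `peek()`/`remove()` style the head is still 5. A wrapper that simply calls the wrapped processor again would skip the broken element in the first case and fail on it again in the second.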
There should be an easy way to deal with broken records without splitting the pipeline into 2 branches (for many ETL pipelines that would mean a branch at each step), and to support partial processing: process as many records as possible, since sometimes a customer wants the job to fail only if some percentage of the data is broken.
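A rough sketch of the behaviour I have in mind, written against plain Java collections rather than any Jet API. `TolerantBatch`, `processAll` and `maxFailureRatio` are hypothetical names, and checking the threshold once per batch is just one possible design; a streaming job would need some other definition of "percent of data".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: handles items one by one and tolerates a share of failures. */
final class TolerantBatch {

    /** Records which items were handled and which ones threw. */
    static final class Result<T> {
        final List<T> processed = new ArrayList<>();
        final List<T> failed = new ArrayList<>();
    }

    /**
     * Applies the handler to every item, collecting failures instead of stopping.
     * Throws only if the share of failed items exceeds maxFailureRatio (0.0 to 1.0).
     */
    static <T> Result<T> processAll(List<T> items, Consumer<T> handler, double maxFailureRatio) {
        Result<T> result = new Result<>();
        for (T item : items) {
            try {
                handler.accept(item);
                result.processed.add(item);
            } catch (RuntimeException e) {
                // Broken record: remember it and continue with the next item.
                result.failed.add(item);
            }
        }
        if (!items.isEmpty()) {
            double failureRatio = (double) result.failed.size() / items.size();
            if (failureRatio > maxFailureRatio) {
                throw new IllegalStateException(
                        result.failed.size() + " of " + items.size() + " items failed");
            }
        }
        return result;
    }
}
```

For the 10-element example above, a handler that fails only on the 5th element with `maxFailureRatio = 0.2` would yield 9 processed items and 1 failed item, while `maxFailureRatio = 0.05` would make the whole batch fail.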