Dear @aamend @alexott @nfx,
I appreciate your work on making the Tika file format possible.
After reviewing the serialiser code, I noticed that you store the binary file as one of the columns.
Since every row then carries the full file contents, I doubt this construct will run stably at a scale of more than 1,000 large documents.
It might be prudent to store the binary files outside of the result DataFrame and keep only a reference to them.
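
To illustrate the idea, here is a minimal sketch in plain Python. It is not the project's serialiser, only an assumption about how the approach could look: the function name `externalize_content`, the `content_path` and `content_length` fields, and the row layout are all hypothetical, and a local directory stands in for whatever external storage would actually be used.

```python
import hashlib
import tempfile
from pathlib import Path


def externalize_content(row: dict, blob_dir: Path) -> dict:
    """Write row['content'] to blob_dir and return the row with a path reference instead.

    Hypothetical helper: the field names are illustrative, not the project's schema.
    """
    payload = row["content"]
    # Content-addressed file name, so identical documents are stored only once.
    digest = hashlib.sha256(payload).hexdigest()
    target = blob_dir / digest
    if not target.exists():
        target.write_bytes(payload)
    # Keep every column except the raw bytes, and add a lightweight reference.
    slim = {key: value for key, value in row.items() if key != "content"}
    slim["content_path"] = str(target)
    slim["content_length"] = len(payload)
    return slim


if __name__ == "__main__":
    blob_dir = Path(tempfile.mkdtemp())
    row = {
        "path": "docs/report.pdf",
        "content": b"%PDF-1.7 example bytes",
        "text": "extracted text",
    }
    print(externalize_content(row, blob_dir))
```

With something along these lines, the result rows would stay small regardless of document size, and the original bytes could still be fetched on demand through the stored path.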
Let me know your thoughts.