Handle watermarks: we don't want the main information of the document to be polluted by watermark text, so we need an OCR able to detect watermarks and filter them out.

We could look at word occurrences and find the areas covered by highly repeated words. We could also use relative contrast, since watermarks are typically printed fainter than the body text.
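A minimal sketch combining both ideas, assuming the OCR engine already returns one entry per word with a bounding box and a contrast score. The `Word` class, its `contrast` field, and the thresholds `min_repeats` and `contrast_ratio` are hypothetical and do not come from any particular OCR library. A word is flagged only when it is both repeated and faint relative to the page median, so ordinary repeated words in dark body text (such as "the") are kept.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import median


@dataclass
class Word:
    text: str
    x: int  # left edge of the bounding box
    y: int  # top edge of the bounding box
    w: int
    h: int
    contrast: float  # hypothetical ink/background contrast score in [0, 1]


def watermark_candidates(words, min_repeats=3, contrast_ratio=0.6):
    """Return indices of words that are both highly repeated and faint."""
    if not words:
        return []
    counts = Counter(w.text.lower() for w in words)
    page_contrast = median(w.contrast for w in words)
    flagged = []
    for i, w in enumerate(words):
        repeated = counts[w.text.lower()] >= min_repeats
        faint = w.contrast < contrast_ratio * page_contrast
        if repeated and faint:
            flagged.append(i)
    return flagged


def bounding_region(words, indices):
    """Smallest box (x0, y0, x1, y1) covering the given words, or None."""
    if not indices:
        return None
    boxes = [words[i] for i in indices]
    return (min(b.x for b in boxes), min(b.y for b in boxes),
            max(b.x + b.w for b in boxes), max(b.y + b.h for b in boxes))


def filter_watermarks(words, **kwargs):
    """Drop the words flagged as watermark candidates."""
    bad = set(watermark_candidates(words, **kwargs))
    return [w for i, w in enumerate(words) if i not in bad]
```

Using AND between the two signals is a design choice: repetition alone would flag common words, and low contrast alone would flag light-gray body text such as footnotes. Either criterion could be used on its own, or the flagged region from `bounding_region` could be used to mask the page before running OCR a second time.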