Add evaluation benchmarks #43

@rth

Description

Thanks for creating this package!

As discussed in #14 it would be nice to add some evaluation benchmarks. And maybe optionally compare with tesseract or some other reference open source OCR.

What datasets were you considering?

There is for instance the SROIE dataset of scanned receipts. The dataset can be found here (couldn't find a more official source). In particular, there are two tasks described in their paper:

  • Task 1 - Scanned Receipt Text Localisation. Though I didn't fully understand how the evaluation works after skimming their paper.
  • Task 2 - Scanned Receipt OCR. Computing precision, recall and F1 score over all words (space-tokenized) extracted from the document, as far as I understand.
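For Task 2, as far as I understand it, the metric could be sketched roughly as below — assuming a simple multiset match over space-tokenized words; the exact matching rules in the SROIE paper may differ:

```python
from collections import Counter


def word_prf(prediction: str, ground_truth: str):
    """Word-level precision/recall/F1 between OCR output and ground truth.

    Both strings are space-tokenized and compared as multisets, so repeated
    words are only credited as many times as they occur in the ground truth.
    This is a rough sketch of the SROIE Task 2 metric, not the official scorer.
    """
    pred = Counter(prediction.split())
    true = Counter(ground_truth.split())
    # Multiset intersection counts each word min(pred_count, true_count) times.
    n_match = sum((pred & true).values())
    precision = n_match / max(sum(pred.values()), 1)
    recall = n_match / max(sum(true.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if n_match else 0.0
    return precision, recall, f1
```

For example, `word_prf("TOTAL 12.00 EUR", "TOTAL 12.00 USD")` matches 2 of 3 words on each side, giving precision = recall = F1 = 2/3.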
