Add evaluation benchmarks #43

@rth

Description

Thanks for creating this package!

As discussed in #14 it would be nice to add some evaluation benchmarks. And maybe optionally compare with tesseract or some other reference open source OCR.

What datasets were you considering?

There is for instance the SROIE dataset of scanned receipts. The dataset can be found here (couldn't find a more official source). In particular, there are two tasks described in their paper:

  • Task 1 - Scanned Receipt Text Localisation. Though I didn't fully understand how the evaluation works after skimming their paper.
  • Task 2 - Scanned Receipt OCR. Computing precision, recall and F1 score over all words (space-tokenized) extracted from the document, as far as I understand.
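For Task 2, as far as I understand it, the metric could be sketched roughly as below — assuming a simple multiset match over space-tokenized words; the exact matching rules in the SROIE paper may differ:

```python
from collections import Counter


def word_prf(prediction: str, ground_truth: str):
    """Word-level precision/recall/F1 between OCR output and ground truth.

    Both strings are space-tokenized and compared as multisets, so repeated
    words are only credited as many times as they occur in the ground truth.
    This is a rough sketch of the SROIE Task 2 metric, not the official scorer.
    """
    pred = Counter(prediction.split())
    true = Counter(ground_truth.split())
    # Multiset intersection counts each word min(pred_count, true_count) times.
    n_match = sum((pred & true).values())
    precision = n_match / max(sum(pred.values()), 1)
    recall = n_match / max(sum(true.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if n_match else 0.0
    return precision, recall, f1
```

For example, `word_prf("TOTAL 12.00 EUR", "TOTAL 12.00 USD")` matches 2 of 3 words on each side, giving precision = recall = F1 = 2/3.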
