quality in very low contrast regime #38

@bertsky

Description

I have material with typewritten forms that is very challenging (for any binarization method), because the typewriter ink sometimes fades out, while the printing ink next to it is a saturated dark black. The scan/photography also seems to produce a non-normalized histogram:

  • original: [image: OCR-D-IMG_Ansiedlung_Korotschin_UZS_Sign_22a_0000]
  • default-2021-03-09: [image: OCR-D-BIN_Ansiedlung_Korotschin_UZS_Sign_22a_0000 IMG-BIN]
  • after contrast normalization: [image: OCR-D-BIN_Ansiedlung_Korotschin_UZS_Sign_22a_0000 IMG-BIN]
  • after +20% brightness: [image: OCR-D-BIN_Ansiedlung_Korotschin_UZS_Sign_22a_0000 IMG-BIN]
  • after -30% brightness: [image: OCR-D-BIN_Ansiedlung_Korotschin_UZS_Sign_22a_0000 IMG-BIN]
  • Olena with Wolf's algorithm: [image: OCR-D-BIN-WOLF_Ansiedlung_Korotschin_UZS_Sign_22a_0000-BIN_wolf]
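For reference, the brightness/contrast variants compared above can be produced with plain Pillow before handing the image to the binarizer; a minimal sketch (the function name and the exact factors are illustrative, not part of any OCR-D processor):

```python
from PIL import Image, ImageEnhance, ImageOps

def make_variants(path):
    """Produce grayscale preprocessing variants like the ones compared above."""
    img = Image.open(path).convert("L")
    return {
        # stretch the histogram to the full 0..255 range ("contrast normalization")
        "normalized": ImageOps.autocontrast(img),
        # +20% brightness
        "brighter": ImageEnhance.Brightness(img).enhance(1.2),
        # -30% brightness
        "darker": ImageEnhance.Brightness(img).enhance(0.7),
    }
```

Each variant can then be fed through the same binarization model to compare results side by side.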

So it seems that the autoencoder gets confused by the contrast-normalized image, but benefits from making the image even darker. Might that be a general tendency (i.e. if you lose foreground, make the input darker, and conversely, if you pick up background, make it brighter)? Can we derive any metrics that might hint at quality problems from the intermediate activations between encoder and decoder? Any recommendations/considerations?
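One cheap plausibility check that does not require inspecting encoder activations is the foreground ratio of the binarized result: if almost no pixels come out black, the input was probably too bright for the model, and if nearly everything comes out black, too dark. A rough sketch of that retry heuristic follows; the thresholds, brightness factors, and the `binarize` callable are all made-up placeholders for whatever binarizer is in use, not part of any existing API:

```python
import numpy as np
from PIL import Image, ImageEnhance

def fg_ratio(binarized):
    """Fraction of foreground (black) pixels in a binarized image."""
    arr = np.asarray(binarized.convert("1"))  # True == white in mode "1"
    return 1.0 - arr.mean()

def adaptive_binarize(img, binarize, lo=0.01, hi=0.25,
                      factors=(1.0, 0.8, 1.2, 0.6, 1.4)):
    """Retry binarization at different brightness levels until the
    foreground ratio looks plausible (bounds and factors are guesses)."""
    for factor in factors:
        candidate = binarize(ImageEnhance.Brightness(img).enhance(factor))
        ratio = fg_ratio(candidate)
        if lo <= ratio <= hi:
            return candidate, factor
    # nothing looked plausible; return the last attempt anyway
    return candidate, factor
```

Whether sensible bounds for `lo`/`hi` can be fixed globally, or would have to be estimated per corpus (e.g. from pages that binarize well), is exactly the open question here.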

Metadata

Assignees: none
Labels: question (further information is requested)
Projects: none
Milestone: none