Character confidence is the same for the whole word. #207

Open
@fhbiii

Description

Hello,

I'm trying to get an individual confidence value for each character on a page. I need every word on the page, but I'd also like to flag characters whose confidence is particularly low. I read each character from the result iterator and collect the characters into words using custom classes.

My issue is that every character within a word returns the same confidence value. The value changes from word to word, almost as if the character-level call were simply returning the averaged word confidence.

Thank you in advance for your assistance.
My VB.NET code is below:

Private Sub PageToICRWords(page As Tesseract.Page)
    ' The iterator wraps a native handle, so dispose it when finished.
    Using IT As Tesseract.ResultIterator = page.GetIterator()
        Dim Chars As New ICRCharacterCollection
        Dim CurrentRow As Integer = -1

        IT.Begin()
        Do
            Do
                Do
                    Do
                        Do
                            Try
                                ' Count text lines so each word can record its row.
                                If IT.IsAtBeginningOf(PageIteratorLevel.TextLine) Then CurrentRow += 1
                                If IT.IsAtBeginningOf(PageIteratorLevel.Symbol) Then
                                    Dim MyChar As New ICRCharacter
                                    MyChar.Value = IT.GetText(PageIteratorLevel.Symbol)
                                    If Not String.IsNullOrEmpty(MyChar.Value) Then
                                        ' This is the call that returns the same value for every character in a word.
                                        MyChar.Confidence = IT.GetConfidence(PageIteratorLevel.Symbol)
                                        Dim TesRec As New Rect
                                        IT.TryGetBoundingBox(PageIteratorLevel.Symbol, TesRec)
                                        MyChar.Box = New Rectangle(TesRec.X1, TesRec.Y1, TesRec.Width, TesRec.Height)
                                        Chars.Add(MyChar)
                                    End If
                                End If
                                ' Last symbol of the word: emit the word and start a new collection.
                                If IT.IsAtFinalOf(PageIteratorLevel.Word, PageIteratorLevel.Symbol) Then
                                    If Chars IsNot Nothing AndAlso Chars.Count > 0 Then
                                        Me.Add(New ICRWord(Chars, CurrentRow))
                                    End If
                                    Chars = New ICRCharacterCollection
                                End If
                            Catch ex As Exception
                                LogErrorLocal("PageToICRWords", ex)
                            End Try
                        Loop While IT.Next(PageIteratorLevel.Symbol)
                    Loop While IT.Next(PageIteratorLevel.Word)
                Loop While IT.Next(PageIteratorLevel.TextLine)
            Loop While IT.Next(PageIteratorLevel.Para)
        Loop While IT.Next(PageIteratorLevel.Block)
    End Using
End Sub
