Hello,
I'm trying to get an individual confidence value for each character on a page. Long story short, I need every word on the page, but I'd also like to be able to mark characters that have a particularly low confidence. I'm reading each character and building collections based on a custom class.
My issue is that every character within a word returns the same confidence value. The value changes from word to word, almost as if the call is simply returning the averaged word confidence rather than a per-character one.
Thank you in advance for your assistance.
My VB.NET code is below:
Private Sub PageToICRWords(page As Tesseract.Page)
    Dim IT As Tesseract.ResultIterator = page.GetIterator
    Dim Chars As New ICRCharacterCollection
    Dim CurrentRow As Integer = -1
    Do
        Do
            Do
                Do
                    Do
                        Try
                            ' Start of a new text line: advance the row counter.
                            If IT.IsAtBeginningOf(PageIteratorLevel.TextLine) Then CurrentRow += 1
                            If IT.IsAtBeginningOf(PageIteratorLevel.Symbol) Then
                                Dim MyChar As New ICRCharacter
                                MyChar.Value = IT.GetText(PageIteratorLevel.Symbol)
                                ' Skip symbols that come back with no text.
                                If MyChar.Value > "" Then
                                    ' This is the value that comes back identical for every character in a word.
                                    MyChar.Confidence = IT.GetConfidence(PageIteratorLevel.Symbol)
                                    Dim TesRec As New Rect
                                    IT.TryGetBoundingBox(PageIteratorLevel.Symbol, TesRec)
                                    MyChar.Box = New Rectangle(TesRec.X1, TesRec.Y1, TesRec.Width, TesRec.Height)
                                    Chars.Add(MyChar)
                                End If
                            End If
                            ' Last symbol of the current word: store the word and start a fresh collection.
                            If IT.IsAtFinalOf(PageIteratorLevel.Word, PageIteratorLevel.Symbol) Then
                                If Not Chars Is Nothing AndAlso Chars.Count > 0 Then
                                    Me.Add(New ICRWord(Chars, CurrentRow))
                                End If
                                Chars = New ICRCharacterCollection
                            End If
                        Catch ex As Exception
                            LogErrorLocal("PageToICRWords", ex)
                        End Try
                    Loop While IT.Next(PageIteratorLevel.Symbol)
                Loop While IT.Next(PageIteratorLevel.Word)
            Loop While IT.Next(PageIteratorLevel.TextLine)
        Loop While IT.Next(PageIteratorLevel.Para)
    Loop While IT.Next(PageIteratorLevel.Block)
End Sub