
Capitalized Cyrillic word not working with lt-proc #205

@rmlockwood

Attached file: bilingual.txt

The attached bilingual dictionary has the line:
<e w="1"><p><l>наб1.1<s n="perspron" /></l><r>дам1.1<s n="pers" /></r></p></e>
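Both sides of this entry are lowercase, so a capitalized input can only match if lt-proc folds case before the lookup. A small Python sketch (standard library only; `side` is a hypothetical helper, and it parses just this one `<e>` element rather than the whole file) that pulls out the lemma and tags on each side:

```python
import xml.etree.ElementTree as ET

# The single entry quoted from bilingual.txt.
ENTRY = '<e w="1"><p><l>наб1.1<s n="perspron" /></l><r>дам1.1<s n="pers" /></r></p></e>'


def side(elem):
    """Return (lemma, tags) for an <l> or <r> element of a dictionary pair."""
    lemma = elem.text or ""  # text before the first <s> child
    tags = "".join(f"<{s.get('n')}>" for s in elem.findall("s"))
    return lemma, tags


pair = ET.fromstring(ENTRY).find("p")
left = side(pair.find("l"))
right = side(pair.find("r"))
print(left, right)  # ('наб1.1', '<perspron>') ('дам1.1', '<pers>')
```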
I first compile it like this:
lt-comp lr bilingual.txt bilingual.bin

When I run lt-proc on the lowercase form of the word, it works fine:

echo "^наб1.1<perspron>$" |lt-proc -b -N1 -L1 bilingual.bin
"^наб1.1<perspron>/дам1.1<pers>$"
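To compare outputs like this one mechanically, a minimal Python sketch of a parser for a single bilingual-mode lexical unit may help. It assumes one unit per line and no escaped `/` or `$` inside lemmas; `parse_lu` and `is_unknown` are hypothetical names. The surrounding double quotes appear to come from the Windows `echo`, which passes them through to lt-proc, so the sketch strips them.

```python
def parse_lu(line: str):
    """Split one lt-proc -b lexical unit into (source, list of targets).

    Assumes a single unit per line and no escaped '/' or '$' in lemmas.
    """
    line = line.strip()
    # cmd.exe's echo keeps the double quotes, so they come back in the output.
    if len(line) >= 2 and line[0] == line[-1] == '"':
        line = line[1:-1]
    if not (line.startswith("^") and line.endswith("$")):
        raise ValueError(f"not a lexical unit: {line!r}")
    source, *targets = line[1:-1].split("/")
    return source, targets


def is_unknown(target: str) -> bool:
    """In bilingual mode, lt-proc prefixes an untranslated unit with '@'."""
    return target.startswith("@")


src, targets = parse_lu('"^наб1.1<perspron>/дам1.1<pers>$"')
print(src, targets, [is_unknown(t) for t in targets])
```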

When I run lt-proc on the capitalized form, it fails to find the lowercase entry in the bilingual dictionary and marks the word as unknown with `@`:

echo "^Наб1.1<perspron>$" |lt-proc -b -N1 -L1 bilingual.bin
"^Наб1.1<perspron>/@Наб1.1<perspron>$"
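The behavior expected here (and apparently seen on Linux, per the colleague mentioned below) can be modelled roughly in Python: try the word as-is, then retry with the first letter lowercased (Н, U+041D, to н, U+043D). This is a hypothetical sketch, not lttoolbox's actual code: `BIDIX` is a one-entry dict standing in for bilingual.bin, and restoring the capital on the target lemma is an assumption.

```python
# Hypothetical stand-in for bilingual.bin: (source lemma, tags) -> (target lemma, tags).
BIDIX = {("наб1.1", "<perspron>"): ("дам1.1", "<pers>")}


def lookup(lemma: str, tags: str) -> str:
    """Return one bilingual-mode unit, modelling a lowercase fallback."""
    hit = BIDIX.get((lemma, tags))
    if hit is not None:
        return f"^{lemma}{tags}/{hit[0]}{hit[1]}$"
    # Fallback (assumed): retry with the first letter lowercased.
    if lemma[:1].isupper():
        hit = BIDIX.get((lemma[0].lower() + lemma[1:], tags))
        if hit is not None:
            # Assumed: copy the capital over to the target lemma.
            target = hit[0][0].upper() + hit[0][1:]
            return f"^{lemma}{tags}/{target}{hit[1]}$"
    # No match: mark the unit as unknown with '@', as in the output above.
    return f"^{lemma}{tags}/@{lemma}{tags}$"


print(lookup("наб1.1", "<perspron>"))  # ^наб1.1<perspron>/дам1.1<pers>$
print(lookup("Наб1.1", "<perspron>"))  # ^Наб1.1<perspron>/Дам1.1<pers>$
```

The Windows output above matches only the last `return` branch, which suggests the lowercase retry is not finding the entry there.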

Here's some version info:

lt-proc -v
lt-proc version 3.8.1
lt-comp
lt-comp v3.8.1: build a letter transducer from a dictionary

I am using the Windows build of these tools.
An Apertium developer colleague said he could not reproduce this on Linux; see rmlockwood/FLExTrans#1111.
