Description
I have been working on a Sgaw Karen [ksw] to Thai [tha] language pair and, as part of that project, I want to build a transliterator between the three or more Sgaw Karen orthographies. Using https://user.keio.ac.jp/~kato/SgawKarenRomei.pdf as a guide, I wrote a small test .dix, shown below:
<dictionary>
<alphabet>abcdefghijklmnopqrstuvwxyz ဖခဘ ံ ိ ၣ်</alphabet>
<sdefs/>
<section id="consonants" type="inconditional">
<e><p><l>hp</l><r>ဖ</r></p></e>
<e><p><l>hk</l><r>ခ</r></p></e>
<e><p><l>b</l><r>ဘ</r></p></e>
</section>
<section id="vowels" type="inconditional">
<e><p><l>i</l><r>ံ</r></p></e> <!-- U+1036, bytes: e1 80 b6 -->
<e><p><l>o</l><r>ိ</r></p></e> <!-- U+102D, bytes: e1 80 ad -->
<e><p><l>a</l><r></r></p></e> <!-- empty output is okay -->
<e><p><l>f</l><r>ၣ်</r></p></e>
</section>
</dictionary>
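As a cross-check that does not depend on lt-expand, the entries and their code points can be listed with a few lines of Python. This is only a sketch: the embedded XML is a copy of the dictionary above with the comments dropped, and `list_pairs` and `code_points` are hypothetical helper names, not part of any Apertium tool.

```python
import xml.etree.ElementTree as ET

# Copy of the test dictionary above (comments omitted).
DIX = """<dictionary>
<alphabet>abcdefghijklmnopqrstuvwxyz ဖခဘ ံ ိ ၣ်</alphabet>
<sdefs/>
<section id="consonants" type="inconditional">
<e><p><l>hp</l><r>ဖ</r></p></e>
<e><p><l>hk</l><r>ခ</r></p></e>
<e><p><l>b</l><r>ဘ</r></p></e>
</section>
<section id="vowels" type="inconditional">
<e><p><l>i</l><r>ံ</r></p></e>
<e><p><l>o</l><r>ိ</r></p></e>
<e><p><l>a</l><r></r></p></e>
<e><p><l>f</l><r>ၣ်</r></p></e>
</section>
</dictionary>"""


def list_pairs(dix_text):
    """Return (section id, left side, right side) for every <p> entry."""
    root = ET.fromstring(dix_text)
    pairs = []
    for section in root.iter("section"):
        for p in section.iter("p"):
            left = p.findtext("l", default="") or ""
            right = p.findtext("r", default="") or ""
            pairs.append((section.get("id"), left, right))
    return pairs


def code_points(s):
    """Format a string as a list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s) or "(empty)"


if __name__ == "__main__":
    for section_id, left, right in list_pairs(DIX):
        print(f"{section_id}: {left} -> {code_points(right)}")
```

Printing code points rather than the glyphs avoids relying on how a terminal renders the combining marks.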
It seems to compile successfully to a .bin with lt-comp:
lt-comp lr rom-test.dix rom-test.bin
consonants@inconditional 4 5
vowels@inconditional 3 5
and lt-expand shows the correct mapping:
lt-expand rom-test.dix
hp:ဖ
hk:ခ
b:ဘ
i:ံ
o:ိ
a:
f:ၣ်
However, when testing with lt-proc -t I get incorrect output:
printf 'hpi hpi\n' | lt-proc -t ./rom-test.bin
ဖi ဖi
(Expected output: ဖံ ဖံ)
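To make the expected output concrete, here is a small reference transliterator that applies the same pairs with a greedy longest-match rule. It is an assumption about the behaviour I am after, not a description of how lt-proc works internally, and `transliterate` is a hypothetical helper name.

```python
# Mapping copied from the test .dix above (left side -> right side).
PAIRS = {
    "hp": "ဖ",
    "hk": "ခ",
    "b": "ဘ",
    "i": "ံ",  # U+1036
    "o": "ိ",  # U+102D
    "a": "",   # empty output
    "f": "ၣ်",
}


def transliterate(text, pairs=PAIRS):
    """Greedy longest-match replacement; unmatched characters pass through."""
    keys = sorted(pairs, key=len, reverse=True)  # try digraphs first
    out = []
    pos = 0
    while pos < len(text):
        for key in keys:
            if text.startswith(key, pos):
                out.append(pairs[key])
                pos += len(key)
                break
        else:
            # No entry starts here: copy the character unchanged.
            out.append(text[pos])
            pos += 1
    return "".join(out)


if __name__ == "__main__":
    for sample in ("hpi hpi", "hk i", "bi"):
        print(sample, "->", transliterate(sample))
```

With this sketch, the vowel is replaced regardless of whether the preceding consonant was written as a digraph or a single letter, which is the behaviour I expected from lt-proc -t.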
It seems that none of the vowels are transliterated after a consonant, but a vowel on its own, or several vowels in succession, come out just fine:
printf 'i i\n' | lt-proc -t ./rom-test.bin
ံ
To be sure, I ran the first command through hexdump, which confirmed that the 'i' is passing through as-is. So it seems to be a compilation problem, not a Unicode problem. (Or is it a compilation problem stemming from a Unicode problem?)
printf 'hpi hpi\n' | lt-proc -t ./rom-test.bin | hexdump -C
00000000 e1 80 96 69 20 e1 80 96 69 0a |...i ...i.|
0000000a
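The bytes in the dump can be decoded to confirm this reading. The snippet below is a minimal sketch that uses only the Python standard library; the byte string is copied from the hexdump output above, and `describe` is a hypothetical helper name.

```python
import unicodedata

# Bytes copied from the hexdump output above.
raw = bytes.fromhex("e1 80 96 69 20 e1 80 96 69 0a")
text = raw.decode("utf-8")


def describe(ch):
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"


if __name__ == "__main__":
    for ch in text:
        print(describe(ch))
```

Each `e1 80 96` sequence decodes to a single code point, U+1016, and the `69` that follows is a plain Latin `i`, so the consonant was replaced and the vowel was not.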
**Update**
Interestingly, the vowels transliterate without issue if there is a space between them and the digraphs:
echo "hk i" | lt-proc -t rom-test.bin
ခ ံ
Of course, the consonant and vowel need to be written together (ခံ). With non-digraph consonants this is not an issue:
echo "bi" | lt-proc -t rom-test.bin
ဘံ
Any help would be appreciated!