Hey,
Thanks for the great work. I have some questions about fine-tuning; I think the problem may come from the format of my input data. I've been looking at this link to get well-formed XML for my JPG images, but after fine-tuning (even after 1 epoch) I don't get any lines 👀.
Here is an example of one of my XML files:
<?xml version="1.0" ?>
<alto xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.loc.gov/standards/alto/ns-v4#" xsi:schemaLocation="http://www.loc.gov/standards/alto/ns-v4# http://www.loc.gov/standards/alto/v4/alto-4-3.xsd">
<Description>
<MeasurementUnit>pixel</MeasurementUnit>
<sourceImageInformation>
<fileName>/home/ubuntu/data/20250204_line_detection/images/FRANOM22_COLH78_0261_0232_4.jpg</fileName>
</sourceImageInformation>
</Description>
<Tags>
<OtherTag DESCRIPTION="line type" ID="LINE_TYPE_1" TYPE="type" LABEL="default"/>
<OtherTag DESCRIPTION="region type" ID="REGION_TYPE_1" TYPE="region" LABEL="text"/>
</Tags>
<Layout>
<Page WIDTH="491" HEIGHT="722" PHYSICAL_IMG_NR="0" ID="page_0">
<PrintSpace HPOS="0" VPOS="0" WIDTH="491" HEIGHT="722">
<TextBlock ID="42c3ee03-8810-4b48-9eb6-ddcb9f23321d" HPOS="0" VPOS="0" WIDTH="491" HEIGHT="722" TAGREFS="REGION_TYPE_1">
<TextLine ID="780aac20-46ee-4072-a714-04c14590713b" HPOS="99" VPOS="48" WIDTH="358" HEIGHT="3" BASELINE="99 48 205 50 342 51 457 50" TAGREFS="LINE_TYPE_1">
<String CONTENT=""/>
</TextLine>
<TextLine ID="f9e0050e-4014-437b-bdb6-fc85e4059699" HPOS="107" VPOS="99" WIDTH="327" HEIGHT="1" BASELINE="107 100 194 99 277 100 434 100" TAGREFS="LINE_TYPE_1">
<String CONTENT=""/>
</TextLine>
<TextLine ID="9edd2cd0-e6a5-4579-8734-afafc03819e0" HPOS="101" VPOS="135" WIDTH="196" HEIGHT="8" BASELINE="101 143 167 141 297 135" TAGREFS="LINE_TYPE_1">
<String CONTENT=""/>
</TextLine>
<TextLine ID="5f86381b-c704-4cbd-a1b7-b6295f0ed4a3" HPOS="106" VPOS="177" WIDTH="361" HEIGHT="8" BASELINE="106 185 248 177 467 181" TAGREFS="LINE_TYPE_1">
<String CONTENT=""/>
</TextLine>
<TextLine ID="b4366d1e-151d-4e8c-9960-3bb7437a9c29" HPOS="102" VPOS="226" WIDTH="176" HEIGHT="1" BASELINE="102 226 167 227 278 226" TAGREFS="LINE_TYPE_1">
<String CONTENT=""/>
</TextLine>
<TextLine ID="166863f8-0ebe-488f-97ba-3e62a88aeb79" HPOS="100" VPOS="267" WIDTH="317" HEIGHT="3" BASELINE="100 270 224 268 417 267" TAGREFS="LINE_TYPE_1">
<String CONTENT=""/>
</TextLine>
<TextLine ID="dab2d94e-3ffd-42c2-bf69-7e3d05292b8c" HPOS="100" VPOS="311" WIDTH="254" HEIGHT="1" BASELINE="100 312 192 311 354 312" TAGREFS="LINE_TYPE_1">
<String CONTENT=""/>
</TextLine>
<TextLine ID="35552902-4556-490c-bcae-3fe2826655c6" HPOS="101" VPOS="352" WIDTH="172" HEIGHT="2" BASELINE="101 352 183 354 273 352" TAGREFS="LINE_TYPE_1">
<String CONTENT=""/>
</TextLine>
<TextLine ID="f1fbad86-de43-411b-8233-99030c1235ee" HPOS="104" VPOS="395" WIDTH="268" HEIGHT="4" BASELINE="104 397 191 399 372 395" TAGREFS="LINE_TYPE_1">
<String CONTENT=""/>
</TextLine>
<TextLine ID="f76cd89c-df93-446a-baa8-489ca7a4991e" HPOS="106" VPOS="438" WIDTH="205" HEIGHT="3" BASELINE="106 441 188 438 311 441" TAGREFS="LINE_TYPE_1">
<String CONTENT=""/>
</TextLine>
<TextLine ID="9da1cfdf-e9b1-4322-9171-22843062be75" HPOS="108" VPOS="477" WIDTH="170" HEIGHT="7" BASELINE="108 481 166 477 278 484" TAGREFS="LINE_TYPE_1">
<String CONTENT=""/>
</TextLine>
<TextLine ID="ebea3fb5-1b3f-4ce4-9509-33504388e3a8" HPOS="102" VPOS="526" WIDTH="189" HEIGHT="5" BASELINE="102 531 181 528 291 526" TAGREFS="LINE_TYPE_1">
<String CONTENT=""/>
</TextLine>
<TextLine ID="d45f7502-66d5-4ed2-9a55-6f8a6db7a7dd" HPOS="89" VPOS="572" WIDTH="395" HEIGHT="8" BASELINE="89 580 246 572 417 575 484 575" TAGREFS="LINE_TYPE_1">
<String CONTENT=""/>
</TextLine>
<TextLine ID="6d5e459f-6b18-493b-bcd8-8f2082620791" HPOS="83" VPOS="619" WIDTH="403" HEIGHT="7" BASELINE="83 626 217 625 315 625 486 619" TAGREFS="LINE_TYPE_1">
<String CONTENT=""/>
</TextLine>
</TextBlock>
</PrintSpace>
</Page>
</Layout>
</alto>
And here are the BASELINE points drawn on the image:

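For reference, a minimal sketch of how the baselines in an ALTO file like the one above could be sanity-checked before training. It only uses the Python standard library and does not touch kraken; the function names `parse_baseline` and `check_alto`, the shortened sample document, and its `block_0`/`line_0`/`line_1` IDs are hypothetical and exist only for this illustration. It checks that each `BASELINE` has an even number of coordinates and that every point lies inside the `Page` dimensions, nothing more.

```python
import xml.etree.ElementTree as ET

# ALTO v4 namespace, as declared in the file above.
ALTO_NS = {"a": "http://www.loc.gov/standards/alto/ns-v4#"}


def parse_baseline(value):
    """Turn 'x1 y1 x2 y2 ...' into [(x1, y1), (x2, y2), ...]."""
    nums = [float(v) for v in value.replace(",", " ").split()]
    if len(nums) % 2:
        raise ValueError(f"odd number of coordinates: {value!r}")
    return list(zip(nums[0::2], nums[1::2]))


def check_alto(xml_text):
    """Return one dict per TextLine with its baseline points and a bounds flag."""
    root = ET.fromstring(xml_text)
    page = root.find(".//a:Page", ALTO_NS)
    if page is None:
        raise ValueError("no <Page> element found")
    width, height = float(page.get("WIDTH")), float(page.get("HEIGHT"))
    report = []
    for line in root.iterfind(".//a:TextLine", ALTO_NS):
        points = parse_baseline(line.get("BASELINE", ""))
        # A line passes only if it has points and all of them are on the page.
        inside = bool(points) and all(
            0 <= x <= width and 0 <= y <= height for x, y in points
        )
        report.append({"id": line.get("ID"), "points": points, "inside_page": inside})
    return report


# Shortened, hypothetical sample built from two of the lines above.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout>
    <Page WIDTH="491" HEIGHT="722" PHYSICAL_IMG_NR="0" ID="page_0">
      <PrintSpace HPOS="0" VPOS="0" WIDTH="491" HEIGHT="722">
        <TextBlock ID="block_0" HPOS="0" VPOS="0" WIDTH="491" HEIGHT="722">
          <TextLine ID="line_0" HPOS="99" VPOS="48" WIDTH="358" HEIGHT="3" BASELINE="99 48 205 50 342 51 457 50"/>
          <TextLine ID="line_1" HPOS="101" VPOS="135" WIDTH="196" HEIGHT="8" BASELINE="101 143 167 141 297 135"/>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""

report = check_alto(SAMPLE)
for entry in report:
    print(entry["id"], len(entry["points"]), "points, inside page:", entry["inside_page"])
```

To run it over real files, the same function can be called with the contents of each XML file; a line flagged as outside the page would point at a coordinate or scaling problem in the export.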
Then I'm using:
ketos -vvv segtrain -i /home/ubuntu/models/blla.mlmodel -f xml /home/ubuntu/data/20250204_line_detection/alto_xml/*.xml -cl -o /home/ubuntu/models/ft_kraken -d cuda:0
Everything appears to train, but the mean_iu stays around 0.25 and even decreases.
[02/04/25 15:54:37] INFO validation run: accuracy 0.9899430871009827 mean_acc 0.9899430871009827 mean_iu 0.2532690465450287 freq_iu 0.96146160364151
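As a side note on reading these numbers, here is a toy sketch of how a very high pixel accuracy can coexist with a low mean IoU. It assumes the usual confusion-matrix definitions of pixel accuracy, mean IoU, and frequency-weighted IoU; whether kraken computes its `accuracy`, `mean_iu`, and `freq_iu` exactly this way is an assumption, and the four classes and pixel counts below are made up for illustration, not taken from the run above.

```python
def segmentation_metrics(conf):
    """conf[i][j] = number of pixels of true class i predicted as class j."""
    n = len(conf)
    total = sum(sum(row) for row in conf)
    true_pos = [conf[i][i] for i in range(n)]
    true_count = [sum(conf[i]) for i in range(n)]
    pred_count = [sum(conf[i][j] for i in range(n)) for j in range(n)]
    # Union of truth and prediction for each class.
    union = [true_count[i] + pred_count[i] - true_pos[i] for i in range(n)]
    iou = [true_pos[i] / union[i] if union[i] else 0.0 for i in range(n)]
    accuracy = sum(true_pos) / total
    mean_iu = sum(iou) / n
    freq_iu = sum(true_count[i] * iou[i] for i in range(n)) / total
    return accuracy, mean_iu, freq_iu


# Hypothetical case: 10,000 pixels, 99% background (class 0), and a model
# that predicts background everywhere, so the three rare classes are missed.
toy_confusion = [
    [9900, 0, 0, 0],
    [40, 0, 0, 0],
    [30, 0, 0, 0],
    [30, 0, 0, 0],
]

accuracy, mean_iu, freq_iu = segmentation_metrics(toy_confusion)
print(f"accuracy={accuracy:.4f} mean_iu={mean_iu:.4f} freq_iu={freq_iu:.4f}")
```

In this toy case the accuracy is 0.99 while the mean IoU is only about 0.25, because the background dominates the pixel count and every other class has an IoU of zero. Under the stated assumptions, that pattern would be consistent with a model that outputs almost nothing but background.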
After a few epochs, when I run inference, I still don't get any lines.
Also, I'm using only 30 images to test the training before annotating more and scaling up the process. Do you have any idea why this is not working?