zone types and nesting #44

@matyaskopp

I currently use the following zone design, which I believe could be adopted in the recommendations:

    <surface ...  ulx="0" uly="0" lrx="3282" lry="4810">
         <graphic url="https://api.kramerius.mzk.cz/search/iiif/uuid:268924df-fd01-43b3-bcb6-a846ab793756/full/max/0/default.jpg"/>
         <zone xml:id="...f.pg0.a1"
               start="#....pb1"
               type="page"
               points="2128,4711 24,4709 29,3406 34,2538 37,2125 48,1150 145,1098 2936,245 3141,243 3214,243 3166,4649 3165,4706"
               ulx="24"
               uly="243"
               lrx="3214"
               lry="4711">
            <zone xml:id="....f.pg0.a1.c1"
                  start="#....pb1.cb1"
                  type="column"
                  points="1054,4706 24,4709 29,3406 34,2538 37,2125 48,1150 145,1098 486,1096 1082,1093"
                  ulx="24"
                  uly="1093"
                  lrx="1082"
                  lry="4709">
               <zone xml:id="....f.pg0.a1.c1.l1"
                     start="#....pb1.cb1.lb1"
                     type="line"
                     points="1082,1093 972,1096 870,1096 768,1096 664,1098 560,1098 486,1096 415,1097 355,1098 253,1098 145,1098 145,1147 253,1147 356,1147 458,1145 560,1147 666,1147 768,1145 870,1145 973,1145 1081,1142 1082,1093"
                     ulx="145"
                     uly="1093"
                     lrx="1082"
                     lry="1147"/>
               <!-- ... -->
            </zone>
            <!-- ... -->
         </zone>
         <!-- ... -->
    </surface>
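
In the snippet above, each zone's ulx/uly/lrx/lry values match the bounding box of its points polygon. A minimal Python sketch of that derivation, assuming points is always a whitespace-separated list of integer "x,y" pairs (the function name bbox_from_points is hypothetical, not part of any tool mentioned here):

```python
def bbox_from_points(points: str) -> dict[str, int]:
    """Return the ulx/uly/lrx/lry box that encloses a zone's @points polygon."""
    # Assumption: @points is a whitespace-separated list of integer "x,y" pairs.
    coords = [tuple(int(v) for v in pair.split(",")) for pair in points.split()]
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return {"ulx": min(xs), "uly": min(ys), "lrx": max(xs), "lry": max(ys)}


# @points of the "line" zone from the example above.
line_points = (
    "1082,1093 972,1096 870,1096 768,1096 664,1098 560,1098 486,1096 "
    "415,1097 355,1098 253,1098 145,1098 145,1147 253,1147 356,1147 "
    "458,1145 560,1147 666,1147 768,1145 870,1145 973,1145 1081,1142 1082,1093"
)

print(bbox_from_points(line_points))
# {'ulx': 145, 'uly': 1093, 'lrx': 1082, 'lry': 1147}
```

The same check reproduces the box attributes of the page and column zones in the example.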

I have introduced 3 types (zone/@type):

  • page - content without header/page numbers/...
  • column
  • line

which are nested according to the layout logic.
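
The nesting could be checked mechanically. The sketch below is only an illustration under stated assumptions: the allowed parent/child pairs (page > column > line) are read off the example above rather than taken from any specification, the sample uses un-namespaced elements (real TEI files use the TEI namespace, so the lookups would need adjusting), and all function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed nesting order, read off the three types introduced above.
ALLOWED_CHILDREN = {"page": {"column"}, "column": {"line"}, "line": set()}
BOX_ATTRS = ("ulx", "uly", "lrx", "lry")


def box(element):
    """Read the bounding box attributes of a surface or zone as integers."""
    return tuple(int(element.get(a)) for a in BOX_ATTRS)


def contains(outer, inner):
    """True if the inner box lies inside (or on the edge of) the outer box."""
    oulx, ouly, olrx, olry = outer
    iulx, iuly, ilrx, ilry = inner
    return oulx <= iulx and ouly <= iuly and ilrx <= olrx and ilry <= olry


def nesting_problems(parent):
    """Recursively collect type and bounding-box nesting violations."""
    problems = []
    for child in parent.findall("zone"):
        if parent.tag == "zone":
            ptype, ctype = parent.get("type"), child.get("type")
            if ctype not in ALLOWED_CHILDREN.get(ptype, set()):
                problems.append(f"{ctype} zone nested in {ptype} zone")
        if not contains(box(parent), box(child)):
            problems.append(f"{child.get('type')} zone exceeds its parent box")
        problems.extend(nesting_problems(child))
    return problems


# Box values copied from the example above; ids, points and graphic omitted.
SAMPLE = """
<surface ulx="0" uly="0" lrx="3282" lry="4810">
  <zone type="page" ulx="24" uly="243" lrx="3214" lry="4711">
    <zone type="column" ulx="24" uly="1093" lrx="1082" lry="4709">
      <zone type="line" ulx="145" uly="1093" lrx="1082" lry="1147"/>
    </zone>
  </zone>
</surface>
"""

print(nesting_problems(ET.fromstring(SAMPLE.strip())))
# []
```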

I am considering additional types, and we could list them in the documentation to encourage more uniform typing.
Candidate types:

  • block - universal type, which I will probably use for the case shown in the attached image (I am currently struggling to detect such blocks...)
  • heading
  • advertisement
  • illustration
  • table
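
If such a list were documented, uniform typing could be checked with a few lines of Python. This is a hedged sketch only: the vocabulary below is just the set of types named in this issue, not an agreed recommendation, the sample again uses un-namespaced elements, and the "footer" type in it is invented purely to trigger the check.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Types named in this issue; a proposal, not an agreed vocabulary.
PROPOSED_TYPES = {
    "page", "column", "line",
    "block", "heading", "advertisement", "illustration", "table",
}


def unknown_zone_types(root):
    """Count zone/@type values that fall outside the proposed vocabulary."""
    # Zones without @type are counted under the key None.
    counts = Counter(zone.get("type") for zone in root.iter("zone"))
    return {t: n for t, n in counts.items() if t not in PROPOSED_TYPES}


# Hypothetical input: "footer" is not one of the proposed types.
sample = ET.fromstring(
    '<surface><zone type="page"><zone type="heading"/>'
    '<zone type="footer"/></zone></surface>'
)
print(unknown_zone_types(sample))
# {'footer': 1}
```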

@TomazErjavec Do we want this? I know we focus mainly on the text and its logical structure, but I believe it is better not to lose information that is available from OCR or other layout-analysis tools, even though not all of it is represented in the "text" part of the TEI file.
