Xapian issue when title is too "long" #992

Open
@benoit74

openzim/mwoffliner#2318 exhibits an issue related to Xapian.

Digging deeper, I narrowed down the problem:

  • it is linked to the item title
  • when the item title is the ي character repeated 122 times or fewer, everything is fine
  • when the item title is the ي character (or any other 2-byte UTF-8 character) repeated 123 times or more, we get the Xapian error
  • when the item title is a 3-byte UTF-8 character repeated 82 times or more, we get the Xapian error
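
The thresholds above can be restated in bytes, which suggests the limit depends on the UTF-8 byte length of the title rather than on its character count. The sketch below only does that arithmetic. The `utf8_len` helper is a name introduced here, and "€" is a stand-in 3-byte character, since the issue text does not show which one was used.

```python
def utf8_len(text: str) -> int:
    """Return the number of bytes `text` occupies once encoded as UTF-8."""
    return len(text.encode("utf-8"))


TWO_BYTE_CHAR = "ي"    # U+064A, the character used in the report
THREE_BYTE_CHAR = "€"  # U+20AC, stand-in: any 3-byte character would do

# Largest 2-byte title reported to work, then the first failing titles.
last_ok_2 = utf8_len(TWO_BYTE_CHAR * 122)
first_bad_2 = utf8_len(TWO_BYTE_CHAR * 123)
first_bad_3 = utf8_len(THREE_BYTE_CHAR * 82)

print(last_ok_2, first_bad_2, first_bad_3)
```

Both failing cases start at the same byte length, and the last working 2-byte case sits just below it. That would be consistent with a byte-length limit of roughly 245 bytes somewhere in the indexing path, but this is an inference from the numbers, not something the report confirms.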

I reproduced the issue with both python-libzim and node-libzim.

Here is a minimal Python snippet reproducing the error with a 2-byte character (it fails when i is 123):

from pathlib import Path

from zimscraperlib.zim import Creator, metadata

creator = Creator(Path("tests.zim"), "index.html").config_metadata(
    std_metadata=metadata.DEFAULT_DEV_ZIM_METADATA
)

# start creator early to detect any problem early as well
creator.start()
creator.set_mainpath("index")

creator.add_item_for("index", "Main Page", content="any", is_front=True)

# add items whose titles grow by one 2-byte character per iteration
for i in range(256):
    print(i)
    path = f"path{i}"
    title = "ي" * i
    creator.add_item_for(path, title, content="any", is_front=True)

creator.finish()

I will implement an interim fix in mwoffliner, but we probably need to either fix this issue or document this limitation (if it is not documented already; I might have missed it).
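
As an illustration of what a scraper-side workaround could look like, here is a minimal sketch that caps a title at a byte budget without cutting a character in half. It is not necessarily what mwoffliner will do. `truncate_utf8` and `MAX_TITLE_BYTES` are hypothetical names, and 244 is chosen only because it is the largest size the report above shows to work.

```python
# Assumption: 244 bytes is the largest title size reported to work (122 x 2 bytes).
MAX_TITLE_BYTES = 244


def truncate_utf8(text: str, max_bytes: int = MAX_TITLE_BYTES) -> str:
    """Shorten `text` so its UTF-8 encoding fits in `max_bytes` bytes."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # Slicing bytes may leave an incomplete multi-byte sequence at the end;
    # errors="ignore" drops that partial character instead of raising.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


safe_title = truncate_utf8("ي" * 200)
print(len(safe_title), len(safe_title.encode("utf-8")))
```

Truncation changes the title stored in the ZIM, so a workaround like this trades accuracy of long titles for a successful build.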
