several PDFs caused Qiqqa to run indefinitely after closing it #305

Open
@GerHobbelt

Description

Continuation of #10, in a sense: different culprit, same pack of background tasks.

Now it turns out the old pdfdraw -tt (see also #34: this bugger has to go) gets locked up forever at max CPU on spurious / egregious PDFs. (🎅 isn't English language fun 🎅 ho ho ho! 🤡 )

That's the text extraction background process going b0rk b0rk b0rk on you. No way out but hard "kill process" for each of these.
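
Until the extractor itself is replaced, one stopgap is a watchdog around each extraction run, so a hung child gets killed automatically instead of by hand. Below is a minimal sketch in Python, not Qiqqa's actual code: `run_with_timeout` and the timeout value are hypothetical, and the real pdfdraw command line is left to the caller.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (succeeded, stdout).

    Hypothetical helper for illustration. subprocess.run() kills the child
    and waits for it when the timeout expires, then raises TimeoutExpired,
    so no orphaned process is left burning CPU.
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # The extractor hung on this document; report failure to the caller.
        return False, ""
    return result.returncode == 0, result.stdout


# Stand-in command that never finishes in time; a real caller would pass
# the text extractor's command line here instead.
HANGING_CMD = [sys.executable, "-c", "import time; time.sleep(30)"]
```

Calling `run_with_timeout(HANGING_CMD, 1)` returns `(False, "")` after about a second. A caller could then flag the PDF as "extraction failed" so the background task does not retry it forever.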

Targeted fix

Upgrade/migrate to the latest MuPDF mudraw with hOCR or JSON STEXT output -- the old pdfdraw that ships with current Qiqqa installs is an antique, patched MuPDF tool (#34 + #35) and a lot has changed since then, including the relevant output format for extracted text.

As I intend to support more document types (via the hOCR/HTML fundamental format), Qiqqa should grok the new pdfdraw -o *.ocr.html or similar output.
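
To give an idea of what "grokking" hOCR could involve, here is a minimal sketch in Python that pulls words and their bounding boxes out of hOCR markup. It assumes the usual hOCR conventions (word spans with class `ocrx_word` and a `bbox x0 y0 x1 y1` property in the `title` attribute). The sample markup is hand-written for illustration, not actual pdfdraw/mudraw output, and `HocrWordParser`, `parse_bbox` and `extract_words` are hypothetical names.

```python
from html.parser import HTMLParser


def parse_bbox(title):
    """Return (x0, y0, x1, y1) from an hOCR title attribute, or None."""
    for prop in title.split(";"):
        parts = prop.split()
        if len(parts) == 5 and parts[0] == "bbox":
            return tuple(int(p) for p in parts[1:])
    return None


class HocrWordParser(HTMLParser):
    """Collects (text, bbox) for every span carrying the ocrx_word class."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._in_word = False
        self._bbox = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            self._in_word = True
            self._bbox = parse_bbox(attrs.get("title") or "")
            self._text = []

    def handle_data(self, data):
        if self._in_word:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Sketch-level simplification: assumes no spans nested inside a word.
        if tag == "span" and self._in_word:
            self.words.append(("".join(self._text).strip(), self._bbox))
            self._in_word = False


def extract_words(hocr):
    parser = HocrWordParser()
    parser.feed(hocr)
    parser.close()
    return parser.words


# Hand-written sample following hOCR conventions (an assumption, see above).
SAMPLE = (
    "<div class='ocr_page' title='bbox 0 0 600 800'>"
    "<span class='ocr_line' title='bbox 36 92 200 116'>"
    "<span class='ocrx_word' title='bbox 36 92 96 116; x_wconf 95'>Hello</span> "
    "<span class='ocrx_word' title='bbox 104 92 200 116'>world</span>"
    "</span></div>"
)
```

`extract_words(SAMPLE)` yields the two words with their boxes. The same word-plus-box tuples are what a text index or highlight overlay would consume, whatever document type the hOCR came from.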

Also keep in mind the migration from the antique (obsoleted) LuceneNET version to SOLR / ElasticSearch: that's #23 + #298 + "Technology areas and their function in Qiqqa" + "Towards migrating the PDF viewer / renderer / text extractor".

Labels: 🐛 bug (Something isn't working)
