Xerces 2.8.1 hangs on malformed HTML files under Apache Tika #85

@GoogleCodeExporter

This is not a problem in metadata-extractor itself, but it affects the Apache 
Tika project. As outlined in https://issues.apache.org/jira/browse/TIKA-1154, 
Tika uses Xerces 2.8.1 because that is the version metadata-extractor 
requires, but that old version hangs on malformed HTML files.

This issue appears to have been fixed in later versions of Xerces (2.10.0 
onwards), but we don't know how upgrading Xerces will affect the 
metadata-extractor. Could you consider upgrading Xerces to a more recent 
version?
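
For anyone testing an upgrade, the following is a minimal sketch (not part of metadata-extractor or Tika) of how one might check which JAXP parser implementation is actually loaded and whether a given Xerces version string is at or above 2.10.0, the release from which the hang appears to be fixed. The class name `XercesVersionCheck` and its helper methods are hypothetical, chosen for illustration only. Versions are compared numerically per component because a plain string comparison would wrongly rank "2.8.1" above "2.10.0".

```java
import javax.xml.parsers.DocumentBuilderFactory;
import java.security.CodeSource;

// Hypothetical helper for illustration; not an API of Xerces, Tika, or metadata-extractor.
class XercesVersionCheck {

    // Compares dotted numeric versions component by component.
    // Returns a negative, zero, or positive value like Comparable.compareTo.
    static int compareVersions(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        int n = Math.max(pa.length, pb.length);
        for (int i = 0; i < n; i++) {
            // Missing components are treated as 0, so "2.10" equals "2.10.0".
            int x = i < pa.length ? Integer.parseInt(pa[i]) : 0;
            int y = i < pb.length ? Integer.parseInt(pb[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    // True if the version is 2.10.0 or later (threshold taken from this issue's text).
    static boolean isAtLeast2_10_0(String version) {
        return compareVersions(version, "2.10.0") >= 0;
    }

    public static void main(String[] args) {
        // Shows which DocumentBuilderFactory implementation JAXP resolves at runtime.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        System.out.println("JAXP parser: " + factory.getClass().getName());

        // The code source is null for classes loaded by the bootstrap class loader.
        CodeSource source = factory.getClass().getProtectionDomain().getCodeSource();
        System.out.println("Loaded from: "
                + (source == null ? "JDK built-in" : source.getLocation().toString()));

        System.out.println("2.8.1 >= 2.10.0?  " + isAtLeast2_10_0("2.8.1"));
        System.out.println("2.10.0 >= 2.10.0? " + isAtLeast2_10_0("2.10.0"));
    }
}
```

Running `main` on the classpath of the application in question indicates whether a standalone Xerces jar or the JDK's bundled parser is being picked up, which is a useful first check before and after changing the dependency.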

Thank you.
Andy Jackson




Original issue reported on code.google.com by [email protected] on 25 Jul 2013 at 1:50
