Skip to content

Conversation

@lucaswerkmeister
Copy link
Member

Replace some TODOs with actual exception throwing. The exceptions aren’t great, but at least they’re better than just ignoring the errors.

The first case doesn’t have a test because, as far as I can tell, it’s basically impossible to trigger. There are apparently very few situations where loadHTML() will return false rather than trying to fix up the markup somehow, and we’re already guaranteeing that the HTML isn’t empty and that the options are valid.

The second case can raise a variety of errors; just test one of them. Turning the array of LibXMLError instances (which aren’t Throwable) into a “tree” / “stack” of Exception instances isn’t super nice, but better than doing nothing or only throwing an Exception for the first error.

@lucaswerkmeister
Copy link
Member Author

Oh you’re kidding me.

Exception: Tag template invalid

libxml throws errors on <template> tags??

@lucaswerkmeister
Copy link
Member Author

Even stranger, on my system there are no warnings about <template> tags (even though I can reproduce the warnings on https://3v4l.org/). No idea what’s different…

Replace some TODOs with actual exception throwing. The exceptions aren’t
great, but at least they’re better than just ignoring the errors.

The first case doesn’t have a test because, as far as I can tell, it’s
basically impossible to trigger. There are apparently very few
situations where loadHTML() will return false [1] rather than trying to
fix up the markup somehow, and we’re already guaranteeing that the HTML
isn’t empty and that the options are valid.

The second case can raise a variety of errors; just test one of them.
Turning the array of LibXMLError instances (which aren’t Throwable) into
a “tree” / “stack” of Exception instances isn’t super nice, but better
than doing nothing or only throwing an Exception for the first error.

Note that we have to ignore / skip one error: libxml may complain about
“invalid” <template> tags, but we need those for SFC handling.

[1]: https://stackoverflow.com/a/79055330/1420237
@lucaswerkmeister
Copy link
Member Author

Side note: in the far future we should presumably use HTMLDocument instead of DOMDocument, but that’s only available starting in PHP 8.4 :)

Copy link
Contributor

@codders codders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good - thank you!

@codders codders merged commit 7272020 into master Jun 10, 2025
10 checks passed
@codders codders deleted the html branch June 10, 2025 07:11
lucaswerkmeister added a commit that referenced this pull request Jun 13, 2025
This is used in some WikibaseLexeme templates (albeit never in the
server-rendered path as far as I can tell), so we should support it.
There are two related problems: old versions of libxml2 complain about
@-prefixed attribute names, and our code only checked for v-on:. Both
are fairly simple to resolve.

The libxml2 error doesn’t happen on my system, so I assume it was fixed
in some recent-ish version; libxml2 2.14.0 [1], which boasts a tokenizer
“conform[ing] fully to HTML5” and “several non-standard syntax warnings
[…] removed”, seems like the most likely candidate, but I haven’t tested
the behavior with different versions. But in any event, libxml2 is
clearly able to parse the markup and produce a DOM that we can work with
(as evidenced by the fact that the WikibaseLexeme template have worked
for years before we started raising the libxml2 errors as exceptions in
ba3f8e0 / #32), so we can just ignore the errors.

[1]: https://download.gnome.org/sources/libxml2/2.14/libxml2-2.14.0.news
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants