Description
Nice work. I have some suggestions.
    $html = rtrim($html, "\n");
I think this unintentionally trims newlines at the end of the output where that isn't needed. I assume your intent is to remove a newline added by the code before it, but `rtrim` strips every trailing newline, so it also cuts newlines that were part of the original document.
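To illustrate the difference, here is a minimal sketch. `removeSingleTrailingNewline` is a hypothetical helper name, not something in the library, and it assumes the extra newline added earlier is exactly one `"\n"`:

```php
<?php
// Hypothetical helper (not part of the library): strips at most ONE
// trailing "\n", whereas rtrim($html, "\n") strips all of them.
function removeSingleTrailingNewline(string $html): string
{
    return substr($html, -1) === "\n" ? substr($html, 0, -1) : $html;
}

$html = "<p>text</p>\n\n\n";

// rtrim removes all three trailing newlines.
$trimmedAll = rtrim($html, "\n");

// The helper removes only the last one, keeping the other two.
$trimmedOne = removeSingleTrailingNewline($html);
```

If the code before it can add more than one newline, this would need adjusting. It is only meant to show why `rtrim` is broader than it looks.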
Another observation on this part:
html5-dom-document-php/src/HTML5DOMDocument.php
Lines 159 to 161 in 3eccd3c
    // Preserve html entities
    $source = preg_replace('/&([a-zA-Z]*);/', 'html5-dom-document-internal-entity1-$1-end', $source);
    $source = preg_replace('/&#([0-9]*);/', 'html5-dom-document-internal-entity2-$1-end', $source);
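There are also hexadecimal entities of the `&#x...;` type, which neither pattern matches. I am not sure about the following, but you could check whether an entity is genuine or fake by doing something like
    html_entity_decode($matches[0], ENT_QUOTES, 'UTF-8') === $matches[0]
with `preg_replace_callback`. The comparison is true when the string comes back unchanged, i.e. when it is not a known entity. Maybe not needed.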
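A rough sketch of what I mean, combining both points. `protectEntities` and the `entity3` placeholder for hexadecimal entities are names I made up for illustration. The `entity1`/`entity2` placeholders follow the existing code:

```php
<?php
// Hypothetical helper, not the library's code: replaces named, decimal and
// hexadecimal entities with placeholders, but only when
// html_entity_decode() actually recognizes them.
function protectEntities(string $source): string
{
    // Group 1: named (&amp;), group 2: decimal (&#38;), group 3: hex (&#x26;).
    $pattern = '/&(?:([a-zA-Z][a-zA-Z0-9]*)|#([0-9]+)|#[xX]([0-9a-fA-F]+));/';

    return preg_replace_callback($pattern, function (array $matches): string {
        // Unchanged after decoding means it is not a known entity: leave it.
        if (html_entity_decode($matches[0], ENT_QUOTES, 'UTF-8') === $matches[0]) {
            return $matches[0];
        }
        if (($matches[1] ?? '') !== '') {
            $kind = 1;
            $value = $matches[1];
        } elseif (($matches[2] ?? '') !== '') {
            $kind = 2;
            $value = $matches[2];
        } else {
            $kind = 3; // assumed new placeholder type for hex entities
            $value = $matches[3];
        }
        return "html5-dom-document-internal-entity{$kind}-{$value}-end";
    }, $source);
}

$named = protectEntities('a &amp; b');
$decimal = protectEntities('&#38;');
$hex = protectEntities('&#x26;');
$fake = protectEntities('&foobarbaz;');
```

Which names count as genuine depends on the flags passed to `html_entity_decode` (for example `ENT_HTML5` knows more names than the default), so the flags here are only what I used in the snippet above. The restore step would also need a matching rule for the hex placeholder.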
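You could also add a random string, generated on every call, to the "internal" placeholder for security purposes, so that input which already contains the placeholder text cannot be mistaken for a real entity when it is restored. Maybe I am saying something silly.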
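Something along these lines, as a sketch only. `makePlaceholderPrefix` is a hypothetical name, and the example handles just the named-entity case:

```php
<?php
// Hypothetical sketch: a fresh random token per call makes the placeholder
// unpredictable, so user input cannot contain it in advance.
function makePlaceholderPrefix(): string
{
    return 'html5-dom-document-internal-' . bin2hex(random_bytes(8)) . '-entity';
}

$prefix = makePlaceholderPrefix();
$source = 'Tom &amp; Jerry';

// Protect named entities using the randomized prefix.
$protected = preg_replace('/&([a-zA-Z]+);/', $prefix . '1-$1-end', $source);

// Restore them later with the same prefix (escaped for use in a pattern).
$restorePattern = '/' . preg_quote($prefix, '/') . '1-([a-zA-Z]+)-end/';
$restored = preg_replace($restorePattern, '&$1;', $protected);
```

The prefix has to be kept for the duration of one load/save cycle so the restore step can find the placeholders again.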