Open
Description
When getting fragments of a document with html() or innerHTML(), some entities (NBSP, BULL) are not escaped on output, but are left in as UTF-8 character sequences.
This causes other PHP functions (like htmlentities()) to produce unexpected output unless a matching encoding argument is passed in.
Example code:
<?php
$html = "<!DOCTYPE html><html><body>This is a string with a
non breaking space in it</body></html>";
$QP = htmlqp($html, 'html', array('convert_to_encoding' => 'utf-8'));
//$QP = htmlqp($html, 'html');
$QP = qp($html, 'html');
echo '1. ' . htmlentities($QP->html()) . PHP_EOL;
echo '2. ' . htmlentities($QP->top('body')->html()) . PHP_EOL;
echo '3. ' . htmlentities($QP->innerHTML()) . PHP_EOL;
echo '4. ' . htmlentities($QP->top('body')->html(), ENT_COMPAT, 'utf-8') . PHP_EOL;
?>
Only 1 and 4 encode the entities as expected.
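The difference between calls 2 and 4 can probably be reproduced without QueryPath at all. The sketch below is only an illustration: `$fragment` is a hypothetical stand-in for what a fragment-level html() call is reported to return, namely a raw UTF-8 NBSP (bytes 0xC2 0xA0) rather than an entity. The ISO-8859-1 call imitates what htmlentities() does when no encoding is given on PHP versions before 5.4, where that was the default charset.

```php
<?php
// Hypothetical input: a raw UTF-8 NBSP (0xC2 0xA0) between two letters,
// assumed to resemble the unescaped output described in this report.
$fragment = "a\xC2\xA0b";

// Read as UTF-8, the two bytes are one character and become one entity.
echo htmlentities($fragment, ENT_COMPAT, 'UTF-8') . PHP_EOL;      // a&nbsp;b

// Read as ISO-8859-1, each byte is treated as a separate character,
// so a stray &Acirc; appears in front of the &nbsp;.
echo htmlentities($fragment, ENT_COMPAT, 'ISO-8859-1') . PHP_EOL; // a&Acirc;&nbsp;b
```

If that is what is happening here, passing 'utf-8' explicitly (as in call 4) is a workaround until the fragment output is escaped the same way as the full-document output.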