Description
When calling mb_convert_encoding() with $fromEncoding === 'HTML-ENTITIES', the polyfill does not return strings that are functionally equivalent to those of the native function. This is because the polyfill's mb_convert_encoding() uses html_entity_decode() when $fromEncoding === 'HTML-ENTITIES', and that function leaves many numeric entities in the ranges 0-31 and 127-159 undecoded. For example:
<?php
// Compare native mbstring with the polyfill for the numeric entities &#0; through &#1023;.
require "vendor/symfony/polyfill-mbstring/Mbstring.php";

use Symfony\Polyfill\Mbstring as p;

for ($i = 0; $i < 1024; $i++) {
    $string = "&#" . $i . ";";
    $mbstring = mb_convert_encoding($string, 'UTF-8', 'HTML-ENTITIES');
    $polyfill = p\Mbstring::mb_convert_encoding($string, 'UTF-8', 'HTML-ENTITIES');
    // Strict comparison: report every entity the two implementations decode differently.
    if ($mbstring !== $polyfill) {
        echo "Mismatch: $string - mbstring: $mbstring; polyfill: $polyfill\n";
    }
}
outputs:
Mismatch: &#0; - mbstring: ; polyfill: &#0;
Mismatch: &#1; - mbstring: ; polyfill: &#1;
Mismatch: &#2; - mbstring: ; polyfill: &#2;
Mismatch: &#3; - mbstring: ; polyfill: &#3;
Mismatch: &#4; - mbstring: ; polyfill: &#4;
Mismatch: &#5; - mbstring: ; polyfill: &#5;
Mismatch: &#6; - mbstring: ; polyfill: &#6;
Mismatch: &#7; - mbstring: ; polyfill: &#7;
Mismatch: &#8; - mbstring: ; polyfill: &#8;
Mismatch: &#11; - mbstring: ; polyfill: &#11;
Mismatch: &#12; - mbstring: ; polyfill: &#12;
Mismatch: &#14; - mbstring: ; polyfill: &#14;
Mismatch: &#15; - mbstring: ; polyfill: &#15;
Mismatch: &#16; - mbstring: ; polyfill: &#16;
Mismatch: &#17; - mbstring: ; polyfill: &#17;
Mismatch: &#18; - mbstring: ; polyfill: &#18;
Mismatch: &#19; - mbstring: ; polyfill: &#19;
Mismatch: &#20; - mbstring: ; polyfill: &#20;
Mismatch: &#21; - mbstring: ; polyfill: &#21;
Mismatch: &#22; - mbstring: ; polyfill: &#22;
Mismatch: &#23; - mbstring: ; polyfill: &#23;
Mismatch: &#24; - mbstring: ; polyfill: &#24;
Mismatch: &#25; - mbstring: ; polyfill: &#25;
Mismatch: &#26; - mbstring: ; polyfill: &#26;
Mismatch: &#27; - mbstring: ; polyfill: &#27;
Mismatch: &#28; - mbstring: ; polyfill: &#28;
Mismatch: &#29; - mbstring: ; polyfill: &#29;
Mismatch: &#30; - mbstring: ; polyfill: &#30;
Mismatch: &#31; - mbstring: ; polyfill: &#31;
Mismatch: &#39; - mbstring: '; polyfill: &#39;
Mismatch: &#127; - mbstring: �; polyfill: &#127;
Mismatch: &#128; - mbstring: �; polyfill: &#128;
Mismatch: &#129; - mbstring: �; polyfill: &#129;
Mismatch: &#130; - mbstring: �; polyfill: &#130;
Mismatch: &#131; - mbstring: �; polyfill: &#131;
Mismatch: &#132; - mbstring: �; polyfill: &#132;
Mismatch: &#133; - mbstring: �; polyfill: &#133;
Mismatch: &#134; - mbstring: �; polyfill: &#134;
Mismatch: &#135; - mbstring: �; polyfill: &#135;
Mismatch: &#136; - mbstring: �; polyfill: &#136;
Mismatch: &#137; - mbstring: �; polyfill: &#137;
Mismatch: &#138; - mbstring: �; polyfill: &#138;
Mismatch: &#139; - mbstring: �; polyfill: &#139;
Mismatch: &#140; - mbstring: �; polyfill: &#140;
Mismatch: &#141; - mbstring: �; polyfill: &#141;
Mismatch: &#142; - mbstring: �; polyfill: &#142;
Mismatch: &#143; - mbstring: �; polyfill: &#143;
Mismatch: &#144; - mbstring: �; polyfill: &#144;
Mismatch: &#145; - mbstring: �; polyfill: &#145;
Mismatch: &#146; - mbstring: �; polyfill: &#146;
Mismatch: &#147; - mbstring: �; polyfill: &#147;
Mismatch: &#148; - mbstring: �; polyfill: &#148;
Mismatch: &#149; - mbstring: �; polyfill: &#149;
Mismatch: &#150; - mbstring: �; polyfill: &#150;
Mismatch: &#151; - mbstring: �; polyfill: &#151;
Mismatch: &#152; - mbstring: �; polyfill: &#152;
Mismatch: &#153; - mbstring: �; polyfill: &#153;
Mismatch: &#154; - mbstring: �; polyfill: &#154;
Mismatch: &#155; - mbstring: �; polyfill: &#155;
Mismatch: &#156; - mbstring: �; polyfill: &#156;
Mismatch: &#157; - mbstring: �; polyfill: &#157;
Mismatch: &#158; - mbstring: �; polyfill: &#158;
Mismatch: &#159; - mbstring: �; polyfill: &#159;
While many of these are control characters (and the native function does return them), the single quote (dec 39) is particularly problematic.
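One possible direction for closing the gap is sketched below. It is only an illustration, not the polyfill's code: the function name decode_entities_like_mbstring() is hypothetical, it assumes the goal is to match the native results reported above (raw characters for 0-31 and 127-159, and a decoded single quote), and it has not been checked against hexadecimal entities or other mbstring edge cases.

```php
<?php
// Hypothetical helper, not part of symfony/polyfill-mbstring.
function decode_entities_like_mbstring(string $s): string
{
    // Pass 1: decode the decimal entities that html_entity_decode() leaves
    // untouched (code points 0-31 and 127-159). Running this first means an
    // escaped sequence such as "&amp;#1;" is not decoded twice.
    $s = preg_replace_callback('/&#(\d{1,3});/', static function (array $m): string {
        $cp = (int) $m[1];
        if ($cp < 32 || $cp === 127) {
            return chr($cp); // one-byte UTF-8 sequence
        }
        if ($cp >= 128 && $cp <= 159) {
            // two-byte UTF-8 sequence for U+0080..U+009F
            return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
        }
        return $m[0]; // everything else is left for html_entity_decode()
    }, $s) ?? $s;

    // Pass 2: ENT_QUOTES also decodes &#39;, which ENT_COMPAT does not.
    return html_entity_decode($s, ENT_QUOTES | ENT_HTML401, 'UTF-8');
}
```

For instance, decode_entities_like_mbstring("&#39;") yields a plain single quote, while "&amp;#1;" still decodes to the literal text "&#1;" rather than to a control character.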