
mb_convert_encoding($x, $y, 'HTML-ENTITIES') not functionally equivalent #344

Open
@cpeel

When mb_convert_encoding() is called with $fromEncoding === 'HTML-ENTITIES', the polyfill does not return the same strings as the native function. This is because the polyfill's mb_convert_encoding() uses html_entity_decode() when $fromEncoding === 'HTML-ENTITIES', and that function leaves many numeric entities undecoded: most of 0-31, all of 127-159, and the single quote (39). For example:

<?php

require "vendor/symfony/polyfill-mbstring/Mbstring.php";

use Symfony\Polyfill\Mbstring as p;

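// Compare the native function with the polyfill for every decimal entity below 1024.
for($i = 0; $i < 1024; $i++) {
	$string = "&#" . $i . ";";
	$mbstring = mb_convert_encoding($string, 'UTF-8', 'HTML-ENTITIES');
	$polyfill = p\Mbstring::mb_convert_encoding($string, 'UTF-8', 'HTML-ENTITIES');
	// Strict comparison avoids PHP's numeric-string juggling (e.g. "&#48;" decodes to "0").
	if($mbstring !== $polyfill) {
		echo "Mismatch: $string - mbstring: $mbstring; polyfill: $polyfill\n";
	}
}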

outputs:

Mismatch: &#0; - mbstring: ; polyfill: &#0;
Mismatch: &#1; - mbstring: ; polyfill: &#1;
Mismatch: &#2; - mbstring: ; polyfill: &#2;
Mismatch: &#3; - mbstring: ; polyfill: &#3;
Mismatch: &#4; - mbstring: ; polyfill: &#4;
Mismatch: &#5; - mbstring: ; polyfill: &#5;
Mismatch: &#6; - mbstring: ; polyfill: &#6;
Mismatch: &#7; - mbstring: ; polyfill: &#7;
Mismatch: &#8; - mbstring: ; polyfill: &#8;
Mismatch: &#11; - mbstring: ; polyfill: &#11;
Mismatch: &#12; - mbstring: ; polyfill: &#12;
Mismatch: &#14; - mbstring: ; polyfill: &#14;
Mismatch: &#15; - mbstring: ; polyfill: &#15;
Mismatch: &#16; - mbstring: ; polyfill: &#16;
Mismatch: &#17; - mbstring: ; polyfill: &#17;
Mismatch: &#18; - mbstring: ; polyfill: &#18;
Mismatch: &#19; - mbstring: ; polyfill: &#19;
Mismatch: &#20; - mbstring: ; polyfill: &#20;
Mismatch: &#21; - mbstring: ; polyfill: &#21;
Mismatch: &#22; - mbstring: ; polyfill: &#22;
Mismatch: &#23; - mbstring: ; polyfill: &#23;
Mismatch: &#24; - mbstring: ; polyfill: &#24;
Mismatch: &#25; - mbstring: ; polyfill: &#25;
Mismatch: &#26; - mbstring: ; polyfill: &#26;
Mismatch: &#27; - mbstring: ; polyfill: &#27;
Mismatch: &#28; - mbstring: ; polyfill: &#28;
Mismatch: &#29; - mbstring: ; polyfill: &#29;
Mismatch: &#30; - mbstring: ; polyfill: &#30;
Mismatch: &#31; - mbstring: ; polyfill: &#31;
Mismatch: &#39; - mbstring: '; polyfill: &#39;
Mismatch: &#127; - mbstring: �; polyfill: &#127;
Mismatch: &#128; - mbstring: �; polyfill: &#128;
Mismatch: &#129; - mbstring: �; polyfill: &#129;
Mismatch: &#130; - mbstring: �; polyfill: &#130;
Mismatch: &#131; - mbstring: �; polyfill: &#131;
Mismatch: &#132; - mbstring: �; polyfill: &#132;
Mismatch: &#133; - mbstring: �; polyfill: &#133;
Mismatch: &#134; - mbstring: �; polyfill: &#134;
Mismatch: &#135; - mbstring: �; polyfill: &#135;
Mismatch: &#136; - mbstring: �; polyfill: &#136;
Mismatch: &#137; - mbstring: �; polyfill: &#137;
Mismatch: &#138; - mbstring: �; polyfill: &#138;
Mismatch: &#139; - mbstring: �; polyfill: &#139;
Mismatch: &#140; - mbstring: �; polyfill: &#140;
Mismatch: &#141; - mbstring: �; polyfill: &#141;
Mismatch: &#142; - mbstring: �; polyfill: &#142;
Mismatch: &#143; - mbstring: �; polyfill: &#143;
Mismatch: &#144; - mbstring: �; polyfill: &#144;
Mismatch: &#145; - mbstring: �; polyfill: &#145;
Mismatch: &#146; - mbstring: �; polyfill: &#146;
Mismatch: &#147; - mbstring: �; polyfill: &#147;
Mismatch: &#148; - mbstring: �; polyfill: &#148;
Mismatch: &#149; - mbstring: �; polyfill: &#149;
Mismatch: &#150; - mbstring: �; polyfill: &#150;
Mismatch: &#151; - mbstring: �; polyfill: &#151;
Mismatch: &#152; - mbstring: �; polyfill: &#152;
Mismatch: &#153; - mbstring: �; polyfill: &#153;
Mismatch: &#154; - mbstring: �; polyfill: &#154;
Mismatch: &#155; - mbstring: �; polyfill: &#155;
Mismatch: &#156; - mbstring: �; polyfill: &#156;
Mismatch: &#157; - mbstring: �; polyfill: &#157;
Mismatch: &#158; - mbstring: �; polyfill: &#158;
Mismatch: &#159; - mbstring: �; polyfill: &#159;

While many of these are control characters (and the native function does return them), the single quote (dec 39) is particularly problematic.
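
One possible direction for a fix, offered only as a rough sketch: decode decimal numeric entities by hand and hand named entities to html_entity_decode() with ENT_QUOTES, all in a single pass so that already-decoded text is never decoded twice. The function names `codepoint_to_utf8()` and `decode_html_entities_sketch()` are hypothetical and not part of symfony/polyfill-mbstring. I have not checked that this matches native mbstring for every input (hex entities, surrogates, and malformed entities are not handled or are left untouched here).

```php
<?php

/**
 * Hypothetical helper: encode a Unicode code point as UTF-8 bytes
 * without relying on the mbstring extension.
 */
function codepoint_to_utf8(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12))
            . chr(0x80 | (($cp >> 6) & 0x3F))
            . chr(0x80 | ($cp & 0x3F));
    }

    return chr(0xF0 | ($cp >> 18))
        . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F))
        . chr(0x80 | ($cp & 0x3F));
}

/**
 * Hypothetical sketch: decode decimal and named HTML entities to UTF-8,
 * including control characters and the single quote.
 */
function decode_html_entities_sketch(string $s): string
{
    $decoded = preg_replace_callback(
        '/&(?:#([0-9]{1,7})|[A-Za-z][A-Za-z0-9]*);/',
        static function (array $m): string {
            if (($m[1] ?? '') !== '') {
                $cp = (int) $m[1];
                // Assumption: leave surrogates and out-of-range values as-is.
                if ($cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
                    return $m[0];
                }

                return codepoint_to_utf8($cp);
            }

            // Named entity: unknown names come back unchanged.
            return html_entity_decode($m[0], ENT_QUOTES | ENT_HTML401, 'UTF-8');
        },
        $s
    );

    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $decoded ?? $s;
}
```

With this sketch, `decode_html_entities_sketch('&#39;')` yields a literal single quote and `decode_html_entities_sketch('&#0;')` yields a NUL byte, which is what the native output above suggests for those two cases.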
