
UTF-8 Encoding not being respected #94

Open
@sylus


Hey @technosophos,

Before getting into my issue, I just wanted to say I love your work on QueryPath!

As for the issue: I was wondering if you have any advice on what I might be doing wrong, and on why QueryPath seems to ignore the fact that the input string is valid UTF-8.

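<?php
// Parse the HTML using QueryPath (this runs inside a class method, hence $this->qp)
$qp_options = array(
  'convert_from_encoding' => 'UTF-8', // encoding of the source string
  'convert_to_encoding' => 'UTF-8',   // target encoding for QueryPath's conversion step
  'strip_low_ascii' => FALSE,         // do not strip low-ASCII (control) characters
);

// Taxonomy: the text of the last breadcrumb item
$this->qp = htmlqp($dbRow->BreadCrumbHTML, NULL, $qp_options);
$taxonomy = $this->qp->top()->find('ul li:last')->text();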

Where the content of $dbRow->BreadCrumbHTML is:

<ul><li style="display:inline;"><a href="/fr/index.html">Accueil</a></li> &gt; <li><a href="/fr/roads_trans/index.html">Routes et transports</a></li>  &gt; <li>Vélo</li></ul>

and the string I get back in $taxonomy is:

"Vélo"
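For reference, the usual symptom of this kind of bug is mojibake. The two UTF-8 bytes of "é" (0xC3 0xA9) get decoded as two ISO-8859-1 characters, which produces "VÃ©lo". The sketch below uses plain PHP with the mbstring extension and no QueryPath. It reproduces that mis-decoding so the pattern is easy to recognize in debugger output. It illustrates the likely failure mode and is not a confirmed diagnosis of this issue.

```php
<?php
// Sketch: what UTF-8 text looks like after being decoded as ISO-8859-1 (Latin-1).
// Assumes the mbstring extension is loaded and this file is saved as UTF-8.

$original = "Vélo";

// Pretend the UTF-8 bytes are Latin-1 and re-encode them as UTF-8.
// Each byte of "é" (0xC3, 0xA9) turns into its own character: "Ã" and "©".
$misread = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

// Undoing exactly one such round trip restores the original bytes.
$restored = mb_convert_encoding($misread, 'ISO-8859-1', 'UTF-8');

echo bin2hex($original), PHP_EOL; // 56c3a96c6f
echo $misread, PHP_EOL;           // VÃ©lo
echo $restored, PHP_EOL;          // Vélo
```

A result that matches `$misread` means the input was read as Latin-1 somewhere along the way. The reverse conversion only patches the symptom. It does not fix the cause.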

If I don't use QueryPath and just take the whole text, the UTF-8 is preserved. I also checked in Xdebug that mb_convert_encoding is being called, and at that point it works and the string is still valid UTF-8 (PHP 5.3.9). Do you have any sage advice on this, or on particular routes for debugging it further?
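One way to narrow this down is to take QueryPath out of the picture. This sketch assumes htmlqp() ultimately parses through PHP's DOMDocument::loadHTML, which is not confirmed here. It parses the same breadcrumb with plain DOMDocument in three ways:

- as-is
- with a charset `<meta>` element prepended
- with every non-ASCII character escaped as a numeric entity via mb_encode_numericentity()

`lastLiText()` is a hypothetical helper. The XPath `(//ul/li)[last()]` is only a rough stand-in for the `ul li:last` selector.

```php
<?php
// Hypothetical helper: parse an HTML string with plain DOMDocument (no QueryPath)
// and return the text of the last <li> that is a direct child of a <ul>.
function lastLiText(string $html): string
{
    $doc = new DOMDocument();
    // Collect libxml warnings instead of printing them (the input is only a fragment).
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Rough XPath stand-in for the 'ul li:last' selector.
    $nodes = (new DOMXPath($doc))->query('(//ul/li)[last()]');
    return ($nodes !== false && $nodes->length > 0) ? $nodes->item(0)->textContent : '';
}

$breadcrumb = '<ul><li style="display:inline;"><a href="/fr/index.html">Accueil</a></li> &gt; '
    . '<li><a href="/fr/roads_trans/index.html">Routes et transports</a></li>  &gt; <li>Vélo</li></ul>';

// 1. As-is: no charset declaration, so libxml falls back to its default encoding.
$raw = lastLiText($breadcrumb);

// 2. Declare UTF-8 explicitly with a <meta> element in front of the fragment.
$withMeta = lastLiText(
    '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $breadcrumb
);

// 3. Escape non-ASCII characters as &#NNN; so the assumed input charset no longer matters.
$escaped = lastLiText(
    mb_encode_numericentity($breadcrumb, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8')
);

foreach (['as-is' => $raw, 'meta charset' => $withMeta, 'entities' => $escaped] as $label => $text) {
    printf("%-12s %s (hex %s)\n", $label, $text, bin2hex($text));
}
```

libxml's HTML parser has traditionally assumed ISO-8859-1 when the markup declares no charset. If the as-is result is mangled while the other two come back as "Vélo", the problem most likely sits below QueryPath. In that case, passing htmlqp() the meta-prefixed or entity-escaped string is a possible workaround worth trying.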
