Urlencode in parse? #63

@DanAlexson90

Something not mentioned in the official documentation:

https://sabre.io/uri/usage/

but only in the source code:

https://github.com/sabre-io/uri/blob/master/lib/functions.php

is this piece:

// Normally a URI must be ASCII, however. However, often it's not and
// parse_url might corrupt these strings.
//
// For that reason we take any non-ascii characters from the uri and
// uriencode them first.
$uri = preg_replace_callback(
    '/[^[:ascii:]]/u',
    function ($matches) {
        return rawurlencode($matches[0]);
    },
    $uri
);
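To make the effect on a host name concrete, here is a minimal sketch that runs the same callback on one IRI and then hands the result to PHP's `parse_url`. The wrapper name `encodeNonAscii` is hypothetical and only exists for this illustration; it is not part of sabre/uri.

```php
<?php
// Hypothetical helper wrapping the excerpt above; not a sabre/uri function.
function encodeNonAscii(string $uri): string
{
    // Percent-encode every code point outside the ASCII range.
    return preg_replace_callback(
        '/[^[:ascii:]]/u',
        function ($matches) {
            return rawurlencode($matches[0]);
        },
        $uri
    );
}

$encoded = encodeNonAscii('https://пример.рф/');

echo $encoded, PHP_EOL;
// https://%D0%BF%D1%80%D0%B8%D0%BC%D0%B5%D1%80.%D1%80%D1%84/

// The host comes back percent-encoded, not as the original label.
echo parse_url($encoded, PHP_URL_HOST), PHP_EOL;
// %D0%BF%D1%80%D0%B8%D0%BC%D0%B5%D1%80.%D1%80%D1%84
```

The encoding is applied to the whole string before any component is identified, so the host is treated the same way as the path or query.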

URL-encoding is NOT appropriate for a domain name (FQDN)!

For example, HTTPS IRIs with the hosts пример.рф, приклад.укр, and παράδειγμα.ελ should NOT be parsed to:

array (size=7)
  'scheme' => string 'https' (length=5)
  'host' => string '%D0%BF%D1%80%D0%B8%D0%BC%D0%B5%D1%80.%D1%80%D1%84' (length=49)
  'path' => string '/' (length=1)
  ...

array (size=7)
  'scheme' => string 'https' (length=5)
  'host' => string '%D0%BF%D1%80%D0%B8%D0%BA%D0%BB%D0%B0%D0%B4.%D1%83%D0%BA%D1%80' (length=61)
  'path' => string '/' (length=1)
  ...

array (size=7)
  'scheme' => string 'https' (length=5)
  'host' => string '%CF%80%CE%B1%CF%81%CE%AC%CE%B4%CE%B5%CE%B9%CE%B3%CE%BC%CE%B1.%CE%B5%CE%BB' (length=73)
  'path' => string '/' (length=1)
  ...

Instead they should be parsed to:

array (size=7)
  'scheme' => string 'https' (length=5)
  'host' => string 'пример.рф' (length=17)
  'path' => string '/' (length=1)
  ...

array (size=7)
  'scheme' => string 'https' (length=5)
  'host' => string 'приклад.укр' (length=21)
  'path' => string '/' (length=1)
  ...

array (size=7)
  'scheme' => string 'https' (length=5)
  'host' => string 'παράδειγμα.ελ' (length=25)
  'path' => string '/' (length=1)
  ...
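One possible way to get results like these, sketched under assumptions rather than proposed as the library's fix: keep the pre-encoding step so that `parse_url` still receives ASCII-only input, then decode the host component afterwards. The function name `parseIri` is hypothetical, and the sketch returns the plain `parse_url` array rather than sabre/uri's seven-key result.

```php
<?php
// Hypothetical sketch; parseIri() is not part of sabre/uri.
function parseIri(string $iri): array
{
    // Same pre-encoding as the library excerpt quoted in this issue.
    $encoded = preg_replace_callback(
        '/[^[:ascii:]]/u',
        function ($matches) {
            return rawurlencode($matches[0]);
        },
        $iri
    );
    if ($encoded === null) {
        // preg_replace_callback() returns null on error, e.g. invalid UTF-8.
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    $parts = parse_url($encoded);
    if ($parts === false) {
        throw new InvalidArgumentException('Invalid URI: ' . $iri);
    }

    // Undo the percent-encoding for the host only.
    if (isset($parts['host'])) {
        $parts['host'] = rawurldecode($parts['host']);
    }

    return $parts;
}

$parts = parseIri('https://παράδειγμα.ελ/');
echo $parts['host'], PHP_EOL; // παράδειγμα.ελ
echo $parts['path'], PHP_EOL; // /
```

A caveat of this approach: `rawurldecode` also decodes any percent-escapes that were already present in the host of the input, so it does not distinguish them from the ones added by the pre-encoding step.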
