
Allow only spaces in Alpha and Alnum, not whitespaces #4

@weierophinney

Description

  • I was not able to find an open or closed issue matching what I'm seeing.
  • This is not a question. (Questions should be asked on chat (sign up here) or our forums.)

A common use for these filters is cleaning names and addresses. They already have an allow_white_space option, but it also allows tabs and new lines. Can I add allow_printable_chars and allow_spaces options?
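A minimal sketch of the difference, assuming PCRE Unicode support: the allow_white_space pattern shape keeps every character matched by \s (tab, new line, carriage return and form feed, not just the space), while a literal space in the character class keeps only the space itself. The sample string is made up for illustration.

```php
<?php
// Sample input: letters separated by a space, tab, new line,
// carriage return and form feed, plus one punctuation character.
$input = "a b\tc\nd\re\ff!";

// Pattern shape used by allow_white_space: \s keeps every whitespace character
$withWhiteSpace = preg_replace('/[^\p{L}\s]/u', '', $input);

// Proposed allow_spaces shape: only the literal space survives
$withSpaces = preg_replace('/[^\p{L} ]/u', '', $input);

// json_encode makes the remaining control characters visible as escapes
echo json_encode($withWhiteSpace), "\n"; // "a b\tc\nd\re\ff"
echo json_encode($withSpaces), "\n";     // "a bcdef"
```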

// Default settings, deny whitespace
$filter = new \Zend\I18n\Filter\Alnum();
echo $filter->filter("This is (my) content: 123");
// Returns "Thisismycontent123"

// First param in constructor is $allowWhiteSpace
$filter = new \Zend\I18n\Filter\Alnum(true);
echo $filter->filter("This is (my)\t content:\n 123");
// Returns "This is my\t content\n 123"

// The first constructor parameter can also be an array of options
$filter = new \Zend\I18n\Filter\Alnum(['allow_printable_chars'=>true]);
echo $filter->filter("This is (my)\t content\n: 123");
// Returns "This is (my) content: 123"

// The first constructor parameter can also be an array of options
$filter = new \Zend\I18n\Filter\Alnum(['allow_spaces'=>true]);
echo $filter->filter("This is (my)\t content\n: 123");
// Returns "This is my content 123"
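A minimal standalone sketch of how the proposed Alnum options could behave, useful for checking the example outputs without the intl extension. The function name alnumSketch is hypothetical; it assumes PCRE Unicode support, skips the ja/ko/zh locale branch, and assumes Alnum extends the letter class with \p{N} for digits.

```php
<?php
// Hypothetical standalone sketch of the proposed Alnum options.
// Assumes PCRE Unicode support; the locale (ja/ko/zh) branch is omitted.
function alnumSketch(string $value, array $options = []): string
{
    // Defaults for the existing option and the two proposed ones
    $options += [
        'allow_white_space'     => false,
        'allow_spaces'          => false,
        'allow_printable_chars' => false,
    ];

    if ($options['allow_printable_chars']) {
        // Remove only non-printable characters (tabs, new lines, other controls)
        return preg_replace('/[[:^print:]]/u', '', $value);
    }

    $whiteSpace = $options['allow_white_space'] ? '\s' : '';
    // allow_spaces narrows the allowed whitespace to the literal space
    $space = $options['allow_spaces'] ? ' ' : $whiteSpace;

    // Keep letters, digits and whatever whitespace was allowed
    return preg_replace('/[^\p{L}\p{N}' . $space . ']/u', '', $value);
}

// json_encode shows tabs and new lines as \t and \n
echo json_encode(alnumSketch("This is (my) content: 123")), "\n";
// "Thisismycontent123"
echo json_encode(alnumSketch("This is (my)\t content:\n 123", ['allow_white_space' => true])), "\n";
// "This is my\t content\n 123"
echo json_encode(alnumSketch("This is (my)\t content\n: 123", ['allow_printable_chars' => true])), "\n";
// "This is (my) content: 123"
echo json_encode(alnumSketch("This is (my)\t content\n: 123", ['allow_spaces' => true])), "\n";
// "This is my content 123"
```

With [[:^print:]], punctuation such as parentheses and colons counts as printable and is therefore kept; only control characters are removed. If both allow_spaces and allow_white_space are set, the literal space wins, mirroring the $space fallback in the proposed filter().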

The main change would be a new /[[:^print:]]/u pattern in filter() (shown here for Alpha, whose patterns contain no digits):

public function filter($value)
{
    if (! is_scalar($value) && ! is_array($value)) {
        return $value;
    }

    // 'allow_spaces' and 'allow_printable_chars' would need defaults (false)
    // in the class's $options array, next to 'allow_white_space'.
    $whiteSpace     = $this->options['allow_white_space'] ? '\s' : '';
    // allow_spaces narrows the allowed whitespace to the literal space
    $space          = $this->options['allow_spaces'] ? ' ' : $whiteSpace;
    $printableChars = $this->options['allow_printable_chars'];
    $language       = Locale::getPrimaryLanguage($this->getLocale());

    if (! static::hasPcreUnicodeSupport()) {
        // POSIX named classes are not supported, use alternative [a-zA-Z] match
        $pattern = '/[^a-zA-Z' . $space . ']/';
    } elseif ($printableChars) {
        // Remove only non-printable characters (tabs, new lines, other controls)
        $pattern = '/[[:^print:]]/u';
    } elseif ($language === 'ja' || $language === 'ko' || $language === 'zh') {
        // Use english alphabet
        $pattern = '/[^a-zA-Z' . $space . ']/u';
    } else {
        // Use native language alphabet
        $pattern = '/[^\p{L}' . $space . ']/u';
    }

    return preg_replace($pattern, '', $value);
}

I don't know how Asian characters behave with [:print:].
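One way to find out is to run the pattern against some sample text. Whether [:print:] recognises CJK and Hangul characters depends on PCRE applying Unicode properties to POSIX classes under the /u modifier, so this quick check script reports what actually survives on a given PHP build rather than predicting it. The sample strings are made up for illustration.

```php
<?php
// Sample text in each script, with a tab and a trailing new line
$samples = [
    'Japanese' => "日本語\tテキスト\n",
    'Korean'   => "한국어\t텍스트\n",
    'Chinese'  => "中文\t文本\n",
];

foreach ($samples as $label => $text) {
    $filtered = preg_replace('/[[:^print:]]/u', '', $text);
    // If only the tab and new line were removed, every letter counted as printable
    $kept = $filtered === str_replace(["\t", "\n"], '', $text);
    printf(
        "%s: %s (%s)\n",
        $label,
        $filtered,
        $kept ? 'all letters kept' : 'some letters removed'
    );
}
```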

Pattern tests

$s = "tab\t\tnew line\nand   spaces!@#";
echo "1) $s\n\n";

echo "2)". preg_replace('/[^a-zA-Z\s]/', '', $s) . "\n\n";
echo "3)". preg_replace('/[^a-zA-Z]/', '', $s) . "\n\n";

echo "4)". preg_replace('/[^\p{L}\s]/u', '', $s) . "\n\n";
echo "5)". preg_replace('/[^\p{L}]/u', '', $s) . "\n\n";

echo "6)". preg_replace('/[^a-zA-Z ]/', '', $s) . "\n\n";
echo "7)". preg_replace('/[^\p{L} ]/u', '', $s) . "\n\n";

echo "8)". preg_replace('/[[:^print:]]/u', '', $s) . "\n\n";

Output:

1) tab		new line
and   spaces!@#

2)tab		new line
and   spaces

3)tabnewlineandspaces

4)tab		new line
and   spaces

5)tabnewlineandspaces

6)tabnew lineand   spaces

7)tabnew lineand   spaces

8)tabnew lineand   spaces!@#


Originally posted by @rcapile at zendframework/zend-i18n#109
