
Implementing a solution

John Stevenson edited this page Aug 5, 2016 · 6 revisions

There is no complete solution to argument-escaping on Windows, particularly when cmd.exe is involved. The best that can be achieved is a robust compromise that can handle most cases without introducing a set of complex rules.

Windows command-line

From How Windows parses the command-line it is clear that the only character that might cause unexpected results is a double-quote. So we need a convention for handling these and arguments in general:

  • The argument is treated as unescaped.
  • Any double-quotes in an argument are escaped as literal double-quotes.
  • An argument will not be enclosed in quotes unless absolutely necessary.

This will avoid inconsistencies when handling consecutive double-quotes and enable each argument to be included in a command-line without it affecting other items. It will also stop the cases where quotes can break batch scripts.

Outline

Having defined our convention, the steps to escape an argument are simply:

  1. Replace every double-quote, together with any backslashes immediately before it, with twice that number of backslashes followed by a backslash and the double-quote.
  2. If a space or tab character is found, or the argument is empty:
     • double up trailing backslashes.
     • add surrounding double-quotes.

PHP code

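function escapeWin($arg)
{
    // Double any backslashes that precede a double-quote, then escape the quote.
    $arg = preg_replace('/(\\\\*)"/', '$1$1\\"', $arg);

    // Only quote when necessary: whitespace or an empty argument.
    if (strpbrk($arg, " \t") !== false || $arg === '') {
        // Double trailing backslashes so they cannot escape the closing quote.
        $arg = preg_replace('/(\\\\*)$/', '$1$1', $arg);
        $arg = '"'.$arg.'"';
    }
    return $arg;
}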
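
To see why these steps round-trip, it can help to model the receiving side. The sketch below is a simplified reading of the documented backslash and double-quote rules, not the actual Windows parser: the name parseWinCommandLine is hypothetical, the special handling of consecutive double-quotes is deliberately left out, and the different rules for the program name are ignored.

```php
<?php
// Hypothetical helper: splits a command-line into arguments using only the
// basic backslash/double-quote rules (an assumption-laden simplification).
function parseWinCommandLine(string $line): array
{
    $args = [];
    $current = '';
    $inQuotes = false;
    $started = false; // true once the current argument has begun
    $len = strlen($line);
    $i = 0;

    while ($i < $len) {
        $ch = $line[$i];
        if ($ch === '\\') {
            // Measure the run of backslashes starting here.
            $n = strspn($line, '\\', $i);
            $i += $n;
            if ($i < $len && $line[$i] === '"') {
                // Before a double-quote, each pair yields one backslash.
                $current .= str_repeat('\\', intdiv($n, 2));
                if ($n % 2 === 1) {
                    $current .= '"';        // odd count: literal double-quote
                } else {
                    $inQuotes = !$inQuotes; // even count: quote is a delimiter
                }
                $i++;
            } else {
                // Not before a double-quote: backslashes are literal.
                $current .= str_repeat('\\', $n);
            }
            $started = true;
        } elseif ($ch === '"') {
            $inQuotes = !$inQuotes;
            $started = true;
            $i++;
        } elseif (!$inQuotes && ($ch === ' ' || $ch === "\t")) {
            // Unquoted whitespace ends the current argument, if any.
            if ($started) {
                $args[] = $current;
                $current = '';
                $started = false;
            }
            $i++;
        } else {
            $current .= $ch;
            $started = true;
            $i++;
        }
    }
    if ($started) {
        $args[] = $current;
    }
    return $args;
}

// Three escaped arguments: say "hi", C:\dir name\ and an empty string.
foreach (parseWinCommandLine('"say \\"hi\\"" "C:\\dir name\\\\" ""') as $a) {
    echo '[' . $a . "]\n";
}
// [say "hi"]
// [C:\dir name\]
// []
```

The second argument shows why trailing backslashes are doubled inside quotes: a single trailing backslash would escape the closing double-quote instead of ending the argument.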

Incorporating cmd.exe

From How cmd.exe parses a command we know that meta characters have a special meaning. How we deal with these is split into the following sections:

Double-quotes

From the point of view of cmd, all double-quotes either start or end a quoted-string, regardless of whether they are backslash-escaped. This can have unexpected consequences if there is an odd number of double-quotes, or if escaped double-quotes leave part of the argument outside a quoted-string.

The argument colors="red & blue" would be escaped:

"colors=\"red & blue\""

However the & character is no longer protected by the opening double-quote, because the quoted-string has been closed by the first intended literal double-quote. The result is that this part of the command is split by the & character and cmd tries to call a program named blue\"".

The only way to solve this is to caret-escape the whole argument.
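
The problem and the fix can both be illustrated with a rough model of cmd's quoted-string state. The helper name findUnprotectedMeta is hypothetical, and the model is an assumption that only tracks carets, double-quotes and the & | < > ( ) characters; it does not attempt variable expansion or any other cmd behaviour.

```php
<?php
// Hypothetical helper: lists meta characters that fall outside a
// quoted-string, toggling the quote state on every double-quote just as
// the text describes (backslashes mean nothing to this model).
function findUnprotectedMeta(string $cmd): array
{
    $inQuotes = false;
    $found = [];
    $len = strlen($cmd);

    for ($i = 0; $i < $len; $i++) {
        $ch = $cmd[$i];
        if (!$inQuotes && $ch === '^') {
            $i++; // a caret outside quotes escapes the next character
            continue;
        }
        if ($ch === '"') {
            $inQuotes = !$inQuotes;
            continue;
        }
        if (!$inQuotes && strpbrk($ch, '&|<>()') !== false) {
            $found[] = $ch;
        }
    }
    return $found;
}

// Backslash-escaped only: the & ends up outside the quoted-string.
echo count(findUnprotectedMeta('"colors=\\"red & blue\\""')), "\n";        // 1
// Caret-escaped as a whole: nothing is left unprotected.
echo count(findUnprotectedMeta('^"colors=\\^"red ^& blue\\^"^"')), "\n";   // 0
```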

Variable expansion

Environment variable expansion is triggered by the %...% and !...! syntax, regardless of the quoted-string state. Therefore we need to caret-escape the whole argument.

However, we cannot do this for exclamation marks. These require an escape sequence of two carets (^^!) because of the two-step parsing that cmd performs, and we have no way of knowing the DelayedExpansion state (other than that it is disabled by default):

  • If enabled, an escaped ^^!var^^! will be transformed to !var! as intended.
  • If disabled, an escaped ^^!var^^! will be transformed to ^!var^! and introduce two unintended carets.
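
The two outcomes can be reproduced with a deliberately crude model of the caret handling. Both function names are hypothetical, and the assumption is that each parsing step simply removes an escaping caret and keeps the character after it; real cmd parsing is considerably more involved.

```php
<?php
// Assumed model of one parsing step: drop each escaping caret and keep
// the character that follows it.
function caretPass(string $text): string
{
    return preg_replace('/\^(.)/s', '$1', $text);
}

// Hypothetical helper: the first step always runs; the second step only
// runs when DelayedExpansion is enabled.
function simulateCaretHandling(string $text, bool $delayedExpansion): string
{
    $text = caretPass($text);
    return $delayedExpansion ? caretPass($text) : $text;
}

echo simulateCaretHandling('^^!var^^!', true), "\n";  // !var!
echo simulateCaretHandling('^^!var^^!', false), "\n"; // ^!var^!
```

Because the caller cannot tell which branch will run, neither one caret nor two gives a correct result in both states.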

Other meta characters

These are the characters that have not yet been accounted for: ^ & | < > ( )

Since they have no special meaning inside a quoted-string (and we know there are no double-quotes to confuse the quoted-string state) we have two choices:

  • Do nothing if there is whitespace in the argument (because any meta characters will be escaped by the enclosing double-quotes).
  • Enclose the argument in double-quotes if it contains any meta characters.

Note that we do not use caret-escaping in case we come up against its single limitation.

Meta escaping rules

We can condense the above into the following rules:

  • If an argument contains double-quotes or %...% syntax, the transformed argument must be caret-escaped.
  • Otherwise, if it does not contain whitespace but does contain meta characters, it will be enclosed in double-quotes.
  • The ! meta character is not escaped because it cannot be handled reliably.

Outline

We need to set the following flags:

  1. Set quote to true if a space or tab character is found, or the argument is empty.
  2. Set dquotes to true if a double-quote character is found.
  3. Set meta to true if dquotes is true or two % characters surround other characters.
     • We need to caret-escape everything, including any enclosing double-quotes.
  4. If meta and quote are false, set quote to true if any ^ & | < > ( ) characters are found.
     • We can safely escape these characters using the surrounding double-quotes.

Now we can perform the escaping:

  1. If dquotes is true:
     • Replace every double-quote, together with any backslashes immediately before it, with twice that number of backslashes followed by a backslash and the double-quote.
  2. If quote is true:
     • double up trailing backslashes.
     • add surrounding double-quotes.
  3. If meta is true:
     • escape all " ^ & | < > ( ) % characters with a caret ^.

PHP code

function escapeCmdExe($arg)
{
    // Quote when the argument contains whitespace or is empty.
    $quote = strpbrk($arg, " \t") !== false || $arg === '';
    $dquotes = strpos($arg, '"') !== false;
    // Caret-escape everything if there are double-quotes or %...% syntax.
    $meta = $dquotes || preg_match('/%[^%]+%/', $arg);

    // Otherwise let surrounding double-quotes protect any meta characters.
    if (!$meta && !$quote) {
        $quote = strpbrk($arg, '^&|<>()') !== false;
    }

    if ($dquotes) {
        $arg = preg_replace('/(\\\\*)"/', '$1$1\\"', $arg);
    }

    if ($quote) {
        $arg = preg_replace('/(\\\\*)$/', '$1$1', $arg);
        $arg = '"'.$arg.'"';
    }

    if ($meta) {
        $arg = preg_replace('/(["^&|<>()%])/', '^$1', $arg);
    }
    return $arg;
}
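
A few sample inputs make the rules concrete. The sketch below carries its own copy of escapeCmdExe, guarded by function_exists, purely so that it runs standalone; the sample arguments are illustrative assumptions and are not taken from any test suite.

```php
<?php
// Local copy so the example is self-contained.
if (!function_exists('escapeCmdExe')) {
    function escapeCmdExe($arg)
    {
        $quote = strpbrk($arg, " \t") !== false || $arg === '';
        $dquotes = strpos($arg, '"') !== false;
        $meta = $dquotes || preg_match('/%[^%]+%/', $arg);

        if (!$meta && !$quote) {
            $quote = strpbrk($arg, '^&|<>()') !== false;
        }
        if ($dquotes) {
            $arg = preg_replace('/(\\\\*)"/', '$1$1\\"', $arg);
        }
        if ($quote) {
            $arg = preg_replace('/(\\\\*)$/', '$1$1', $arg);
            $arg = '"'.$arg.'"';
        }
        if ($meta) {
            $arg = preg_replace('/(["^&|<>()%])/', '^$1', $arg);
        }
        return $arg;
    }
}

// Illustrative inputs, one per rule.
$samples = ['simple', 'a&b', 'hello world', 'colors="red & blue"', '%PATH%', ''];

foreach ($samples as $sample) {
    printf("[%s] => [%s]\n", $sample, escapeCmdExe($sample));
}
// [simple] => [simple]
// [a&b] => ["a&b"]
// [hello world] => ["hello world"]
// [colors="red & blue"] => [^"colors=\^"red ^& blue\^"^"]
// [%PATH%] => [^%PATH^%]
// [] => [""]
```

Note that an argument is only quoted or caret-escaped when one of the rules requires it, so plain arguments pass through unchanged.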
