Commit 300cd19

Fix corrupt UTF-8 char processing & shellquoting after aborted read
If the processing of a multibyte character was interrupted in UTF-8
locales, e.g. by reading just one byte of a two-byte character 'ü'
(\303\274) with a command like:

	print -nr $'\303\274' | read -n1 g

then the shellquoting algorithm was corrupted in such a way that the
final quote in a simple single-quoted string was missing. This bug may
have had other, as yet undiscovered, effects as well. The problem was
with corrupted multibyte character processing and not with the
shell-quoting routine sh_fmtq() itself.

Full trace and discussion at: ksh93/ksh#5
(which is also an attempt to begin to understand the esoteric workings
of the libast mb* macros that process UTF-8 characters).

src/lib/libast/comp/setlocale.c: utf8_mbtowc():
- If called from the mbinit() macro (i.e. if both pointer parameters
  are null), reset the global multibyte character synchronisation
  state variable. This fixes the problem with interrupted processing
  leaving an inconsistent state, provided that mbinit() is called
  before processing multibyte characters (which it is, in most (?)
  places that do this). Before this fix, calling mbinit() in UTF-8
  locales was a no-op.

src/cmd/ksh93/sh/string.c: sh_fmtq():
- Call mbinit() before potentially processing multibyte characters.
  Testing suggests that this could be superfluous, but at worst, it's
  harmless; better be sure.

src/cmd/ksh93/tests/builtins.sh:
- Add regression test for shellquoting with 'printf %q' after
  interrupting the processing of a multibyte character with
  'read -n1'. This test only fails in a UTF-8 locale, e.g. when
  running: bin/shtests -u builtins SHELL=/buggy/ksh-2012-08-01

Fixes ksh-community#5.
1 parent 2624b29 commit 300cd19

5 files changed: +28 -1 lines changed

NEWS

Lines changed: 6 additions & 0 deletions
@@ -3,6 +3,12 @@ For full details, see the git log at: https://github.com/ksh93/ksh
 
 Any uppercase BUG_* names are modernish shell bug IDs.
 
+2020-07-05:
+
+- In UTF-8 locales, fix corruption of the shell's internal string quoting
+  algorithm (as used by xtrace, 'printf %q', and more) that occurred when
+  the processing of a multibyte character was interrupted.
+
 2020-07-03:
 
 - Backslashes are no longer escaped in the raw Bourne Shell-like editing

src/cmd/ksh93/include/version.h

Lines changed: 1 addition & 1 deletion
@@ -17,4 +17,4 @@
 * David Korn <[email protected]> *
 * *
 ***********************************************************************/
-#define SH_RELEASE "93u+m 2020-07-02"
+#define SH_RELEASE "93u+m 2020-07-05"

src/cmd/ksh93/sh/string.c

Lines changed: 3 additions & 0 deletions
@@ -336,6 +336,9 @@ char *sh_fmtq(const char *string)
 	int offset;
 	if(!cp)
 		return((char*)0);
+#if SHOPT_MULTIBYTE
+	mbinit();
+#endif /* SHOPT_MULTIBYTE */
 	offset = staktell();
 	state = ((c= mbchar(cp))==0);
 	if(isaletter(c))

src/cmd/ksh93/tests/builtins.sh

Lines changed: 16 additions & 0 deletions
@@ -302,7 +302,23 @@ fi
 if [[ $(printf '%..*s\n' : abc def) != abc:def ]]
 then err_exit "printf '%..*s' not working"
 fi
+
+# ======
+# shell-quoting using printf %q (same algorithm used for xtrace and output of 'set', 'trap', ...)
+
 [[ $(printf '%q\n') == '' ]] || err_exit 'printf "%q" with missing arguments'
+
+# the following fails on 2012-08-01 in UTF-8 locales
+expect="'shell-quoted string'"
+actual=$(
+	print -nr $'\303\274' | read -n1 foo # interrupt processing of 2-byte UTF-8 char after reading 1 byte
+	printf '%q\n' "shell-quoted string"
+)
+LC_CTYPE=POSIX true # on buggy ksh, a locale re-init via temp assignment restores correct shellquoting
+[[ $actual == "$expect" ]] || err_exit 'shell-quoting corrupted after interrupted processing of UTF-8 char' \
+	"(expected $expect; got $actual)"
+
+# ======
 # we won't get hit by the one second boundary twice, right?
 expect= actual=
 { expect=$(LC_ALL=C date | sed 's/ GMT / UTC /') && actual=$(LC_ALL=C printf '%T\n' now) && [[ $actual == "$expect" ]]; } ||

src/lib/libast/comp/setlocale.c

Lines changed: 2 additions & 0 deletions
@@ -600,6 +600,8 @@ utf8_mbtowc(wchar_t* wp, const char* str, size_t n)
 	register int c;
 	register wchar_t w = 0;
 
+	if (!wp && !sp)
+		ast.mb_sync = 0; /* assume call from mbinit() macro: reset global multibyte sync state */
 	if (!sp || !n)
 		return 0;
 	if ((m = utf8tab[*sp]) > 0)
