-
Thanks for all the research and examples. In my opinion, mix'n'match should not be allowed unless POSIX defines it in some way. The Wikipedia printf "Parameter field" section for n$ (referenced above as %x$) mentions, "This is a POSIX extension and not in C99." I wonder if one of our POSIX community members could hunt this specification down and enlighten us as to its contents. Wikipedia continues within that section and states:
-
Thanks @hyenias, it is a start for reflection :-). I started my work from the printf(3) doc, which says (on my Linux distro)
So it doesn't mention POSIX anywhere. When you talk about POSIX, do you mean POSIX for C or POSIX for the shell? I am totally ignorant about POSIX, so maybe POSIX itself defines both, and if so I guess it has two sections for printf, one for printf(1) and one for printf(3)? Indeed they differ a lot, and probably in incompatible ways (consumed argv[], mix'n'match). So far we have 3 options.
Now regarding
All my examples are written f=fmt-string ; printf .... because that's the format used in the test suite and I simply cut and pasted them here. With a shell (in the sh family) it is perfectly possible to write
zsh tries to be smarter than ksh here and delays the argv[] conversion until very late in the format scan. This allows %1$d to fetch argv[0], use the numeric expression as a number and emit the result, then later fetch argv[0] again and this time use it as a string.
The ksh algorithm is badly broken here and needs a lot of band-aids around the existing code. It looks like the code was not designed for $ at all (a ksh archaeologist may confirm), and then someone decided to implement x$ like this: on the first $ occurrence in a format string, it triggers the build of yet another argv[]-like array called fp[] (where the 'p' may stand for position), and this one is buggy. It grabs the first occurrence of an index (say 1, that is argv[0]) and stores in fp[0].fmt the format associated with 1$. For %1$d, for instance, fp[0].fmt='d' and fp[0].value=numeval(argv[0]), and that's it for 1$. The next occurrence of 1$ is considered cached (already seen), so a later %1$s will not re-evaluate argv[0]; it will use the previous number as a string, with all the crashes we got.
What I describe here is made worse by x$ occurrences in width/precision. The term mix'n'match in this discussion is not limited to %1$d %1$s; it also applies to width/precision, as
And even when this is solved (mix'n'match), there must be an agreement about how to handle a non-$ access following a $ access for width/precision, like here
As you see here, the warning is due to the mix'n'match of $ vs. non-$, yet this is just a warning, and the semantics applied here (logical to me) are that any non-$ conversions are evaluated like x$, with x starting at 1 and incremented on each non-$, so
ksh/zsh, god knows why, decided that the non-$ 'd' comes right after the last x$ used for width/precision, i.e.
Maybe implementing the C way would trigger some regressions in our current QA suite (I am not that far along at the moment). So I am still asking the vox populi: if we go down the fix-it path, should we restore the C semantics for tolerated (warning-only) mix'n'match? My vote is the C way, but I may be the only one :)
-
Thanks @hyenias
Although ksh is wrong here, this is easily fixable and would produce the zsh output after the fix. This is a case where we can consider zsh correct and let it establish the norm. Now this one is problematic to me
Why
The ksh logic (even though buggy here) would favor the max of the indexed accesses, so here it would be
To me that is not logical; there is no reason to skip
So I'd like a decision on this last point. Do we go the erratic zsh way and let zsh establish the norm? Do we wait for POSIX? Do we add printf flags: -z to behave like zsh (last indexed), -k to behave like ksh (max indexed), -s (sequential accesses are respected), [-p] POSIX, adapting to whatever POSIX says after the Y3K bug? (kidding)
-
Hmmmm... I prefer not to add more flags to printf to select differing behavior for indexed/sequential parameter access. Consistency is key. Now, if there is a performance impact to doing something uncommon, then sure, put that uncommon behavior or special case behind a flag.
I have not looked at the code. As I understand it from your input, ksh's version of printf consumes the parameters passed to it, probably turning them into tokens and popping them off one at a time. So as ksh printf works through the parameter tokens, it consumes them, which puts us in a hard-to-resolve situation for out-of-order parameter access as the format string is parsed. Is this what is going on? If the argv array is being consumed, can we not rewrite the code to use pointers into the argv array so that all the original parameters stay intact, allowing every reference, absolute or relative, to be accessible and understood?
I do not like erratic anything. So, not the zsh, ksh, or bash way of doing things, as none of them is clear, much less evidently correct. We do not wait on POSIX. If I could have it my way (not necessarily the correct way, but you asked me), I would keep the whole process KISS. This means I would keep all passed parameter values intact so that they can be referenced positionally, either through direct absolute indexed referencing or via the next sequential value after the last used indexed position. Is this your "last indexed" way? I am not sure how one would move the sequential access along to prevent an infinite loop. Did I help answer your questions?
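To make the "keep every parameter intact" idea concrete, here is a minimal sketch in C. It is not ksh code: the function name fmt_args and its signature are made up for illustration, and it only understands %d, %s, %N$d and %N$s (no flags, width, precision or %%). It assumes that a plain conversion reads the argument right after the last one used, which is one possible reading of the rule above.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not taken from ksh.
 * argv[] is never modified or cached: every conversion re-reads the
 * original string and converts it for that conversion only, so the
 * same argument can be used as a number and later as a string.
 * A plain %d or %s takes the argument after the last one used.
 * Returns 0 on success, -1 on an unsupported format, a bad index,
 * or an output buffer that is too small.
 */
int fmt_args(char *out, size_t outsz, const char *fmt,
             int argc, const char *const argv[])
{
    size_t len = 0;
    int last = 0;                 /* 1-based index of the last argument used */

    if (outsz == 0)
        return -1;
    out[0] = '\0';
    while (*fmt) {
        if (*fmt != '%') {        /* literal character: copy it through */
            if (len + 1 >= outsz)
                return -1;
            out[len++] = *fmt++;
            out[len] = '\0';
            continue;
        }
        fmt++;                    /* skip '%' */

        int idx = 0;
        const char *p = fmt;
        while (isdigit((unsigned char)*p)) {
            if (idx > 9999)       /* refuse absurdly large indexes */
                return -1;
            idx = idx * 10 + (*p++ - '0');
        }
        if (p != fmt && *p == '$')
            fmt = p + 1;          /* explicit N$ index */
        else
            idx = last + 1;       /* sequential: one past the last used */
        if (idx < 1 || idx > argc)
            return -1;

        int n;
        if (*fmt == 'd')
            n = snprintf(out + len, outsz - len, "%ld",
                         strtol(argv[idx - 1], NULL, 10));
        else if (*fmt == 's')
            n = snprintf(out + len, outsz - len, "%s", argv[idx - 1]);
        else
            return -1;            /* anything else is out of scope here */
        if (n < 0 || (size_t)n >= outsz - len)
            return -1;
        len += (size_t)n;
        last = idx;
        fmt++;                    /* skip the conversion letter */
    }
    return 0;
}
```

With the single argument "42", the format "%1$d %1$s" gives "42 42" because the second conversion goes back to the untouched argv[0]. The scan cannot loop forever since every iteration advances fmt, and a sequential access past the end of argv is rejected instead of wrapping around.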
-
Thanks @hyenias, I'll take all this into consideration. I think I will come up with a 'fix' candidate, trying to keep all these things in mind. I'll try to make it 'configurable', i.e. I will come up with one implementation of %*X$.*Y$F (indexed for width/precision but not for the format letter F), where the F access is either purely sequential, or one past the last seen width/precision index, or one past the max of the width/precision indexes. I will do this in a way that lets us change the implementation we prefer at the last minute, and if nobody agrees, i.e. two ways are acceptable, then we will see....
The fix I envision has no performance impact on the 'regular' path; it will trigger a format rescan on ambiguous indexed access, i.e. all the strange, questionable mix'n'match that is allowed to exist (compared to a compiler-like approach that would reject such mix'n'match). I'll give it a try and see what I come up with.
Note that I guess this fix is marginal; I mean nobody does this mix'n'match. Most people simply use %s, sometimes with hardcoded width/precision, and if really needed use variable expansion in the format string to get a parameterizable precision/width, i.e. '%3.4d' or "%$w.${p}s". What really bugged me here is how easily one can segfault or loop on a wrongly set up format string.
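The three candidate rules for %*X$.*Y$F can be written down as one small function, which is roughly what "configurable" could look like. This is only a sketch: the enum, the function name next_arg_index and its parameters are hypothetical and do not come from the ksh sources or from the patch.

```c
/* Hypothetical names for the three candidate rules. */
enum seq_policy {
    SEQ_PURE,        /* count only the non-indexed accesses seen so far */
    SEQ_AFTER_LAST,  /* one past the last indexed width/precision seen */
    SEQ_AFTER_MAX    /* one past the largest indexed width/precision seen */
};

/*
 * Returns the 1-based argument index that the non-indexed format
 * letter F would read in %*X$.*Y$F.
 *   seq_used  : non-indexed accesses already consumed in the format
 *   width_idx : X, the index given for the width
 *   prec_idx  : Y, the index given for the precision
 */
int next_arg_index(enum seq_policy policy, int seq_used,
                   int width_idx, int prec_idx)
{
    switch (policy) {
    case SEQ_AFTER_LAST:
        /* the precision is scanned after the width, so it is the last index seen */
        return prec_idx + 1;
    case SEQ_AFTER_MAX:
        return (width_idx > prec_idx ? width_idx : prec_idx) + 1;
    case SEQ_PURE:
    default:
        return seq_used + 1;
    }
}
```

For %*3$.*2$d at the start of a format string, the value printed would come from argument 1 under the purely sequential rule, argument 3 under "one past the last seen", and argument 4 under "one past the max". Keeping the choice in one place like this means switching rules at the last minute touches a single function.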
-
I made a candidate fix. The patch is posted at
This is a call for testers, i.e. to verify and accept/reject the new output, and a call for code review if accepted. Here is the script I use to generate the outputs and the QA suite.
Testers can use it like this
ksh-b stands for bad, i.e. my distro's ksh, and ksh-f for fixed.
The current ksh-f.lst is
The output is almost on par with zsh; the diffs are
IMHO all the zsh diffs are wrong, unless someone can explain otherwise.
meaning non indexed format (i.e seq format
-
Hi All,
Before fixing #324 I'd like to poll your opinion about it.
The printf builtin allows sequential access to argv[] values with the % format and indexed access to argv[x] with the %x$ format, not only as in %2$s but for width/precision as well.
Mix'n'match of these two, % and %x$, is very prone to misinterpretation and results in a lot of trouble.
C printf(3) doesn't allow mix'n'match; it wants everything to be % or everything to be %x$. Yet this is only a warning, and if you ignore it, it behaves as if every non-indexed access got internally indexed by its occurrence in the format string.
Something like
"%s %s %5$s %4$s %s"
is treated like "%1$s %2$s %5$s %4$s %5$s", with the caveat that two occurrences of a given index (here 5) assume that argv[4] is of an appropriate type for each corresponding format, i.e.
printf("%1$d %1$x\n",32); is OK and
printf("%1$s %1$x\n",32); is not OK.
Should we insist on allowing indexed access, knowing that the current implementation is plain wrong and so probably nobody uses it, or do we consider that there are still some canonical cases that some people may use and that do work as expected?
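For reference, the well-formed C case can be checked with a few lines. This is a sketch that assumes a libc supporting the n$ positional syntax (a POSIX extension, not ISO C, so a pedantic compiler may warn about it). The wrapper name format_twice is made up for the example.

```c
#include <stdio.h>

/*
 * Hypothetical wrapper: formats one int twice through the same
 * positional index. Both %d and %x take an int-sized argument, so
 * reusing index 1 is consistent. For value 32 the buffer receives
 * "32 20" (decimal, then hexadecimal).
 * Returns the snprintf result (number of characters needed).
 */
int format_twice(char *out, size_t outsz, int value)
{
    return snprintf(out, outsz, "%1$d %1$x", value);
}
```

The "not OK" variant, "%1$s %1$x" with the argument 32, is deliberately left out: it would ask printf to treat the same int as a char pointer, which is undefined behavior in C rather than just a wrong result.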
Bash refrains from doing any x$ addressing.
Zsh does it, still mostly wrong, yet it tends to simply provide wrong results, compared to ksh, which provides wrong results, goes into almost infinite loops, or even segfaults.
I think I could make it work, but I will wait for the poll answers. If nobody needs it, there is no need to work on it; just add some safety checks and reject x$ with a message, as bash does.
I provide some test cases here that show how bad it is on ksh.