-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Skip whitespace in SeqN grammar #1626
base: develop
Are you sure you want to change the base?
Conversation
One more note to add on this PR: I think there's an opportunity to go farther, and skip newlines and maybe comments as well. I didn't do that here because I think it might expand what counts as legal SeqN, by effectively making all mandatory newlines optional, and allowing you to put comments anywhere. I don't know if that's an acceptable trade-off for having a simpler grammar. From a quick code search, it also looks like comments may be meaningful in some contexts, so skipping them may be completely off the table. |
Add a skip expression for whitespace to the SeqN grammar, rather than dealing with whitespace by hand. This should simplify the grammar, without significantly changing its meaning. All grammar regression tests for legal SeqN examples pass unchanged. A few examples of invalid SeqN parse slightly differently, but the new parse tree is still reasonable. For these examples, rebaseline the test to the new behavior.
Condense similar rules together and in-line some very simple rules.
d21b30b
to
d91e442
Compare
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this was a good idea but we are using whitespace a a delimiter. We want the user to have space between certain elements such as time, command stems, and arguments. I believe this change should not be merged in
ex.
A2025-001T10:00:00ECHO"arg1"#test <--- Now valid with the new grammar change
Also you might have ambiguity when parsing arguments specifically with number arguments
ex
C CMD_STEM 1 10
C CMD_STEM 110
], | ||
[ | ||
'Stem with disallowed characters', | ||
`FSW_CMD%BAR$BAZ`, | ||
`Sequence(Commands( | ||
Command(Stem,⚠), | ||
Command(Stem,⚠), | ||
Command(Stem,Args) | ||
))`, | ||
`Sequence(Commands(Command(TimeTag(TimeComplete),⚠(Number),Stem,Args),Command(Stem,⚠,Args(Enum))))`, | ||
], | ||
['Stem with disallowed characters', `FSW_CMD%BAR$BAZ`, `Sequence(Commands(Command(Stem,⚠,Args(Enum,⚠,Enum))))`], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curios why this test was changed? I don't think it needed to be changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the new grammar just parsed it slightly differently, so I changed the expected output to match. It winds up parsing as a single command with some errors and enum arguments, rather than multiple commands with errors in between.
The formatting was changed by the formatting and linting tools. I'm not sure if those were being applied regularly to this file... they might be doing more harm than good here.
Re: the way skipping whitespace works, it's true that this would allow that "smashed together" syntax. If that's unacceptable, we could perhaps try adding "word boundary" elements into the tokens. That would demand that there are non-word characters (whitespace being the only choice that would match the rest of the grammar) between the various tokens, without actually including those non-word characters in the token we match. As for this example:
That's actually not ambiguous, because the "skip" happens after the tokenizer runs. That is, this gets parsed as the following tokens:
then the whitespace is dropped, and finally the grammar is applied, resulting in this parse tree (ignore the line numbers, I just tacked this onto another example I was working on)
All of that said, I think we're going to go a different route for sequence templates, that doesn't need to modify the SeqN grammar. So, feel free to take this if you think it'll help with maintenance, or close it. It won't interfere with the sequence templates either way. |
Adds a skip expression for whitespace to the SeqN grammar, rather than dealing with whitespace by hand.
This should simplify the grammar, without significantly changing its meaning.
All grammar regression tests for legal SeqN examples passed unchanged.
A few examples of invalid SeqN parse slightly differently, but the new parse tree is still reasonable.
For these examples, I rebaselined the test to the new behavior.