[gen] Introduce a new relaxation/edge/atom parser in diy tool.#1685
[gen] Introduce a new relaxation/edge/atom parser in diy tool.#1685ShaleXIONG wants to merge 35 commits intoherd:masterfrom
diy tool.#1685Conversation
ce28381 to
15b8202
Compare
15b8202 to
19da623
Compare
19da623 to
509b397
Compare
6c6b505 to
1e82287
Compare
7d4368a to
7fc9943
Compare
7fc9943 to
4261325
Compare
eb895bd to
e6c2d70
Compare
There was a problem hiding this comment.
Thanks for this @ShaleXIONG . I've left a few comments in the code, and I have two overarching concerns:
- The lexer uses global mutable state, in a way that I think is quite error prone. I'm also not convinced it's really necessary. At the very least, I'd like to see it replaced with local state. See the comments in the code.
- This PR introduces a new parser for existing syntax, and new user-facing syntax for relaxations. The parsing semantics of both the old and new syntax is not trivial (for example, white space is interpreted in a context-dependent way, and
,has different meaning indiy7vsdiycross7). For this reason, I think we need significantly more tests than what the PR currently offers. Again, see the comments for additional details.
Happy to have a chat online/offline about any of these points.
5319f8e to
867880b
Compare
@fsestini I have addressed all your comments in several new commits. Can you have a look? Once we settle, I will merge and rewrite the history. |
| (* Track whether the current scope has already seen an operand. | ||
| Whitespace after an operand is treated as a sequence separator. *) |
There was a problem hiding this comment.
Sorry, but the comment still mentions 'operand', so my point about it not being clear what 'operand' means still applies.
| (* Track whether the current scope has already seen an operand. | |
| Whitespace after an operand is treated as a sequence separator. *) | |
| (* Track whether the lexer is currently inside a square bracket pair and has consumed a relaxation string. | |
| This is because whitespace after a relaxation within brackets is treated as a sequence separator. *) |
| @@ -0,0 +1,4 @@ | |||
| An explicit comma stays a sequence after a choice in `diy7` | |||
There was a problem hiding this comment.
I don’t think this has been fully addressed quite yet. The new gen/tests/test_parser.ml script is a great start, but it only tests the generic parser/AST layer. It checks Parser.main + AST expansion, but not the actual tool-specific parsing semantics.
As I mentioned in my comment, I think we should also have OCaml tests for the parsing paths used by the different diy* tools, since they do not all interpret the parsed AST in the same way. In particular, I would like to see direct OCaml tests for:
diy7’s top-level sequence-as-choice behavior, and semantics of multiple-relaxand-safeoptionsdiyone7’s requirement that the input expand to exactly one cyclediycross7’s interpretation of "comma,as choice", and semantics of multi-argument combinations likediycross7 A 'B|C' D- possibly some additional tests covering:
- fence and
-cumulparsing - the newly-introduced invalid-relax filter
- fence and
Again, it might be the case that gen/diy.ml, gen/diyone.ml, and gen/diycross.ml need a bit of extra work so that the corresponding parsing logic can be exposed through small functions that can be tested directly from OCaml tests.
| "Parser syntax: whitespace or ',' for sequence, '|' for choice, '?' for optional, and '[...]' for grouping, for examples:\n\ | ||
| - 'A B' means the sequence A,B.\n\ | ||
| - 'A|B,C' and '[A|B] C' both mean the choice between A and B, then the sequence with C.\n\ | ||
| - '[A,B]?' means either the group '[A,B]' or the empty '[]'.\n\ | ||
| - 'A|B,C|[D,E]?' parses as '(A|B),(C|([D,E]?))'.\n\ | ||
| Depending on the tool and context, a sequence may be interpreted either as a 'followed-by' relation between relaxations or as a choice between inputs." |
There was a problem hiding this comment.
I find the new "Parser syntax" help text rather parser-/grammar-centric, and IMO not that helpful from a user perspective. I also think this documentation text should be tailored per tool, and should describe the accepted input forms in user-facing terms rather than in terms of "parser syntax" (which users shouldn't really be concerned with) or "sequences".
The term "sequence" in particular doesn't seem to be well-defined from the user's point of view: is the "sequence" A,B denoting a composite relaxation? It is some other kind of list? Is it a choice/alternative between two relaxations? I'm not super convinced that this help message provides sufficient answer to these questions.
In particular, users need to understand things like:
- how to write a composite relaxation
- how and when to write alternative candidate relaxations with
| - what
?applies to and what is its semantics - how top-level whitespace and
,is interpreted in eachdiy*tool
As written, the current text feels too generic, and the list of parsing examples does not seem very informative. I think the examples should be written in the form of concrete commands diy7 -relax "A B... that show how these operators behave concretely.
There was a problem hiding this comment.
I will update based on different tools and give example in different tool after the common parser wording.
For diyone7 and diycross7 the explanation is at the top. For diy7 the explanation is at the argument level since only -safe, -relax etc accept relaxation input.
… seq_to_choice function
d5d116e to
35df659
Compare
|
Summary of offline discussion to move this PR forward. @ShaleXIONG let me know if anything is incorrect or missing.
The following points expand and further clarify my previous comment, which I think it's still not fully addressed yet.
|
@fsestini. Lexer is now simplified and complexity has been moved to parser. The parser now have three entry point:
The three parsers now have dedicated test cases. We also add extra test case for |
a670da9 to
c66512f
Compare
c66512f to
dff3d4e
Compare
This is a first phrase to rework on the
diy*lexer and parser on input relaxations or cycles. In this pull request we introduce a more conventional way, that is,lexutil.mlltokenises input to an AST, specifically typetin newast.mlandparser.mlyparses the AST to internal data structure.In the
Ast.t, individual primitive relaxation is represented byOne of stringfor individual relaxationSeq of t listfor a sequence of relaxations. This corresponds to,syntax or in some situation, white space. For examplediyone7 PosWR PosRR Friis equivalent todiyone7 PosWR,PosRR,Fri.Choice of t list, together with a new syntax|. For examplePosWR|DpAddrdWmeans eitherPosWRorDpAddrdW.Opt of t, together with a new syntax?. For examplePosWR?means eitherPosWRor(empty).The new constructors can be used in
diycross7anddiy7. Indiyone7, although it will be parsed however an error occurs due to input is not a precise one cycle. An example isdiy7 -arch AArch64 -safe "[Po|DMB.SY*** DpAddr?] Coe" -cycleonly true -size 4 -nprocs 2, which gives:We can see
2+2W013: PodWW Coe DMB.SYdWR DpAddrsW Coewhich generate fromPo,Coe[DMB.SY*** DpAddr]. One can also run the command with-v, where the fully unfold edges will print at very beginning.Last, Given all the chances above, we unify the paring process cross three
diy*tools. Previously differentdiy*has different parsing path and the actual code are in different files such asdiy.ml,diycross.mlanddiyone.ml. Now the main passing function unifies atparse_expand_relaxandparse_expand_relaxsinrelax.ml.Some future plan:
Ppotype which is only used inPPCCompile_gen.ml, and make thePpoas a special wildcard.Ast.tin place. The current parsing is after we flatten theAst.ttostring list list, so we can hook to the existing parsing functions.We decide to leave it as future plan for smaller change in this pull request and role out the new syntax as soon.