
[gen] Introduce a new relaxation/edge/atom parser in diy tool. #1685

Open
ShaleXIONG wants to merge 35 commits into herd:master from ShaleXIONG:code-better-parser

Collaborator

@ShaleXIONG ShaleXIONG commented Jan 27, 2026

This is a first phase of reworking the diy* lexer and parser for input relaxations and cycles. This pull request introduces a more conventional pipeline:

  • the updated lexUtil.mll tokenises the input into an AST, specifically type t in the new ast.ml, and
  • the new parser.mly parses the AST into the internal data structure.

In Ast.t, relaxations are represented by:

  • One of string for an individual relaxation.
  • Seq of t list for a sequence of relaxations. This corresponds to the , syntax or, in some situations, whitespace. For example, diyone7 PosWR PosRR Fri is equivalent to diyone7 PosWR,PosRR,Fri.
  • A new choice constructor, Choice of t list, together with the new syntax |. For example, PosWR|DpAddrdW means either PosWR or DpAddrdW.
  • A new option constructor, Opt of t, together with the new syntax ?. For example, PosWR? means either PosWR or (empty).
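As a rough illustration of the constructors listed above, the following sketch defines a type with the same four constructor names and a hypothetical `expand` helper that enumerates every concrete sequence an AST denotes. The actual definitions in ast.ml may differ; `expand` and its result order are assumptions made for this example only.

```ocaml
(* Illustrative only: constructor names follow the description above,
   but the real ast.ml may differ in detail. *)
type t =
  | One of string     (* a single relaxation, e.g. "PosWR" *)
  | Seq of t list     (* A,B or, in some contexts, A B *)
  | Choice of t list  (* A|B *)
  | Opt of t          (* A? *)

(* Hypothetical helper: list every concrete sequence denoted by an AST. *)
let rec expand (ast : t) : string list list =
  match ast with
  | One s -> [ [ s ] ]
  | Opt a -> expand a @ [ [] ]
  | Choice alts -> List.concat_map expand alts
  | Seq parts ->
      (* Cartesian product: pick one expansion of each part, in order. *)
      List.fold_right
        (fun part acc ->
          List.concat_map
            (fun prefix -> List.map (fun suffix -> prefix @ suffix) acc)
            (expand part))
        parts [ [] ]
```

With this sketch, `expand (Choice [One "PosWR"; One "DpAddrdW"])` yields two one-element sequences, and wrapping a node in `Opt` adds the empty sequence as a further alternative.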

The new constructors can be used in diycross7 and diy7. In diyone7 they are parsed, but an error occurs because the input does not denote exactly one cycle. An example is diy7 -arch AArch64 -safe "[Po|DMB.SY*** DpAddr?] Coe" -cycleonly true -size 4 -nprocs 2, which gives:

Generator produced 10 tests
2+2W000: PosWR DpAddrdW Coe PosWR DpAddrdW Coe
2+2W001: PodWW Coe PosWR DpAddrdW Coe
2+2W002: PosWR DpAddrdW Coe PodWR DpAddrsW Coe
2+2W003: PosWR DpAddrdW Coe PodWR DpAddrdW Coe
2+2W004: DMB.SYsWR DpAddrdW Coe PosWR DpAddrdW Coe
2+2W005: DMB.SYdWW Coe PosWR DpAddrdW Coe
2+2W006: DMB.SYdWR DpAddrsW Coe PosWR DpAddrdW Coe
2+2W007: DMB.SYdWR DpAddrdW Coe PosWR DpAddrdW Coe
2+2W008: PodWW Coe PodWW Coe
2+2W009: PodWW Coe PodWR DpAddrsW Coe
2+2W010: PodWW Coe PodWR DpAddrdW Coe
2+2W011: PodWW Coe DMB.SYsWR DpAddrdW Coe
2+2W012: DMB.SYdWW Coe PodWW Coe
2+2W013: PodWW Coe DMB.SYdWR DpAddrsW Coe
2+2W014: PodWW Coe DMB.SYdWR DpAddrdW Coe
2+2W015: PodWR DpAddrsW Coe PodWR DpAddrsW Coe
2+2W016: PodWR DpAddrsW Coe PodWR DpAddrdW Coe
2+2W017: DMB.SYsWR DpAddrdW Coe PodWR DpAddrsW Coe
2+2W018: DMB.SYdWW Coe PodWR DpAddrsW Coe
2+2W019: DMB.SYdWR DpAddrsW Coe PodWR DpAddrsW Coe
2+2W020: DMB.SYdWR DpAddrdW Coe PodWR DpAddrsW Coe
2+2W021: PodWR DpAddrdW Coe PodWR DpAddrdW Coe
2+2W022: DMB.SYsWR DpAddrdW Coe PodWR DpAddrdW Coe
2+2W023: DMB.SYdWW Coe PodWR DpAddrdW Coe
2+2W024: DMB.SYdWR DpAddrsW Coe PodWR DpAddrdW Coe
2+2W025: DMB.SYdWR DpAddrdW Coe PodWR DpAddrdW Coe
2+2W026: DMB.SYsWR DpAddrdW Coe DMB.SYsWR DpAddrdW Coe
2+2W027: DMB.SYdWW Coe DMB.SYsWR DpAddrdW Coe
2+2W028: DMB.SYsWR DpAddrdW Coe DMB.SYdWR DpAddrsW Coe
2+2W029: DMB.SYsWR DpAddrdW Coe DMB.SYdWR DpAddrdW Coe
2+2W030: DMB.SYdWW Coe DMB.SYdWW Coe
2+2W031: DMB.SYdWW Coe DMB.SYdWR DpAddrsW Coe
2+2W032: DMB.SYdWW Coe DMB.SYdWR DpAddrdW Coe
2+2W033: DMB.SYdWR DpAddrsW Coe DMB.SYdWR DpAddrsW Coe
2+2W034: DMB.SYdWR DpAddrsW Coe DMB.SYdWR DpAddrdW Coe
2+2W035: DMB.SYdWR DpAddrdW Coe DMB.SYdWR DpAddrdW Coe

We can see 2+2W013: PodWW Coe DMB.SYdWR DpAddrsW Coe, which is generated from Po, Coe [DMB.SY*** DpAddr]. One can also run the command with -v, which prints the fully unfolded edges at the very beginning.

Last, given all the changes above, we unify the parsing process across the three diy* tools. Previously each diy* tool had its own parsing path, with the actual code living in different files such as diy.ml, diycross.ml and diyone.ml. Now the main parsing functions are unified as parse_expand_relax and parse_expand_relaxs in relax.ml.

Future plans:

  • Remove the Ppo type, which is only used in PPCCompile_gen.ml, and make Ppo a special wildcard.
  • Parse the Ast.t in place. Currently, parsing happens after we flatten the Ast.t to a string list list, so that we can hook into the existing parsing functions.

We decided to leave these as future work to keep this pull request smaller and roll out the new syntax sooner.

@ShaleXIONG ShaleXIONG requested a review from fsestini January 27, 2026 12:56
@ShaleXIONG ShaleXIONG force-pushed the code-better-parser branch 4 times, most recently from ce28381 to 15b8202 Compare January 29, 2026 09:54
@ShaleXIONG ShaleXIONG marked this pull request as ready for review January 29, 2026 10:14
@ShaleXIONG ShaleXIONG force-pushed the code-better-parser branch 3 times, most recently from 6c6b505 to 1e82287 Compare March 4, 2026 16:50
@ShaleXIONG ShaleXIONG force-pushed the code-better-parser branch 5 times, most recently from 7d4368a to 7fc9943 Compare March 26, 2026 12:29
@ShaleXIONG ShaleXIONG force-pushed the code-better-parser branch 6 times, most recently from eb895bd to e6c2d70 Compare April 10, 2026 08:57
Collaborator

@fsestini fsestini left a comment


Thanks for this @ShaleXIONG . I've left a few comments in the code, and I have two overarching concerns:

  • The lexer uses global mutable state, in a way that I think is quite error prone. I'm also not convinced it's really necessary. At the very least, I'd like to see it replaced with local state. See the comments in the code.
  • This PR introduces a new parser for existing syntax, and new user-facing syntax for relaxations. The parsing semantics of both the old and new syntax is not trivial (for example, white space is interpreted in a context-dependent way, and , has different meaning in diy7 vs diycross7). For this reason, I think we need significantly more tests than what the PR currently offers. Again, see the comments for additional details.

Happy to have a chat online/offline about any of these points.
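To make the "local state" suggestion concrete, here is a minimal hand-written tokenizer sketch in which all mutable state is created inside the call, so separate invocations cannot interfere with each other. It is not the lexUtil.mll code: the `token` type and `tokenize` function are hypothetical, and whitespace is deliberately simplified to a plain word separator rather than the context-dependent treatment discussed in this PR.

```ocaml
(* Hypothetical token type for the relaxation syntax. *)
type token = Word of string | Comma | Bar | Question | LBracket | RBracket

let tokenize (input : string) : token list =
  (* Both the buffer and the accumulator are local to this call. *)
  let buf = Buffer.create 16 in
  let acc = ref [] in
  (* Emit the pending word, if any. *)
  let flush () =
    if Buffer.length buf > 0 then begin
      acc := Word (Buffer.contents buf) :: !acc;
      Buffer.clear buf
    end
  in
  let emit tok = flush (); acc := tok :: !acc in
  String.iter
    (fun c ->
      match c with
      | ',' -> emit Comma
      | '|' -> emit Bar
      | '?' -> emit Question
      | '[' -> emit LBracket
      | ']' -> emit RBracket
      | ' ' | '\t' -> flush ()  (* simplification: whitespace only ends a word *)
      | c -> Buffer.add_char buf c)
    input;
  flush ();
  List.rev !acc
```

Because nothing outlives the call, tokenizing one input cannot leak bracket or operand state into the next, which is the property the global-state version has to maintain by hand.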

Comment thread gen/common/ast.ml
Comment thread gen/common/ast.ml Outdated
Comment thread gen/common/ast.ml Outdated
Comment thread gen/common/ast.ml Outdated
Comment thread gen/common/ast.ml Outdated
Comment thread gen/relax.ml Outdated
Comment thread gen/relax.ml Outdated
Comment thread gen/relax.ml
Comment thread gen/relax.ml Outdated
Comment thread gen/norm.ml Outdated
@ShaleXIONG ShaleXIONG force-pushed the code-better-parser branch 2 times, most recently from 5319f8e to 867880b Compare April 13, 2026 15:38
@ShaleXIONG
Collaborator Author

ShaleXIONG commented Apr 14, 2026

@fsestini I have addressed all your comments in several new commits. Can you have a look? Once we settle, I will merge and rewrite the history.

@fsestini fsestini self-requested a review April 15, 2026 12:08
Comment thread gen/common/lexUtil.mll Outdated
Comment thread gen/common/lexUtil.mll Outdated
Comment on lines +20 to +21
(* Track whether the current scope has already seen an operand.
Whitespace after an operand is treated as a sequence separator. *)
Collaborator


Sorry, but the comment still mentions 'operand', so my point about it not being clear what 'operand' means still applies.

Suggested change
- (* Track whether the current scope has already seen an operand.
- Whitespace after an operand is treated as a sequence separator. *)
+ (* Track whether the lexer is currently inside a square bracket pair and has consumed a relaxation string.
+ This is because whitespace after a relaxation within brackets is treated as a sequence separator. *)

Comment thread gen/common/parser.mly Outdated
Comment thread gen/diy.ml Outdated
Comment thread gen/tests/diy7-check.t Outdated
@@ -0,0 +1,4 @@
An explicit comma stays a sequence after a choice in `diy7`
Collaborator


I don’t think this has been fully addressed quite yet. The new gen/tests/test_parser.ml script is a great start, but it only tests the generic parser/AST layer. It checks Parser.main + AST expansion, but not the actual tool-specific parsing semantics.

As I mentioned in my comment, I think we should also have OCaml tests for the parsing paths used by the different diy* tools, since they do not all interpret the parsed AST in the same way. In particular, I would like to see direct OCaml tests for:

  • diy7’s top-level sequence-as-choice behavior, and semantics of multiple -relax and -safe options
  • diyone7’s requirement that the input expand to exactly one cycle
  • diycross7’s interpretation of "comma , as choice", and semantics of multi-argument combinations like diycross7 A 'B|C' D
  • possibly some additional tests covering:
    • fence and -cumul parsing
    • the newly-introduced invalid-relax filter

Again, it might be the case that gen/diy.ml, gen/diyone.ml, and gen/diycross.ml need a bit of extra work so that the corresponding parsing logic can be exposed through small functions that can be tested directly from OCaml tests.
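As an illustration of the kind of small, directly testable function meant here, the sketch below checks the diyone7 requirement that the input expand to exactly one cycle. The function name `exactly_one_cycle`, its error messages, and the use of `string list list` as the expansion type are assumptions for this example, not the code in gen/diyone.ml.

```ocaml
(* Hypothetical check: diyone7 needs the input to denote exactly one cycle. *)
let exactly_one_cycle (expansions : string list list)
    : (string list, string) result =
  match expansions with
  | [ cycle ] -> Ok cycle
  | [] -> Error "input expands to no cycle"
  | _ ->
      Error
        (Printf.sprintf "input expands to %d cycles, expected exactly one"
           (List.length expansions))

(* Direct unit tests: no cycle generation or later pipeline stage involved. *)
let () =
  assert (exactly_one_cycle [ [ "PosWR"; "PosRR"; "Fri" ] ]
          = Ok [ "PosWR"; "PosRR"; "Fri" ]);
  assert (Result.is_error (exactly_one_cycle [ [ "PosWR" ]; [ "DpAddrdW" ] ]));
  assert (Result.is_error (exactly_one_cycle []))
```

A failure in a test of this shape can only come from the function under test, which is the failure locality that end-to-end tests lack.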

Comment thread gen/common/ast.mli Outdated
Comment thread gen/common/config.ml Outdated
Comment on lines +145 to +150
"Parser syntax: whitespace or ',' for sequence, '|' for choice, '?' for optional, and '[...]' for grouping, for examples:\n\
- 'A B' means the sequence A,B.\n\
- 'A|B,C' and '[A|B] C' both mean the choice between A and B, then the sequence with C.\n\
- '[A,B]?' means either the group '[A,B]' or the empty '[]'.\n\
- 'A|B,C|[D,E]?' parses as '(A|B),(C|([D,E]?))'.\n\
Depending on the tool and context, a sequence may be interpreted either as a 'followed-by' relation between relaxations or as a choice between inputs."
Collaborator


I find the new "Parser syntax" help text rather parser-/grammar-centric, and IMO not that helpful from a user perspective. I also think this documentation text should be tailored per tool, and should describe the accepted input forms in user-facing terms rather than in terms of "parser syntax" (which users shouldn't really be concerned with) or "sequences".
The term "sequence" in particular doesn't seem to be well-defined from the user's point of view: is the "sequence" A,B denoting a composite relaxation? Is it some other kind of list? Is it a choice/alternative between two relaxations? I'm not super convinced that this help message provides a sufficient answer to these questions.

In particular, users need to understand things like:

  • how to write a composite relaxation
  • how and when to write alternative candidate relaxations with |
  • what ? applies to and what is its semantics
  • how top-level whitespace and , is interpreted in each diy* tool

As written, the current text feels too generic, and the list of parsing examples does not seem very informative. I think the examples should be written in the form of concrete commands diy7 -relax "A B... that show how these operators behave concretely.

Collaborator Author

@ShaleXIONG ShaleXIONG Apr 16, 2026


I will update the text per tool and give tool-specific examples after the common parser wording.

For diyone7 and diycross7 the explanation is at the top. For diy7 the explanation is at the argument level since only -safe, -relax etc accept relaxation input.

Comment thread gen/common/edge.ml Outdated
Comment thread gen/common/ast.mli Outdated
Comment thread gen/diy.ml
@fsestini
Collaborator

fsestini commented Apr 22, 2026

Summary of offline discussion to move this PR forward. @ShaleXIONG let me know if anything is incorrect or missing.

  • We should aim to simplify the lexer. In particular, I don't think it should include branching logic for "top-level" vs "backward-compatible" modes, or emit dedicated "top-level" tokens. IMO, it should be responsibility of the parser to determine top-level and tool-specific semantics.

The following points expand and further clarify my previous comment, which I think is still not fully addressed.

  • The parser should expose dedicated entry points for the different tool-level semantics (diy7, diyone7, diycross7, etc.). These entry points can share common grammar pieces, but the tools do have genuinely different parsing semantics, and I think that should be reflected more clearly in the grammar and parsers.

  • These tool-specific parsing functions should be tested with dedicated OCaml unit tests, rather than mainly through broad end-to-end tests as proposed by the PR. The current approach validates parser behavior indirectly via integration/regression tests (e.g. diycross-syntax in Makefile), which I think are not very effective to test parsing behaviour specifically, since:

    • they don't have sufficient failure locality: a test failure could originate from a bug in any of: parsing, expansion, cycle generation, pretty-printing, herd7, etc., or the test infrastructure itself. Thus it may be very difficult to trace a test failure to the corresponding parser bug, or to determine that such failure is caused by a parsing bug in the first place.
    • conversely, because these tests cover a long pipeline of steps at once, a parser bug could end up being masked by later stages and not result in any detectable test failure.

    IMO parsing should be tested directly because this PR touches on subtle, tool-specific parsing semantics, and those are more effectively verified by testing the parsing functions in isolation, without interference from later steps.

  • One of the main points of difference between tools is how they interpret the , symbol. E.g. it has the meaning of "disjunctive choice" in diycross7 and "sequence" in diyone7. This PR affects this behaviour, so the test suite should aim to provide enough confidence that backwards compatibility is not broken. IMO we are not quite there yet, for example there are no tests checking how diyone7 interprets , (which is different from how diycross7 interprets it). We should take a second look at the test suite and add any missing cases. Moreover, we should double check the outcome of the tests against diy* tools pre-PR.

  • We should remove the grep step from the -filter-check test cases, as the test is clearer without it.

@ShaleXIONG
Collaborator Author

@fsestini The lexer is now simplified and the complexity has moved into the parser. The parser now has three entry points:

  • main: parses the input as expected, without special manipulation, for diyone7.
  • main_top_level_choice: parses the top-level , and whitespace as Choice.
  • cumul: treats all , whitespace and | as Choice. This is for diy7 -cumul {INPUT}. The parser matches the previous behaviour, where the first level of square brackets is simply ignored and nested square brackets are forbidden. It would be better to extend this behaviour to allow nested square brackets and ignore them in the parser.

The three parsers now have dedicated test cases. We also added an extra test case for remove_invalid_relaxes. As a result, all the -filter-check test segments have been removed, since they are no longer necessary.
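The difference between the three entry points can be sketched on a toy AST as below. This is only an approximation written as AST rewrites; the real entry points live in the grammar (parser.mly), and the functions `top_level_choice` and `cumul`, as well as the decision to reject `Opt` in `cumul`, are assumptions made for this example.

```ocaml
(* Toy AST with the same constructor names as described in this PR. *)
type t = One of string | Seq of t list | Choice of t list | Opt of t

(* main-like behaviour: the AST is used exactly as parsed. *)
let as_parsed (ast : t) : t = ast

(* main_top_level_choice-like behaviour: only the outermost sequence
   (top-level ',' or whitespace) is reinterpreted as a choice. *)
let top_level_choice (ast : t) : t =
  match ast with
  | Seq items -> Choice items
  | other -> other

(* cumul-like behaviour: ',', whitespace and '|' all denote a choice,
   so the result is a flat choice between the leaves. *)
let cumul (ast : t) : t =
  let rec leaves = function
    | One s -> [ One s ]
    | Seq items | Choice items -> List.concat_map leaves items
    | Opt _ -> invalid_arg "cumul: '?' is not handled in this sketch"
  in
  Choice (leaves ast)
```

For instance, an input parsed as `Seq [One "A"; Choice [One "B"; One "C"]]` keeps its inner choice under `top_level_choice` but becomes a flat three-way choice under `cumul`.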
