[gen] Introduce a new relaxation/edge/atom parser in `diy` tool. by ShaleXIONG · Pull Request #1685 · herd/herdtools7

ShaleXIONG · 2026-01-27T12:56:59Z

This is the first phase of reworking the diy* lexer and parser for input relaxations or cycles. This pull request introduces a more conventional pipeline:

- the updated lexutil.mll tokenises the input into an AST, specifically type t in the new ast.ml, and
- the new parser.mly parses the AST into the internal data structure.

Ast.t has the following constructors:

- One of string for an individual relaxation.
- Seq of t list for a sequence of relaxations. This corresponds to the , syntax or, in some situations, white space. For example, diyone7 PosWR PosRR Fri is equivalent to diyone7 PosWR,PosRR,Fri.
- A new choice constructor, Choice of t list, together with a new syntax |. For example, PosWR|DpAddrdW means either PosWR or DpAddrdW.
- A new option constructor, Opt of t, together with a new syntax ?. For example, PosWR? means either PosWR or nothing (empty).
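The constructors above can be sketched as a small OCaml type, together with a flattening step that lists every concrete sequence an AST denotes. This is only an illustration: the constructor names follow the description, but the `expand` function and its exact behaviour are assumptions, not the code in ast.ml.

```ocaml
(* Hypothetical sketch: constructor names follow the PR description;
   [expand] is an illustrative assumption, not the code in ast.ml. *)
type t =
  | One of string     (* a single relaxation, e.g. "PosWR" *)
  | Seq of t list     (* a sequence of relaxations *)
  | Choice of t list  (* alternatives *)
  | Opt of t          (* a relaxation that may be absent *)

(* Flatten an AST into every concrete sequence it denotes. *)
let rec expand (ast : t) : string list list =
  match ast with
  | One s -> [ [ s ] ]
  | Choice alts -> List.concat_map expand alts
  | Opt a -> expand a @ [ [] ]
  | Seq items ->
      (* Cartesian product of the expansions of each item, in order. *)
      List.fold_right
        (fun item tails ->
          List.concat_map
            (fun head -> List.map (fun tail -> head @ tail) tails)
            (expand item))
        items [ [] ]

let () =
  (* An AST built by hand: a choice followed by an optional relaxation. *)
  let ast = Seq [ Choice [ One "PosWR"; One "DpAddrdW" ]; Opt (One "Coe") ] in
  List.iter (fun seq -> print_endline (String.concat " " seq)) (expand ast)
```

The result type, string list list, matches the flattened form mentioned in the future plans of this description.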

The new constructors can be used in diycross7 and diy7. In diyone7 they are parsed, but an error occurs because the input no longer denotes exactly one cycle. An example is diy7 -arch AArch64 -safe "[Po|DMB.SY*** DpAddr?] Coe" -cycleonly true -size 4 -nprocs 2, which gives:

Generator produced 10 tests
2+2W000: PosWR DpAddrdW Coe PosWR DpAddrdW Coe
2+2W001: PodWW Coe PosWR DpAddrdW Coe
2+2W002: PosWR DpAddrdW Coe PodWR DpAddrsW Coe
2+2W003: PosWR DpAddrdW Coe PodWR DpAddrdW Coe
2+2W004: DMB.SYsWR DpAddrdW Coe PosWR DpAddrdW Coe
2+2W005: DMB.SYdWW Coe PosWR DpAddrdW Coe
2+2W006: DMB.SYdWR DpAddrsW Coe PosWR DpAddrdW Coe
2+2W007: DMB.SYdWR DpAddrdW Coe PosWR DpAddrdW Coe
2+2W008: PodWW Coe PodWW Coe
2+2W009: PodWW Coe PodWR DpAddrsW Coe
2+2W010: PodWW Coe PodWR DpAddrdW Coe
2+2W011: PodWW Coe DMB.SYsWR DpAddrdW Coe
2+2W012: DMB.SYdWW Coe PodWW Coe
2+2W013: PodWW Coe DMB.SYdWR DpAddrsW Coe
2+2W014: PodWW Coe DMB.SYdWR DpAddrdW Coe
2+2W015: PodWR DpAddrsW Coe PodWR DpAddrsW Coe
2+2W016: PodWR DpAddrsW Coe PodWR DpAddrdW Coe
2+2W017: DMB.SYsWR DpAddrdW Coe PodWR DpAddrsW Coe
2+2W018: DMB.SYdWW Coe PodWR DpAddrsW Coe
2+2W019: DMB.SYdWR DpAddrsW Coe PodWR DpAddrsW Coe
2+2W020: DMB.SYdWR DpAddrdW Coe PodWR DpAddrsW Coe
2+2W021: PodWR DpAddrdW Coe PodWR DpAddrdW Coe
2+2W022: DMB.SYsWR DpAddrdW Coe PodWR DpAddrdW Coe
2+2W023: DMB.SYdWW Coe PodWR DpAddrdW Coe
2+2W024: DMB.SYdWR DpAddrsW Coe PodWR DpAddrdW Coe
2+2W025: DMB.SYdWR DpAddrdW Coe PodWR DpAddrdW Coe
2+2W026: DMB.SYsWR DpAddrdW Coe DMB.SYsWR DpAddrdW Coe
2+2W027: DMB.SYdWW Coe DMB.SYsWR DpAddrdW Coe
2+2W028: DMB.SYsWR DpAddrdW Coe DMB.SYdWR DpAddrsW Coe
2+2W029: DMB.SYsWR DpAddrdW Coe DMB.SYdWR DpAddrdW Coe
2+2W030: DMB.SYdWW Coe DMB.SYdWW Coe
2+2W031: DMB.SYdWW Coe DMB.SYdWR DpAddrsW Coe
2+2W032: DMB.SYdWW Coe DMB.SYdWR DpAddrdW Coe
2+2W033: DMB.SYdWR DpAddrsW Coe DMB.SYdWR DpAddrsW Coe
2+2W034: DMB.SYdWR DpAddrsW Coe DMB.SYdWR DpAddrdW Coe
2+2W035: DMB.SYdWR DpAddrdW Coe DMB.SYdWR DpAddrdW Coe

For instance, 2+2W013: PodWW Coe DMB.SYdWR DpAddrsW Coe is generated from Po, Coe [DMB.SY*** DpAddr]. One can also run the command with -v, in which case the fully unfolded edges are printed at the very beginning.

Lastly, given all the changes above, we unify the parsing process across the three diy* tools. Previously, each diy* tool had its own parsing path, with the actual code spread over different files such as diy.ml, diycross.ml and diyone.ml. Now the main parsing functions are unified as parse_expand_relax and parse_expand_relaxs in relax.ml.

Some future plans:

- Remove the Ppo type, which is only used in PPCCompile_gen.ml, and make Ppo a special wildcard.
- Parse Ast.t in place. Currently, parsing happens after we flatten Ast.t to string list list, so that we can hook into the existing parsing functions.

We decided to leave these as future plans, to keep this pull request smaller and to roll out the new syntax sooner.

fsestini

Thanks for this @ShaleXIONG . I've left a few comments in the code, and I have two overarching concerns:

The lexer uses global mutable state, in a way that I think is quite error prone. I'm also not convinced it's really necessary. At the very least, I'd like to see it replaced with local state. See the comments in the code.
This PR introduces a new parser for existing syntax, and new user-facing syntax for relaxations. The parsing semantics of both the old and new syntax is not trivial (for example, white space is interpreted in a context-dependent way, and , has different meaning in diy7 vs diycross7). For this reason, I think we need significantly more tests than what the PR currently offers. Again, see the comments for additional details.

Happy to have a chat online/offline about any of these points.

ShaleXIONG · 2026-04-14T14:20:59Z

@fsestini I have addressed all your comments in several new commits. Can you have a look? Once we settle, I will merge and rewrite the history.

fsestini · 2026-04-15T12:16:36Z

+(* Track whether the current scope has already seen an operand.
+   Whitespace after an operand is treated as a sequence separator. *)


Sorry, but the comment still mentions 'operand', so my point about it not being clear what 'operand' means still applies.

Suggested change

-(* Track whether the current scope has already seen an operand.
-   Whitespace after an operand is treated as a sequence separator. *)
+(* Track whether the lexer is currently inside a square bracket pair and has consumed a relaxation string.
+   This is because whitespace after a relaxation within brackets is treated as a sequence separator. *)

fsestini · 2026-04-15T12:45:06Z

@@ -0,0 +1,4 @@
+An explicit comma stays a sequence after a choice in `diy7`


I don’t think this has been fully addressed quite yet. The new gen/tests/test_parser.ml script is a great start, but it only tests the generic parser/AST layer. It checks Parser.main + AST expansion, but not the actual tool-specific parsing semantics.

As I mentioned in my comment, I think we should also have OCaml tests for the parsing paths used by the different diy* tools, since they do not all interpret the parsed AST in the same way. In particular, I would like to see direct OCaml tests for:

- diy7’s top-level sequence-as-choice behavior, and semantics of multiple -relax and -safe options
- diyone7’s requirement that the input expand to exactly one cycle
- diycross7’s interpretation of "comma , as choice", and semantics of multi-argument combinations like diycross7 A 'B|C' D
- possibly some additional tests covering:
  - fence and -cumul parsing
  - the newly-introduced invalid-relax filter

Again, it might be the case that gen/diy.ml, gen/diyone.ml, and gen/diycross.ml need a bit of extra work so that the corresponding parsing logic can be exposed through small functions that can be tested directly from OCaml tests.

fsestini · 2026-04-15T13:48:44Z

+  "Parser syntax: whitespace or ',' for sequence, '|' for choice, '?' for optional, and '[...]' for grouping, for examples:\n\
+    - 'A B' means the sequence A,B.\n\
+    - 'A|B,C' and '[A|B] C' both mean the choice between A and B, then the sequence with C.\n\
+    - '[A,B]?' means either the group '[A,B]' or the empty '[]'.\n\
+    - 'A|B,C|[D,E]?' parses as '(A|B),(C|([D,E]?))'.\n\
+   Depending on the tool and context, a sequence may be interpreted either as a 'followed-by' relation between relaxations or as a choice between inputs."


I find the new "Parser syntax" help text rather parser-/grammar-centric, and IMO not that helpful from a user perspective. I also think this documentation text should be tailored per tool, and should describe the accepted input forms in user-facing terms rather than in terms of "parser syntax" (which users shouldn't really be concerned with) or "sequences".
The term "sequence" in particular doesn't seem to be well-defined from the user's point of view: is the "sequence" A,B denoting a composite relaxation? Is it some other kind of list? Is it a choice/alternative between two relaxations? I'm not super convinced that this help message provides a sufficient answer to these questions.

In particular, users need to understand things like:

how to write a composite relaxation

how and when to write alternative candidate relaxations with |

what ? applies to and what is its semantics

how top-level whitespace and , is interpreted in each diy* tool

As written, the current text feels too generic, and the list of parsing examples does not seem very informative. I think the examples should be written in the form of concrete commands diy7 -relax "A B... that show how these operators behave concretely.

ShaleXIONG · 2026-04-16

I will tailor the text to each tool and, after the common parser wording, give examples specific to each tool.

For diyone7 and diycross7 the explanation is at the top. For diy7 the explanation is at the argument level, since only -safe, -relax, etc. accept relaxation input.
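To make the precedence in the quoted help text concrete, here is a minimal recursive-descent sketch in which ? binds tightest, then |, then , or juxtaposition. All names (`token`, `tokenize`, `parse`, `show`) are hypothetical and are not taken from lexutil.mll or parser.mly. As a simplifying assumption, this sketch treats whitespace between two relaxations as a sequence separator everywhere, whereas the actual tools interpret whitespace in a context-dependent way.

```ocaml
(* Hypothetical sketch of the grammar in the help text; not the PR's parser. *)
type t = One of string | Seq of t list | Choice of t list | Opt of t

type token = Name of string | Comma | Bar | Question | LBracket | RBracket

(* Characters that end a relaxation name. *)
let is_special c = String.contains " \t\n,|?[]" c

let tokenize (s : string) : token list =
  let n = String.length s in
  let rec go i acc =
    if i >= n then List.rev acc
    else
      match s.[i] with
      | ' ' | '\t' | '\n' -> go (i + 1) acc
      | ',' -> go (i + 1) (Comma :: acc)
      | '|' -> go (i + 1) (Bar :: acc)
      | '?' -> go (i + 1) (Question :: acc)
      | '[' -> go (i + 1) (LBracket :: acc)
      | ']' -> go (i + 1) (RBracket :: acc)
      | _ ->
          (* Read a name up to the next special character. *)
          let j = ref i in
          while !j < n && not (is_special s.[!j]) do incr j done;
          go !j (Name (String.sub s i (!j - i)) :: acc)
  in
  go 0 []

(* Wrap a list in a constructor only when it has more than one element. *)
let wrap mk = function [ x ] -> x | xs -> mk xs

(* seq ::= choice ((',' | <juxtaposition>) choice)* *)
let rec parse_seq toks =
  let first, rest = parse_choice toks in
  let rec more acc toks =
    match toks with
    | Comma :: rest ->
        let c, rest = parse_choice rest in
        more (c :: acc) rest
    | (Name _ | LBracket) :: _ ->
        let c, rest = parse_choice toks in
        more (c :: acc) rest
    | _ -> (wrap (fun xs -> Seq xs) (List.rev acc), toks)
  in
  more [ first ] rest

(* choice ::= postfix ('|' postfix)* *)
and parse_choice toks =
  let first, rest = parse_postfix toks in
  let rec more acc = function
    | Bar :: rest ->
        let p, rest = parse_postfix rest in
        more (p :: acc) rest
    | toks -> (wrap (fun xs -> Choice xs) (List.rev acc), toks)
  in
  more [ first ] rest

(* postfix ::= atom '?'* *)
and parse_postfix toks =
  let atom, rest = parse_atom toks in
  let rec opts a = function
    | Question :: rest -> opts (Opt a) rest
    | toks -> (a, toks)
  in
  opts atom rest

(* atom ::= NAME | '[' seq ']' *)
and parse_atom = function
  | Name s :: rest -> (One s, rest)
  | LBracket :: rest -> (
      match parse_seq rest with
      | inner, RBracket :: rest -> (inner, rest)
      | _ -> failwith "expected ']'")
  | _ -> failwith "expected a relaxation or '['"

let parse (s : string) : t =
  match parse_seq (tokenize s) with
  | ast, [] -> ast
  | _ -> failwith "unexpected trailing input"

(* Print an AST with explicit parentheses around every composite node. *)
let rec show = function
  | One s -> s
  | Seq xs -> "(" ^ String.concat "," (List.map show xs) ^ ")"
  | Choice xs -> "(" ^ String.concat "|" (List.map show xs) ^ ")"
  | Opt a -> "(" ^ show a ^ "?)"

let () = print_endline (show (parse "A|B,C|[D,E]?"))
```

Under these assumptions, 'A|B,C' and '[A|B] C' produce the same tree, which is the equivalence the help text states.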


fsestini · 2026-04-22T13:02:02Z

Summary of offline discussion to move this PR forward. @ShaleXIONG let me know if anything is incorrect or missing.

We should aim to simplify the lexer. In particular, I don't think it should include branching logic for "top-level" vs "backward-compatible" modes, or emit dedicated "top-level" tokens. IMO, it should be responsibility of the parser to determine top-level and tool-specific semantics.

The following points expand and further clarify my previous comment, which I think is still not fully addressed.

The parser should expose dedicated entry points for the different tool-level semantics (diy7, diyone7, diycross7, etc.). These entry points can share common grammar pieces, but the tools do have genuinely different parsing semantics, and I think that should be reflected more clearly in the grammar and parsers.
These tool-specific parsing functions should be tested with dedicated OCaml unit tests, rather than mainly through broad end-to-end tests as proposed by the PR. The current approach validates parser behavior indirectly via integration/regression tests (e.g. diycross-syntax in Makefile), which I think are not very effective to test parsing behaviour specifically, since:
- they don't have sufficient failure locality: a test failure could originate from a bug in any of: parsing, expansion, cycle generation, pretty-printing, herd7, etc., or the test infrastructure itself. Thus it may be very difficult to trace a test failure to the corresponding parser bug, or to determine that such failure is caused by a parsing bug in the first place.
- conversely, because these tests cover a long pipeline of steps at once, a parser bug could end up being masked by later stages and not result in any detectable test failure.
IMO parsing should be tested directly because this PR touches on subtle, tool-specific parsing semantics, and those are more effectively verified by testing the parsing functions in isolation, without interference from later steps.
One of the main points of difference between tools is how they interpret the , symbol. E.g. it has the meaning of "disjunctive choice" in diycross7 and "sequence" in diyone7. This PR affects this behaviour, so the test suite should aim to provide enough confidence that backwards compatibility is not broken. IMO we are not quite there yet: for example, there are no tests checking how diyone7 interprets , (which is different from how diycross7 interprets it). We should take a second look at the test suite and add any missing cases. Moreover, we should double check the outcome of the tests against the pre-PR diy* tools.
We should remove the grep step from the -filter-check test cases, as the test is clearer without it.

ShaleXIONG · 2026-04-23T11:38:33Z

@fsestini The lexer is now simplified and the complexity has moved to the parser. The parser now has three entry points:

- main, which parses the input as written, without special manipulation; this is for diyone7.
- main_top_level_choice, which parses the top-level , and whitespace as Choice.
- cumul, where all , whitespace and | are treated as Choice. This is for diy7 -cumul {INPUT}. The parser matches the previous behaviour, where the first level of square brackets is simply ignored while nested square brackets are forbidden. It would be better to extend the behaviour to allow nested square brackets and ignore them in the parser.

The three parsers now have dedicated test cases. We also added an extra test case for remove_invalid_relaxes. As a result, all the -filter-check test segments have been removed, since they are no longer necessary.
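The difference between the three entry points can be modelled, very roughly, as transformations on an already parsed AST. The sketch below is an assumption made for illustration: the function names are hypothetical, the real entry points are grammar rules in parser.mly rather than AST rewrites, and the bracket restrictions of cumul are not modelled.

```ocaml
(* Hypothetical AST-level model of the three entry points; the real ones
   live in parser.mly and may differ in detail. *)
type t = One of string | Seq of t list | Choice of t list | Opt of t

(* Flatten an AST into every concrete sequence it denotes. *)
let rec expand = function
  | One s -> [ [ s ] ]
  | Choice alts -> List.concat_map expand alts
  | Opt a -> expand a @ [ [] ]
  | Seq items ->
      List.fold_right
        (fun item tails ->
          List.concat_map
            (fun head -> List.map (fun tail -> head @ tail) tails)
            (expand item))
        items [ [] ]

(* Like "main": keep the tree as parsed, so , stays a sequence. *)
let as_written (ast : t) : t = ast

(* Like "main_top_level_choice": only the outermost sequence is a choice. *)
let top_level_choice = function Seq items -> Choice items | ast -> ast

(* Like "cumul": every sequence, at any depth, is read as a choice. *)
let rec all_choice = function
  | One s -> One s
  | Seq items | Choice items -> Choice (List.map all_choice items)
  | Opt a -> Opt (all_choice a)

let () =
  (* Hand-built tree for an input of the shape A,B|C. *)
  let ast = Seq [ One "A"; Choice [ One "B"; One "C" ] ] in
  let print label tree =
    let rows = List.map (String.concat " ") (expand tree) in
    Printf.printf "%s: %s\n" label (String.concat " ; " rows)
  in
  print "as written" (as_written ast);
  print "top-level choice" (top_level_choice ast);
  print "all choice" (all_choice ast)
```

A unit test written against functions of this shape can compare expansions directly, which keeps a failure local to the parsing layer.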

ShaleXIONG requested a review from fsestini January 27, 2026 12:56

ShaleXIONG force-pushed the code-better-parser branch 4 times, most recently from ce28381 to 15b8202 Compare January 29, 2026 09:54

ShaleXIONG marked this pull request as ready for review January 29, 2026 10:14

ShaleXIONG force-pushed the code-better-parser branch from 15b8202 to 19da623 Compare February 9, 2026 14:43

ShaleXIONG force-pushed the code-better-parser branch from 19da623 to 509b397 Compare February 18, 2026 10:48

ShaleXIONG mentioned this pull request Mar 2, 2026

[WIP] [gen] Update baseline configuration for rmw #1578

Draft

ShaleXIONG force-pushed the code-better-parser branch 3 times, most recently from 6c6b505 to 1e82287 Compare March 4, 2026 16:50

ShaleXIONG force-pushed the code-better-parser branch 5 times, most recently from 7d4368a to 7fc9943 Compare March 26, 2026 12:29

ShaleXIONG force-pushed the code-better-parser branch from 7fc9943 to 4261325 Compare March 30, 2026 08:56

ShaleXIONG force-pushed the code-better-parser branch 6 times, most recently from eb895bd to e6c2d70 Compare April 10, 2026 08:57

fsestini reviewed Apr 10, 2026


ShaleXIONG force-pushed the code-better-parser branch 2 times, most recently from 5319f8e to 867880b Compare April 13, 2026 15:38

fsestini self-requested a review April 15, 2026 12:08

fsestini reviewed Apr 15, 2026


ShaleXIONG added 19 commits April 17, 2026 13:22

[gen] update the parser for cumul argument.

acabe0b

[gen] address comment in unify parser gen/

106c620

[gen] use `Ast parsing infrastructure in LogRelax.ml.

5b6d162

[gen] Remove unused function in Ast.

16082bb

[gen] Add parser explanation text in the helper.

e10024b

[gen] address commment in new parser.

eadde27

[gen] helper update

968cf80

[gen] update the comments in ast.ml

0892d7b

[gen] Update the ast.expand in the parsing process in gen/

d102a6b

[gen] udpate the test case.

a1abf2e

[gen] remove ( fun .. ) and ( function .. ).

e032d86

[gen] update bind to the conventional typing in monadic bind.

2bcdda5

[gen] Add backward compatibility in lexutil.mll and remove remove the…

e2d2855

… seq_to_choice function

[gen] remove seq_to_choice in unify parsing in gen/

f11c7d0

[gen] update the test for backward compatibility

aa17d9e

[gen] update helper.

39c2dcb

[gen] remove warning in norm.ml

a4ed93a

[gen] remove mk_seq and mk_choice and introduce normalise.

d847b36

Update test. (HISTORY REWRITE in code-better-parser-stash)

35df659

ShaleXIONG force-pushed the code-better-parser branch from d5d116e to 35df659 Compare April 17, 2026 13:24

[gen] remove normalise from the interface.

b73374f

ShaleXIONG added 3 commits April 22, 2026 17:53

Separate top level parser.

a638a52

Update the parsing in gen/

f46ebf8

[test] update parser tests.

e869a2b

ShaleXIONG force-pushed the code-better-parser branch from a670da9 to c66512f Compare April 23, 2026 11:38

ShaleXIONG added 2 commits April 23, 2026 12:46

[gen] Expose AutoArch as a library so we can use it in testing.

d0e165f

Test remove invalid input.

dff3d4e

ShaleXIONG force-pushed the code-better-parser branch from c66512f to dff3d4e Compare April 23, 2026 11:46
