Skip to content

Railroad, Syntax diagrams, EBNF

Nikolay Rozhkov edited this page Sep 23, 2023 · 4 revisions

This is a collection of what was discussed here

Which standard to use

We have a couple of standards and different notations, so we should probably support as many as we can at once

support expressions sample

Alternatives and tools

Ready image builder https://www.npmjs.com/package/railroad-diagrams

DrawGrammar Limitations

It does not perform left-reduction

RR Limitations

In RR 'a'+? and 'a'?+ are effectively reduced to 'a'*, and ('a'|)+ reduced to ('a'|)+ but not to the 'a'*, although they are effectively the same. It seems that RR needs like 2 steps for that reduction.

It also cannot compress a|a* although it obviously equals to a+

Normalizing or compressing tree

RR approach

Author of RR suggest to reduce AST down to two operators.

  • alternative
  • one or more

See the longer example:

rule ::= 'a'? 'b'+ 'c'* 'd'?+ 'e'?* 'f'+* ('g'|) ('h'|)+ ('i'|)* ('w'|)+* 'x' y 'z'

becomes

[rule]::= 'a'? 'b'+ 'c'* 'd'* 'e'* 'f'* 'g'? 'h'?+ 'i'?* 'w'?* 'x' [y] 'z'

DB structure

Instead of parsing it down to a concatenation operator

graph
c1[,] --> A
c2[,] --> B
c2[,] --> C

c1 --> c2
Loading

I would rather use

graph
c1[,]
c1 --> A
c1 --> B
c1 --> C
Loading

Concepts

I have not come to a conclusion yet, there are some cases in this implementation that are really tricky, but some, that are relatively simple are not processed.

Currently I am trying to figure out, should "one or many" be a node or a "transition" just like empty transition.

May be it'll make something simpler, may be not. In that case we probably could use only alternative, and definition representing "list of consequent terms and non terms".

%e - epsilon, empty transition from the start state to the end state %r - repetition, empty transition from the end to the start state

x or 1 - only one x? or x|%e - zero or one x+ or x|%r - one or many x* or x|%e|%r - zero or many

Compression Rules

uncompressed => compressed

We can represent quantifiers as alternatives, or as alternatives with repetitions

a? => a | %e
a+ => a | %r
a* => a | %e | %r

Two quantifiers one after another does not meed being 'greedy' (like in regexp)

(a?)? => a | %e | %e => a?
(a?)+ => a | %e | %r) => a*
(a?)* => a | %e | %e | %r => a?

(a+)? => a | %r | %e => a*
(a+)+ => a | %r | %r => a+
(a+)* => a | %r | %r | %e => a*

// because a* contains %e and %r it always produces a*
a*? => a*
a*+ => a*
a** => a*

Alternatives can unite its elements

a | a? => a | a | %e => a?
a | a+ => a | a | %r => a+
a | a* => a | a | %e | %r => a*

Concatenation can unite its elements single element + quantifier

a a? a => a a? a
a a* a => a a+
a a+ a => a a a+

quantifier + quantifier

a? a? => a{0,2} # we do not support this
a? a? => a? a?
a? a+ => a? a+
a? a* => (a|%e) (a|%e)* => a*
a* a* => (a|%e)* (a|%e)* => a*
a+ a+ => a a* a a* => a a+
a+ a* => a*
a* a+ => a*

Unwrap and unite alternatives

(a) => a
a|(b|c) => a|b|c

Common prefixes

TODO

Recursion and reduction

We need to

  • eliminate recursion if possible
  • apply left-reduction, search for common prefixes and suffixes in long rules e.g. a b c d | a b becomes a b (c d)?
rule: 'a' rule | %e =>  rule: a*
rule ::= 'a' rule | 'a' => rule: a+

Examples

alt ::= a(b a)*

alt

alt ::= a b c d | b | c d 

alt

Clone this wiki locally