Grime is a two-dimensional pattern matching language based on Boolean grammars. The basic idea is to construct rectangular patterns from smaller components, and check whether they are found in the input matrix.
You can try Grime online, courtesy of Dennis from PPCG. Be sure to check the tutorial too.
A Grime program is called a grammar, and the correct file extension for a grammar is .gr, although this is not enforced.
The program is called like
runhaskell grime.hs [options] grammarfile matrixfile
where matrixfile is a file containing the character matrix.
For example, the digits program would be called like
runhaskell grime.hs digits.gr digit-matrix
If you want a speed boost, you should compile Grime with the command
ghc -O2 grime.hs
and then run the resulting executable:
grime [options] grammarfile matrixfile
There's also a bash script rungrime that compiles Grime if needed and then calls the executable with the provided arguments.
By default, the interpreter prints the first match it finds, but this can be controlled using the option flags:
-e: match only the entire matrix, print1for match and0for no match.-n: print the number of matches, or the entire matrix if-eis also given.-a: print all matches.-p: print also the positions of the matches, in the format(x,y,w,h).-s: do not print the matches themselves.-b: include borders of width 1.-dor-d0: print debug information.-d1: print logs from the matcher.
These can be inserted from the command line, or in the beginning of any line of a grammar file, separated from the rest of the line by a backtick `.
A Grime grammar consists of one or more definitions, each on a separate line.
Each of them defines the value of a nonterminal, and one of them must define the anonymous toplevel nonterminal.
The syntax of a definition is either N=E or E, where N is an uppercase letter and E is an expression.
Expressions are constructed as follows.
- Any character escaped with
\matches any1x1rectangle containing that character. .matches any single character.bmatches a1x1rectangle outside the character matrix. It's only relevant if the-boption is used, or if the input is not rectangular.ematches a 0-height or 0-width segment of the input's edge.$always matches.fmatches any 0-height (flat) rectangle, andtmatches any 0-width (thin) rectangle.- The pre-defined character groups are
digit,uppercase,lowercase,alphabetic, alphanumeric, andsymbol. - New character classes can be defined by the syntax
[a-prt-w,d-gu]. The letters on the left are included, and those on the right are excluded, so this matches exactly the lettersabchijklmnoprtvw. If the left side is empty, it is taken to contain all characters and the border. The comma can be omitted if the right side is empty. The characters[],-\must be escaped with\.\bmatches a border. - An unescaped uppercase letter is a nonterminal, and matches the expression it is assigned.
_is the anonymous toplevel nonterminal.- If
PandQare expressions, thenPQis just their horizontal concatenation, andP/Qis their vertical concatenation, withPon top. P Qis likePQ, but with lower precedence than/. This sometimes saves parentheses.P+is one or morePs aligned horizontally, andP/+is the same aligned vertically.- The Boolean operations are denoted
P|Q,P&QandP!. Exclusive OR isP~Q, and difference isP-Q. P?is shorthand fort|P,P/?forf|P,P*fort|P+, andP/*forf|P/+.P{a-b,c-d}, whereabcdare nonnegative integers, is a size constraint onP. IfPis a character class, then the expression matches anymxnrectangle containing only those characters, provided thatmis betweenaandbinclusive, andnis betweencanddinclusive. In other cases, the expression matches any rectangle that has the correct size and thatPalso matches. Ifaorcare omitted, they are taken to be0, and ifbordare omitted, they are infinite. If the hyphen betweenaandbis omitted, then we use the same number for both ends of the interval. If the entirec-dpart is omitted, both axes are constrained. To clarify,{-b}is equivalent to{0-b,0-b}, and{a-,c}is equivalent to{a-infinity,c-c}. The}can be omitted.P:a-b,c-d}is a grid specifier. It matches a rectangle that can be divided into anmxngrid of matches ofP, wheremis betweenaandbinclusive, andnis betweencanddinclusive. The ranges and}can be omitted, with the same defaulting behavior as in the size constraint, except that the beginning of a range defaults to1instead of0.P#a-b}is a counting specifier. It matches a rectangle that contains betweenaandbpossibly overlapping matches ofP, inclusive. The}is optional,adefaults to1andbto infinity.PoS, whereSis a (greedily parsed) string of the characters01234567, optionally terminated with}, is an orientation modifier. Each of the characters stands for a rotation (0123) or reflection (4567), which is applied to the expressionP. The transformed expressions are combined with Boolean disjunctions. The charactersOXNTKHADCencode convenient classes of orientations. The characterFfixes an expression, so that its orientation cannot be modified.<P>is a context bracket. It matches any rectangle r that's contained in a larger rectangle that matchesP. The rectangle r can be matched by a digit inP; this is0by default, and increases with the nesting depth (0matches the anchor of the innermost brackets,1the next one etc).
You can add the prefix ^ to any infix operator or chain of postfix operators to raise its precedence higher than all other operators, or v to lower it; for example, \a\b^|\cv+ is parsed as (\a(\b|\c))+.
If i is any infix operator apart from concatenation or /, and p a chain of postfix operators, then PipQ is a shorthand for (PiQ)p; for example, \a|+\b is equivalent to (\a|\b)+.
Double quotes "" are used for swapping the \-escaped status of all characters inside them, except for the characters "\/.
They allow text grids like "this/is a/grid".
Non-literal syntax elements, like nonterminals, operations and parentheses, can be used in quoted expressions by escaping them.
Note that quotes define syntax elements, so "ab/cd"+/E will be parsed as (\a\b/\c\d)+/E.
All open parentheses, brackets and quotes are implicitly closed at the end of a line.
Lines beginning with | are comments, and the parser ignores them.
Grime does allow paradoxical definitions of the form A=A!; these will result in no match.
In general, the matching engine can resolve any grammar where no match of a single nonterminal on a single rectangle depends on itself (although for complex grammars, this can take a while).
A nontrivial example of a resolvable grammar is
A=E?\a
B=[ab]+&A!
E=A|B
a`A
Given a one-dimensional string of as and bs, this grammar will match every substring that ends in a.
Note how the nonterminal A is self-referential in more than one way (via E directly, and via E through B), but since the E in A must be a proper substring, the system can always be resolved.
Grime supports non-rectangular inputs; the rows are simply aligned to the left, and the gaps can be matched using b.