PC is a minimal zero-dependency parser combinator framework enabling intuitive and modular parser development.
A parser as we refer to it here is a function with the signature
(input: string) => [offset, matches]
where offset indicates how much of input was successfully converted into matches.
PC provides four fundamental parsers:
- string for matching exact strings (e.g. "hi" === ["hi"])
- regexp for matching character ranges (e.g. /hi?/ === ["h", "hi"])
- sequence for matching ordered patterns of parsers (i.e. all patterns must match, one after the other)
- any for matching any number of patterns in any order (i.e. at least one pattern must match)
Both the string and regexp parsers can be created with the match parser,
a convenience function that maps its argument (a string or RegExp) to the
string or regexp parser.
All parsers in PC have the following signature:
(input: string) => [offset: number, matches: string[] | string | null]
Where input is the remaining input to be parsed, offset is the length of input
consumed or matched by the parser and matches is an array of strings or single
string (signifying a successful match) or null (signifying no match). See the
Types section for more detail.
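Because a parser is just a function with this shape, one can also be written by hand. The sketch below is illustrative only: `digits` is a hypothetical name and not part of PC, and it assumes that a failed match reports an offset of 0 alongside null.

```javascript
// Hypothetical hand-written parser (not part of PC): matches a run of
// leading decimal digits and reports how much input it consumed.
const digits = (input) => {
  let offset = 0;
  while (offset < input.length && input[offset] >= '0' && input[offset] <= '9') {
    offset += 1;
  }
  // No digits at the start: offset 0 and null signal "no match" (assumed convention).
  if (offset === 0) return [0, null];
  // Otherwise return the consumed length and the matched text.
  return [offset, input.slice(0, offset)];
};

digits('42abc'); // => [2, '42']
digits('abc');   // => [0, null]
```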
npm i @tmanderson/pc
const { match: m, sequence: s, any: a } = require('@tmanderson/pc');
// Helper for patterns matching once and only once
const m11 = p => m(p, 1, 1);
// Special Characters
const CBO = m11('{')
const CBC = m11('}')
const HBO = m11('[')
const HBC = m11(']')
const COL = m11(':')
const COM = m11(',')
const QOT = m11('"')
const TRU = m11('true')
const FLS = m11('false')
const INT = m11(/[0-9]/)
const ALP = m11(/[a-zA-Z0-9]/)
const DOT = m11('.')
const CHA = m11(/[^"]/)
// Optional Whitespace
const WSP = m(/[\n\s\t ]/, 0)
// "Primitives"
const BOO = a([ TRU, FLS ], 1, 1);
const STR = s([ QOT, m(i => CHA(i), 0), QOT ], 1, 1);
const NUM = s([ INT, s([ DOT, INT ], 0) ]);
// Arrays (ENT = array-entry)
const ENT = s([ WSP, i => TYP(i), WSP ])
const ARR = s([ HBO, s([ ENT, s([ COM, ENT ], 0) ], 0), HBC ]);
// Objects (KAV = key-and-value)
const KAV = s([ WSP, a([ STR, ALP ]), WSP, COL, WSP, i => TYP(i), WSP ]);
const OBJ = s([ CBO, s([ KAV, s([ COM, KAV ], 0) ], 0), CBC ]);
// Value types
const TYP = a([ STR, NUM, BOO, OBJ, ARR ]);
// Root
const JSON = a([ ARR, OBJ ], 0, 1);
JSON('{}')
JSON('[]')
JSON('{ test: true }')
JSON('{ "test": [1, "two", true, {}] }')
All PC parsers take a single argument (an input string) and return a MatcherResult.
This makes interstitial operations (within the parsing context) a matter of defining
a function with this input/output signature. Within that function you can manipulate
input, output, the parser offset and/or the outputs of other parsers called within
the function itself.
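One way to package such an operation is a small higher-order wrapper. The following is a minimal sketch, not part of PC: `mapMatches` is a hypothetical helper, and `manyA` is a stand-in parser written by hand so the example runs without the library installed.

```javascript
// Stand-in for a PC parser so the sketch runs without the library:
// consumes leading 'a' characters, one match per character.
const manyA = (input) => {
  const matches = [];
  while (input[matches.length] === 'a') matches.push('a');
  return matches.length ? [matches.length, matches] : [0, null];
};

// Hypothetical helper: wraps any parser and applies `fn` to its matches,
// leaving the offset and the no-match case untouched.
const mapMatches = (parser, fn) => (input) => {
  const [offset, matches] = parser(input);
  return matches === null ? [offset, null] : [offset, fn(matches)];
};

// Replace the individual matches with a count of how many were found.
const countA = mapMatches(manyA, (ms) => String(ms.length));
countA('aaab'); // => [3, '3']
countA('b');    // => [0, null]
```

Since the wrapper returns a function with the same signature, the result should be usable anywhere a parser is expected.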
A common use case is the concatenation of consecutive string
matches. For example, the parser match('a') would, given the input 'aaab',
return the matches ['a', 'a', 'a'], which can become unwieldy when reading through your parser
output. It would be better if the output were the single string 'aaa'. We can resolve this issue
by creating a concat utility for our simple parser:
const SimpleParser = match('a');
SimpleParser('aaab') // => [ 3, [ 'a', 'a', 'a' ] ]
const concat = (input) => {
// SimpleParser returns a MatcherResult: [offset, matches]
const [inputOffset, matches] = SimpleParser(input);
// if `matches` is null, this implies no matches (so inputOffset is 0)
if (matches === null) return [0, null];
// Otherwise return the same offset (we're not reducing/consuming extra input)
// and concatenate all the matches from SimpleParser into one string
return [inputOffset, matches.join('')];
}
concat('aaab') // => [ 3, 'aaa' ]
If you're one for concision, this function can be greatly minimized with an IIFE:
const concat = (input) =>
(([inputOffset, matches]) =>
[inputOffset, matches ? matches.join('') : null])(SimpleParser(input))
The match parser takes a pattern. If the pattern is a RegExp, remember that
it matches against only a single character of input at a time (because the
length of a match is intentionally assumed to be indeterminate).
match('wow')('wow') // => [3, 'wow']
match('wow')('wowwow') // => [6, ['wow', 'wow']]
match('wow')('wowow') // => [3, 'wow']
match(/[wo]/)('wo') // => [2, ['w', 'o']]
match(/[wo]/)('wowww') // => [5, ['w', 'o', 'w', 'w', 'w']]
The sequence parser takes an ordered array of Matchers, returning an array
of tokens, each entry pertaining to the match specified within the patterns.
sequence([
match('w'),
match('o'),
match('w')
])('wow') // => [ 3, [ [ ['w'], ['o'], ['w'] ] ] ]
sequence([
match(/[0-9]/, 3),
match('-'),
match(/[0-9]/, 3),
match('-'),
match(/[0-9]/, 4),
])('123-456-7890') /* =>
[ 12, [
[
[ '1', '2', '3' ],
[ '-' ],
[ '4', '5', '6' ],
[ '-' ],
[ '7', '8', '9', '0' ]
]
]
] */
The any parser takes an unordered array of Matchers, returning an array
of tokens, each entry pertaining to any match within the patterns.
any([
match(/[0-9]/),
match(/[a-z ]/),
match('Jodabalocky'),
])('Jodabalocky is 77') // =>
// [ 17, [ [ 'Jodabalocky' ], [ ' ', 'i', 's', ' ' ], [ '7', '7' ] ] ]
All parsers utilized by PC require an output of MatcherResult. The following
breaks down the definition a bit more:
type NoMatch = null;
type Match = string;
type Matches = string[];
type MatcherResult = [offset: number, matches: Matches | Match | NoMatch];
The offset value represents the total number of input characters consumed by
the parser, while the second element represents the matches made by it. If matches
is null, the input was not successfully parsed.
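When consuming a MatcherResult, it can help to separate "did it match" from "did it consume everything". The sketch below is an assumption-laden illustration, not part of PC: `run` is a hypothetical helper, and `hi` is a hand-written stand-in parser used only so the example runs on its own.

```javascript
// Hypothetical helper: runs a parser and reports whether it matched and
// whether the whole input was consumed.
const run = (parser, input) => {
  const [offset, matches] = parser(input);
  return {
    matched: matches !== null,
    complete: matches !== null && offset === input.length,
    rest: input.slice(offset), // unconsumed remainder of the input
    matches,
  };
};

// Stand-in parser for demonstration only: matches a literal 'hi' prefix.
const hi = (input) => (input.startsWith('hi') ? [2, 'hi'] : [0, null]);

run(hi, 'hi');       // => { matched: true, complete: true, rest: '', matches: 'hi' }
run(hi, 'hi there'); // => { matched: true, complete: false, rest: ' there', matches: 'hi' }
run(hi, 'yo');       // => { matched: false, complete: false, rest: 'yo', matches: null }
```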