11# Masala Parser: Javascript Parser Combinators
22
33[ ![ npm version] ( https://badge.fury.io/js/%40masala%2Fparser.svg )] ( https://badge.fury.io/js/%40masala%2Fparser )
4- [ ![ Build Status] ( https://travis-ci.org/d-plaindoux/masala-parser.svg )] ( https://travis-ci.org/d-plaindoux/masala-parser )
54[ ![ Coverage Status] ( https://coveralls.io/repos/d-plaindoux/masala-parser/badge.png?branch=master )] ( https://coveralls.io/r/d-plaindoux/masala-parser?branch=master )
65[ ![ stable] ( http://badges.github.io/stability-badges/dist/stable.svg )] ( http://github.com/badges/stability-badges )
76
8- Masala Parser is inspired by the paper titled:
7+ Masala Parser is an open-source Javascript library to create your own parsers.
8+ You won't need a theoretical background in formal languages for many use cases.
9+
10+ Masala Parser shines for ** simplicity** , ** variations** and ** maintainability**
11+ of your parsers. Typescript support and token export for AI processing will also
12+ help you when debugging.
13+
14+ ![ absolute-demo.png] ( documentation/images/absolute-demo.png )
15+
16+ Masala Parser started in 2016 as a Javascript implementation of the Haskell
17+ ** Parsec** and is inspired by the paper titled:
918[ Direct Style Monadic Parser Combinators For The Real World] ( https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/parsec-paper-letter.pdf ) .
1019
11- Masala Parser is a Javascript implementation of the Haskell ** Parsec** . It is
12- plain Javascript that works in the browser, is tested with more than 450 unit
13- tests, covering 100% of code lines.
20+ It is plain Javascript that works in the browser, is tested with more than 500
21+ unit tests, covering 100% of code lines.
1422
1523### Use cases
1624
25+ Here are the pros of Masala Parser:
26+
1727- It can create a ** full parser from scratch**
18- - It can extract data from a big text and ** replace complex regexp**
19- - It works in any ** browser**
20- - There is a good ** typescript** type declaration
21- - It can validate complete structure with ** variations**
22- - It's a great starting point for parser education. It's ** way simpler than Lex
23- & Yacc** .
24- - It's designed to be written in other languages (Python, Java, Rust) with the
25- same interface
26-
27- Masala Parser keywords are ** simplicity** , ** variations** and
28- ** maintainability** . You won't need theoretical bases on languages for
29- extraction or validation use cases.
30-
31- Masala Parser has relatively good performances, however, Javascript is obviously
32- not the fastest machine.
28+ - It can ** replace complex regexp**
29+ - It works in any ** browser** or NodeJS
30+ - There is an ** incredible Typescript** API
31+ - It has ** good performance** in speed and memory
32+ - It has zero dependencies
33+ - Masala is actively supported by [ Robusta Build] ( https://www.robusta.build )
3334
3435# Usage
3536
37+ We made a ** 7-minute** Youtube video explaining how to create a parser:
38+
39+ [ ![ Masala Parser Youtube Video] ( documentation/images/masala-yt.png )] ( https://www.youtube.com/watch?v=VNUrvWdtM2g )
40+
41+ ### Installation
42+
3643With Node.js or a modern build tool:
3744
3845 npm install -S @masala/parser
46+ yarn add @masala/parser
47+
48+ Or in the browser, using Javascript ES Modules:
3949
40- Or in the browser
50+ import {F, standard, Streams} from 'https://unpkg.com/@masala/parser@2.0.0
4151
4252- [ download Release] ( https://github.com/d-plaindoux/masala-parser/releases )
4353- ` <script src="masala-parser.min.js"/> `
4454
4555Check the [ Change Log] ( ./changelog.md ) if you come from a previous version.
4656
47- # Reference
48-
49- You will find an
50- [ Masala Parser online reference] ( http://www.robusta.io/masala-parser/ts/modules/_masala_parser_d_.html ) ,
51- generated from typescript interface.
52-
5357# Quick Examples
5458
5559## Hello World
5660
61+ Let's parse ` Hello World `
62+
5763``` js
58- const helloParser = C .string (' hello ' )
64+ const helloParser = C .string (' Hello ' )
5965const white = C .char (' ' )
60- const worldParser = C .string (' world ' )
66+ const worldParser = C .string (' World ' )
6167const combinator = helloParser .then (white .rep ()).then (worldParser)
6268```
6369
64- ## Floor notation
70+ ## Parsing tokens
71+
72+ You can parse a stream of tokens, not only characters. Let's parse a date from
73+ tokens.
6574
6675``` js
67- // N: Number Bundle, C: Chars Bundle
68- const { Streams , N , C } = require (' @masala/parser' )
69-
70- const stream = Stream .ofString (' |4.6|' )
71- const floorCombinator = C .char (' |' )
72- .drop ()
73- .then (N .number ()) // we have ['|', 4.6], we drop '|'
74- .then (C .char (' |' ).drop ()) // we have [4.6, '|'], we keep [4.6]
75- .single () // we had [4.6], now just 4.6
76- .map ((x ) => Math .floor (x))
77-
78- // The parser parses a stream of characters
79- const parsing = floorCombinator .parse (stream)
80- assertEquals (4 , parsing .value , ' Floor parsing' )
76+ import { Stream , C , F , N , GenLex } from ' @masala/parser'
77+
78+ const genlex = new GenLex ()
79+
80+ const [slash ] = genlex .keywords ([' /' ])
81+ // 1100 is the precedence of the token
82+ const number = genlex .tokenize (N .digits (), ' number' , 1100 )
83+
84+ let dateParser = number
85+ .then (slash .drop ())
86+ .then (number)
87+ .then (slash .drop ())
88+ .then (number)
89+ .map (([day , month , year ]) => ({
90+ day: day,
91+ month: month,
92+ year: year,
93+ }))
8194```
8295
96+ You will then be able to combine this date parser with other parsers that use
97+ the tokens.
98+
99+ Overall, using GenLex and tokens is more efficient than using characters for
100+ complex grammars.
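
To make the two-step idea concrete (characters become tokens, then the parser works on tokens), here is a minimal hand-rolled sketch in plain Javascript. It does not use GenLex or any Masala API: `tokenize` and `parseDate` are hypothetical helpers written only for illustration, and the `day/month/year` order is an assumption.

```js
// Hand-rolled sketch: `tokenize` and `parseDate` are hypothetical helpers,
// not part of the Masala Parser API.

// Step 1: the lexer turns characters into tokens.
// For brevity, characters that are neither digits nor '/' are silently skipped.
function tokenize(text) {
  const tokens = []
  for (const match of text.matchAll(/\d+|\//g)) {
    tokens.push(
      match[0] === '/'
        ? { name: 'slash' }
        : { name: 'number', value: Number(match[0]) },
    )
  }
  return tokens
}

// Step 2: the parser only looks at token names, never at raw characters.
function parseDate(tokens) {
  const shape = tokens.map((token) => token.name).join(' ')
  if (shape !== 'number slash number slash number') return undefined
  const [day, , month, , year] = tokens.map((token) => token.value)
  return { day, month, year }
}

console.log(parseDate(tokenize('10/12/2013'))) // { day: 10, month: 12, year: 2013 }
```

Because the second step never inspects characters, the same token parser could be reused inside a larger grammar that shares the lexer.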
101+
83102## Explanations
84103
104+ We create small simple parsers, with a set of utilities (` C ` , ` N ` , ` optrep() ` ,
105+ ` map() ` , ...), then we create a more complex parser that combines them.
106+
85107According to Wikipedia _ "in functional programming, a parser combinator is a
86108higher-order function that accepts several parsers as input and returns a new
87109parser as its output."_
@@ -90,21 +112,17 @@ parser as its output."_
90112
91113Let's say we have a document :
92114
93- > > > The James Bond series, by writer Ian Fleming, focuses on a fictional
94- > > > British Secret Service agent created in 1953, who featured him in twelve
95- > > > novels and two short-story collections. Since Fleming's death in 1964,
96- > > > eight other authors have written authorised Bond novels or novelizations:
97- > > > Kingsley Amis, Christopher Wood, John Gardner, Raymond Benson, Sebastian
98- > > > Faulks, Jeffery Deaver, William Boyd and Anthony Horowitz.
115+ > The ** James Bond** series, by writer ** Ian Fleming** , focuses on a fictional
116+ > _ British_ secret service agent created in 1953.
99117
100- The parser could fetch every name, ie two consecutive words starting with
101- uppercase. The parser will read through the document and aggregate a Response,
102- which contains a value and the current offset in the text.
118+ The parser could fetch every name, defined as ** two consecutive words starting
119+ with uppercase** . The parser will read through the document and aggregate a
120+ Response, which contains a value and the current offset in the text.
103121
104122This value evolves as the parser meets new characters, and also through
105123function calls such as the ` map() ` function.
106124
107- ![ ] ( ./documentation/parsec -monoid.png )
125+ ![ Parser monoid ] ( ./documentation/parser -monoid.png )
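
The idea of a response carrying a value and an offset can be sketched in a few lines of plain Javascript. This is a hand-rolled illustration, not Masala's implementation: `regex`, `then` and `map` below are hypothetical standalone functions, and the response shape `{ accepted, value, offset }` is an assumption made for the example.

```js
// Hand-rolled sketch, not Masala's API. A parser is a function
// (input, offset) => { accepted, value, offset }.

// Match a pattern exactly at `offset`, using a sticky regular expression.
const regex = (pattern) => (input, offset) => {
  const sticky = new RegExp(pattern, 'y')
  sticky.lastIndex = offset
  const match = sticky.exec(input)
  return match
    ? { accepted: true, value: match[0], offset: offset + match[0].length }
    : { accepted: false, offset }
}

// Run parsers one after another: values are aggregated, the offset moves forward.
const then = (...parsers) => (input, offset) => {
  const values = []
  let current = offset
  for (const parser of parsers) {
    const response = parser(input, current)
    if (!response.accepted) return response
    values.push(response.value)
    current = response.offset
  }
  return { accepted: true, value: values, offset: current }
}

// Transform the value of a response; the offset is left untouched.
const map = (parser, f) => (input, offset) => {
  const response = parser(input, offset)
  return response.accepted ? { ...response, value: f(response.value) } : response
}

const capitalized = regex('[A-Z][a-z]+')
const name = then(capitalized, regex(' '), capitalized)
const fullName = map(name, ([first, , last]) => `${first} ${last}`)

const raw = name('Ian Fleming, focuses on', 0)
console.log(raw.value, raw.offset) // [ 'Ian', ' ', 'Fleming' ] 11

const mapped = fullName('Ian Fleming, focuses on', 0)
console.log(mapped.value, mapped.offset) // Ian Fleming 11
```

Reading characters changes both the value and the offset, while the mapping step changes only the value.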
108126
109127## The Response
110128
@@ -116,14 +134,14 @@ that represents your problem. After parsing, there are two subtypes of
116134- ` Reject ` if it could not.
117135
118136``` js
119- let response = C .char (' a' ).rep ().parse (Streams . ofChar (' aaaa' ))
137+ let response = C .char (' a' ).rep ().parse (Stream . ofChars (' aaaa' ))
120138assertEquals (response .value .join (' ' ), ' aaaa' )
121139assertEquals (response .offset , 4 )
122140assertTrue (response .isAccepted ())
123141assertTrue (response .isConsumed ())
124142
125- // Partially accepted
126- response = C .char (' a' ).rep ().parse (Streams . ofChar (' aabb' ))
143+ // Partially accepted: 'aa' is read, then it stops at offset 2
144+ response = C .char (' a' ).rep ().parse (Stream . ofChars (' aabb' ))
127145assertEquals (response .value .join (' ' ), ' aa' )
128146assertEquals (response .offset , 2 )
129147assertTrue (response .isAccepted ())
@@ -154,15 +172,15 @@ simplicity, we will use a stream of characters, which is a text :)
154172The goal is to check that we have Hello 'someone', then to grab that name
155173
156174``` js
157- // Plain old javascript
158- const { Streams , C } = require (' @masala/parser' )
175+ import { Stream , C } from ' @masala/parser'
159176
160177var helloParser = C .string (' Hello' )
161178 .then (C .char (' ' ).rep ())
162179 .then (C .letters ()) // succession of A-Za-z letters
163180 .last () // keeping previous letters
164181
165- var value = helloParser .val (' Hello Gandhi' ) // val(x) is a shortcut for parse(Stream.ofString(x)).value;
182+ // val(x) is a shortcut for: parse(Stream.ofChars(x)).value
183+ var value = helloParser .val (' Hello Gandhi' )
166184
167185assertEquals (' Gandhi' , value)
168186```
@@ -174,7 +192,7 @@ And each new Parser is a combination of Parsers given by the standard bundles or
174192previous functions.
175193
176194``` js
177- import { Streams , N , C , F } from ' @masala/parser'
195+ import { Stream , N , C , F } from ' @masala/parser'
178196
179197const blanks = () => C .char (' ' ).optrep ()
180198
@@ -212,7 +230,7 @@ function combinator() {
212230}
213231
214232function parseOperation (line ) {
215- return combinator ().parse (Streams . ofString (line))
233+ return combinator ().parse (Stream . ofChars (line))
216234}
217235
218236assertEquals (4 , parseOperation (' 2 +2' ).value , ' sum: ' )
@@ -244,16 +262,16 @@ We will give priority to sum, then multiplication, then scalar. If we had put
244262` +2 ` alone ? It's not a valid sum ! Moreover ` +2 ` and ` -2 ` are acceptable
245263scalars.
246264
247- ## try(x).or(y)
265+ ## Backtracking with the parser: try(x).or(y)
248266
249- ` or() ` will often be used with ` try() ` , that makes
250- [ backtracking] ( https://en.wikipedia.org/wiki/Backtracking ) : it saves the
251- current offset, then tries an option. And as soon that it's not satisfied, it
252- goes back to the original offset and use the parser inside the ` .or(P) `
253- expression.`.
267+ Take a look at `2+2` and `2*2`. These two operations _start with the same
268+ character_ ` 2 ` ! The parser may try one operation and fail. Often, you will want
269+ to go back to the initial offset and try another operation: that mechanism is
270+ called [ backtracking] ( https://en.wikipedia.org/wiki/Backtracking ) .
254271
255- Like Haskell's Parsec, Masala Parser can parse infinite look-ahead grammars but
256- performs best on predictive (LL[ 1] ) grammars.
272+ ` try(x).or(y) ` enables backtracking: it saves the current offset, then tries
273+ the first option. As soon as that option is not satisfied, it goes back to the
274+ original offset and uses the parser inside the ` .or(y) ` expression.
257275
258276Let's see how, with ` try() ` , we can look a bit ahead at the next characters, then go
259277back:
@@ -272,18 +290,29 @@ Suppose we do not `try()` but use `or()` directly:
272290 Is it (multiplication())? No ;
273291 or(scalar()) ? neither
274292
293+ We have the same problem with pure text. Let's parse ` monday ` or ` money `
294+
295+ const parser = C.string('monday').or('money')
296+ const result = parser.val('money')
297+ ^ will stop reading `monday` at `e`
298+
299+ The result will be undefined, because the parser will find neither ` monday ` nor
300+ ` money ` . The correct parser is:
301+
302+ const parser = F.try(C.string('monday')).or('money')
303+
304+ When it fails to read ` monday ` , the parser goes back to ` m ` .
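
The difference between the two versions can be reproduced with a small hand-rolled sketch in plain Javascript. It is not Masala's implementation: `string`, `attempt` (named that way because `try` is a reserved word) and `or` are hypothetical standalone functions, and the rule "an alternative is only tried when the first parser failed without consuming input" is assumed here, following the Parsec convention.

```js
// Hand-rolled sketch, not Masala's implementation: every name is hypothetical.
// A parser is a function (input, offset) => { accepted, value, offset, consumed }.

// Match a literal string character by character.
// On a mismatch, the failure is reported at the offset reached so far.
const string = (expected) => (input, offset) => {
  for (let i = 0; i < expected.length; i++) {
    if (input[offset + i] !== expected[i]) {
      return { accepted: false, offset: offset + i, consumed: i > 0 }
    }
  }
  return {
    accepted: true,
    value: expected,
    offset: offset + expected.length,
    consumed: expected.length > 0,
  }
}

// attempt(p): on failure, rewind to the starting offset as if nothing was read.
const attempt = (parser) => (input, offset) => {
  const response = parser(input, offset)
  return response.accepted
    ? response
    : { accepted: false, offset, consumed: false }
}

// or(p, q): q is tried only when p failed without consuming any input.
const or = (first, second) => (input, offset) => {
  const response = first(input, offset)
  if (response.accepted || response.consumed) return response
  return second(input, offset)
}

const withoutTry = or(string('monday'), string('money'))
const withTry = or(attempt(string('monday')), string('money'))

console.log(withoutTry('money', 0).accepted) // false: stuck after reading 'mon'
console.log(withTry('money', 0).value) // money
```

In this sketch the first version fails at offset 3, because three characters were already consumed when `d` did not match, whereas the second version restarts from offset 0.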
305+
275306# Recursion
276307
277308Masala-Parser (like Parsec) is a top-down parser and doesn't like
278309[ Left Recursion] ( https://cs.stackexchange.com/a/9971 ) .
279310
280- However, it is a resolved problem for this kind of parsers, with a lot of
281- documentation. You can read more on
311+ However, this is a solved problem for this kind of parser. You can read more on
282312[ recursion with Masala] ( ./documentation/recursion.md ) , and check out examples on
283313our Github repository (
284- [ simple recursion] ( https://github.com/d-plaindoux/masala-parser/blob/master/integration-npm/examples/recursion/aaab-lazy-recursion.js ) ,
285- or
286- [ calculous expressions] ( https://github.com/d-plaindoux/masala-parser/blob/master/integration-npm/examples/operations/plus-minus.js )
314+ [ simple recursion] ( integration-ts/examples/lazy/transmission.spec.ts ) , or
315+ [ calculus expressions] ( integration-ts/examples/operations/plus-minus.spec.ts )
287316).
288317
289318# Simple documentation of Core bundles
@@ -403,3 +432,8 @@ PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
403432You should have received a copy of the GNU Lesser General Public License along
404433with this program; see the file COPYING. If not, write to the Free Software
405434Foundation, 675 Mass Ave, Cambridge, MA 02139, USA.
435+
436+ ## Support
437+
438+ Masala Parser is maintained by [ Robusta Build] ( https://www.robusta.build ) .
439+ Contact us for professional support, consulting, training or custom development.