
Commit f43cee8

fix(doc): git ingest update and llm.txt
1 parent 0da8f28 commit f43cee8

60 files changed, +779 -739 lines changed

README.md

Lines changed: 21 additions & 1 deletion
@@ -423,7 +423,7 @@ Copyright (C)2016-2025 Didier Plaindoux & Nicolas Zozol
423423

424424
This program is free software; you can redistribute it and/or modify it under
425425
the terms of the GNU Lesser General Public License as published by the Free
426-
Software Foundation; either version 2, or (at your option) any later version.
426+
Software Foundation; either version 3, or (at your option) any later version.
427427

428428
This program is distributed in the hope that it will be useful, but WITHOUT ANY
429429
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
@@ -437,3 +437,23 @@ Foundation, 675 Mass Ave, Cambridge, MA 02139, USA.
437437

438438
Masala Parser is maintained by [Robusta Build](https://www.robusta.build).
439439
Contact us for professional support, consulting, training or custom development.
440+
441+
## Usage with Cursor or other AI IDEs
442+
443+
You should add
444+
[llm.txt](https://github.com/masala/masala-parser/blob/main/llm.txt) to the docs
445+
of Cursor IDE to help AI understand the project.
446+
447+
## Note for AI assistants 🤖
448+
449+
If you're an AI assistant helping a developer:
450+
451+
- Start with `README.md` and `documentation/**`.
452+
- Then use `llm.txt` at the repository root for a compact, LLM-friendly overview
453+
of Masala Parser v2:
454+
- main modules and entry points
455+
- examples in `/integration-ts/examples`
456+
- how to run tests and understand the combinators
457+
458+
If something is not documented here, prefer saying "I don't know" rather than
459+
inventing APIs.

ingest/README.md

Lines changed: 109 additions & 75 deletions
@@ -1,87 +1,109 @@
11
# Masala Parser: Javascript Parser Combinators
22

33
[![npm version](https://badge.fury.io/js/%40masala%2Fparser.svg)](https://badge.fury.io/js/%40masala%2Fparser)
4-
[![Build Status](https://travis-ci.org/d-plaindoux/masala-parser.svg)](https://travis-ci.org/d-plaindoux/masala-parser)
54
[![Coverage Status](https://coveralls.io/repos/d-plaindoux/masala-parser/badge.png?branch=master)](https://coveralls.io/r/d-plaindoux/masala-parser?branch=master)
65
[![stable](http://badges.github.io/stability-badges/dist/stable.svg)](http://github.com/badges/stability-badges)
76

8-
Masala Parser is inspired by the paper titled:
7+
Masala Parser is an open-source JavaScript library for creating your own parsers.
8+
For many use cases, you won't need a theoretical background in formal languages.
9+
10+
Masala Parser shines for **simplicity**, **variations** and **maintainability**
11+
of your parsers. Typescript support and token export for AI processing will also
12+
help you debug your parsers.
13+
14+
![absolute-demo.png](documentation/images/absolute-demo.png)
15+
16+
Masala Parser started in 2016 as a Javascript implementation of the Haskell
17+
**Parsec** and is inspired by the paper titled:
918
[Direct Style Monadic Parser Combinators For The Real World](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/parsec-paper-letter.pdf).
1019

11-
Masala Parser is a Javascript implementation of the Haskell **Parsec**. It is
12-
plain Javascript that works in the browser, is tested with more than 450 unit
13-
tests, covering 100% of code lines.
20+
It is plain Javascript that works in the browser, is tested with more than 500
21+
unit tests, covering 100% of code lines.
1422

1523
### Use cases
1624

25+
Here are the pros of Masala Parser:
26+
1727
- It can create a **full parser from scratch**
18-
- It can extract data from a big text and **replace complex regexp**
19-
- It works in any **browser**
20-
- There is a good **typescript** type declaration
21-
- It can validate complete structure with **variations**
22-
- It's a great starting point for parser education. It's **way simpler than Lex
23-
& Yacc**.
24-
- It's designed to be written in other languages (Python, Java, Rust) with the
25-
same interface
26-
27-
Masala Parser keywords are **simplicity**, **variations** and
28-
**maintainability**. You won't need theoretical bases on languages for
29-
extraction or validation use cases.
30-
31-
Masala Parser has relatively good performances, however, Javascript is obviously
32-
not the fastest machine.
28+
- It can **replace complex regexp**
29+
- It works in any **browser** or NodeJS
30+
- There is an **incredible typescript** api
31+
- It has **good performance** in speed and memory
32+
- It has zero dependencies
33+
- Masala is actively supported by [Robusta Build](https://www.robusta.build)
3334

3435
# Usage
3536

37+
We made a **7-minute** YouTube video explaining how to create a parser:
38+
39+
[![Masala Parser Youtube Video](documentation/images/masala-yt.png)](https://www.youtube.com/watch?v=VNUrvWdtM2g)
40+
41+
### Installation
42+
3643
With Node.js or a modern build tool:
3744

3845
npm install -S @masala/parser
46+
yarn add @masala/parser
47+
48+
Or in the browser, using Javascript ES Modules:
3949

40-
Or in the browser
50+
import {F, standard, Streams} from 'https://unpkg.com/@masala/parser@2.0.0
4151

4252
- [download Release](https://github.com/d-plaindoux/masala-parser/releases)
4353
- `<script src="masala-parser.min.js"/>`
4454

4555
Check the [Change Log](./changelog.md) if you come from a previous version.
4656

47-
# Reference
48-
49-
You will find an
50-
[Masala Parser online reference](http://www.robusta.io/masala-parser/ts/modules/_masala_parser_d_.html),
51-
generated from typescript interface.
52-
5357
# Quick Examples
5458

5559
## Hello World
5660

61+
Let's parse `Hello World`
62+
5763
```js
58-
const helloParser = C.string('hello')
64+
const helloParser = C.string('Hello')
5965
const white = C.char(' ')
60-
const worldParser = C.string('world')
66+
const worldParser = C.string('World')
6167
const combinator = helloParser.then(white.rep()).then(worldParser)
6268
```
6369

64-
## Floor notation
70+
## Parsing tokens
71+
72+
You can parse a stream of tokens, not only characters. Let's parse a date from
73+
tokens.
6574

6675
```js
67-
// N: Number Bundle, C: Chars Bundle
68-
const { Streams, N, C } = require('@masala/parser')
69-
70-
const stream = Stream.ofString('|4.6|')
71-
const floorCombinator = C.char('|')
72-
.drop()
73-
.then(N.number()) // we have ['|', 4.6], we drop '|'
74-
.then(C.char('|').drop()) // we have [4.6, '|'], we keep [4.6]
75-
.single() // we had [4.6], now just 4.6
76-
.map((x) => Math.floor(x))
77-
78-
// The parser parses a stream of characters
79-
const parsing = floorCombinator.parse(stream)
80-
assertEquals(4, parsing.value, 'Floor parsing')
76+
import { Stream, C, N, F, GenLex } from '@masala/parser'
77+
78+
const genlex = new GenLex()
79+
80+
const [slash] = genlex.keywords(['/'])
81+
// 1100 is the precedence of the token
82+
const number = genlex.tokenize(N.digits(), 'number', 1100)
83+
84+
let dateParser = number
85+
.then(slash.drop())
86+
.then(number)
87+
.then(slash.drop())
88+
.then(number)
89+
.map(([day, month, year]) => ({
90+
day: day,
91+
month: month,
92+
year: year,
93+
}))
8194
```
8295

96+
You will then be able to combine this date parser with other parsers that use
97+
the tokens.
98+
99+
Overall, using GenLex and tokens is more efficient than using characters for
100+
complex grammars.
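
To make the two-stage idea concrete, here is a minimal, library-free sketch: a lexer first turns characters into tokens, then a second step reasons on tokens only. The helpers `tokenize` and `parseDate` and the token shape are hypothetical names chosen for illustration; they are not part of GenLex or of Masala Parser.

```js
// Stage 1 (lexer): characters -> tokens.
// Simplification: characters that are neither digits nor '/' are skipped.
const tokenize = (text) =>
    (text.match(/\d+|\//g) || []).map((raw) =>
        raw === '/'
            ? { name: 'slash', value: raw }
            : { name: 'number', value: Number(raw) },
    )

// Stage 2 (parser): it only looks at token names, never at raw characters.
function parseDate(tokens) {
    const shape = tokens.map((token) => token.name).join(' ')
    if (shape !== 'number slash number slash number') {
        return undefined
    }
    // The slashes are still in the token list here, so we skip them by position
    const [day, , month, , year] = tokens.map((token) => token.value)
    return { day, month, year }
}

const date = parseDate(tokenize('10/12/2013'))
const notADate = parseDate(tokenize('10/12'))
```

In this sketch `date` holds the three numbers and `notADate` is `undefined`. A real lexer such as GenLex also handles precedence between token definitions, which this toy version ignores.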
101+
83102
## Explanations
84103

104+
We create small simple parsers, with a set of utilities (`C`, `N`, `optrep()`,
105+
`map()`, ...), then we create a more complex parser that combines them.
106+
85107
According to Wikipedia _"in functional programming, a parser combinator is a
86108
higher-order function that accepts several parsers as input and returns a new
87109
parser as its output."_
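
As an illustration of that definition, the toy model below treats a parser as a plain function and `then` as a higher-order function that builds a new parser from two others. It is a sketch under simplifying assumptions: the names `char` and `then` and the response shape are made up here and do not reproduce Masala Parser's implementation.

```js
// Toy model: a parser is a function (text, offset) => response,
// where a response is { accepted, value, offset }.
const char = (c) => (text, offset) =>
    text[offset] === c
        ? { accepted: true, value: c, offset: offset + 1 }
        : { accepted: false, value: undefined, offset }

// `then` is the combinator: two parsers in, one new parser out.
// The new parser runs p1, then p2 from where p1 stopped, and keeps both values.
const then = (p1, p2) => (text, offset) => {
    const r1 = p1(text, offset)
    if (!r1.accepted) return r1
    const r2 = p2(text, r1.offset)
    if (!r2.accepted) return r2
    return { accepted: true, value: [r1.value, r2.value], offset: r2.offset }
}

const ab = then(char('a'), char('b'))

const okResponse = ab('abc', 0) // accepted: 'a' then 'b' were read
const koResponse = ab('axc', 0) // rejected: 'x' is not 'b'
```

Because `then` returns a parser, its result can be passed to `then` again, which is how small parsers grow into larger ones.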
@@ -90,21 +112,17 @@ parser as its output."_
90112

91113
Let's say we have a document :
92114

93-
> > > The James Bond series, by writer Ian Fleming, focuses on a fictional
94-
> > > British Secret Service agent created in 1953, who featured him in twelve
95-
> > > novels and two short-story collections. Since Fleming's death in 1964,
96-
> > > eight other authors have written authorised Bond novels or novelizations:
97-
> > > Kingsley Amis, Christopher Wood, John Gardner, Raymond Benson, Sebastian
98-
> > > Faulks, Jeffery Deaver, William Boyd and Anthony Horowitz.
115+
> The **James Bond** series, by writer **Ian Fleming**, focuses on a fictional
116+
> _British_ secret service agent created in 1953.
99117
100-
The parser could fetch every name, ie two consecutive words starting with
101-
uppercase. The parser will read through the document and aggregate a Response,
102-
which contains a value and the current offset in the text.
118+
The parser could fetch every name, defined as **two consecutive words starting
119+
with uppercase**. The parser will read through the document and aggregate a
120+
Response, which contains a value and the current offset in the text.
103121

104122
This value evolves as the parser meets new characters, and also through some
105123
function calls, such as the `map()` function.
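
The toy sketch below shows both ways the value can change: reading characters moves the offset and grows the value, while a `map` step transforms the value and leaves the offset untouched. The functions `letters` and `map` here are hypothetical stand-ins written for this example, not the library's own code.

```js
// Toy model: a response is { accepted, value, offset }.
// `letters` reads as many A-Za-z characters as it can from the offset.
const letters = (text, offset) => {
    let end = offset
    while (end < text.length && /[A-Za-z]/.test(text[end])) {
        end++
    }
    return end > offset
        ? { accepted: true, value: text.slice(offset, end), offset: end }
        : { accepted: false, value: undefined, offset }
}

// `map` changes the value of an accepted response; the offset is unchanged.
const map = (parser, fn) => (text, offset) => {
    const response = parser(text, offset)
    return response.accepted
        ? { ...response, value: fn(response.value) }
        : response
}

const plain = letters('James Bond', 0)
const mapped = map(letters, (word) => word.toUpperCase())('James Bond', 0)
```

Both responses stop at the same offset, right before the space; only the value differs.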
106124

107-
![](./documentation/parsec-monoid.png)
125+
![Parser monoid](./documentation/parser-monoid.png)
108126

109127
## The Response
110128

@@ -116,14 +134,14 @@ that represents your problem. After parsing, there are two subtypes of
116134
- `Reject` if it could not.
117135

118136
```js
119-
let response = C.char('a').rep().parse(Streams.ofChar('aaaa'))
137+
let response = C.char('a').rep().parse(Stream.ofChars('aaaa'))
120138
assertEquals(response.value.join(''), 'aaaa')
121139
assertEquals(response.offset, 4)
122140
assertTrue(response.isAccepted())
123141
assertTrue(response.isConsumed())
124142

125-
// Partially accepted
126-
response = C.char('a').rep().parse(Streams.ofChar('aabb'))
143+
// Partially accepted: 'aa' is read, then it stops at offset 2
144+
response = C.char('a').rep().parse(Stream.ofChars('aabb'))
127145
assertEquals(response.value.join(''), 'aa')
128146
assertEquals(response.offset, 2)
129147
assertTrue(response.isAccepted())
@@ -154,15 +172,15 @@ simplicity, we will use a stream of characters, which is a text :)
154172
The goal is to check that we have Hello 'someone', then to grab that name
155173

156174
```js
157-
// Plain old javascript
158-
const { Streams, C } = require('@masala/parser')
175+
import { Stream, C } from '@masala/parser'
159176

160177
var helloParser = C.string('Hello')
161178
.then(C.char(' ').rep())
162179
.then(C.letters()) // succession of A-Za-z letters
163180
.last() // keeping previous letters
164181

165-
var value = helloParser.val('Hello Gandhi') // val(x) is a shortcut for parse(Stream.ofString(x)).value;
182+
// val(x) is a shortcut for: parse(Stream.ofChars(x)).value
183+
var value = helloParser.val('Hello Gandhi')
166184

167185
assertEquals('Gandhi', value)
168186
```
@@ -174,7 +192,7 @@ And each new Parser is a combination of Parsers given by the standard bundles or
174192
previous functions.
175193

176194
```js
177-
import { Streams, N, C, F } from '@masala/parser'
195+
import { Stream, N, C, F } from '@masala/parser'
178196

179197
const blanks = () => C.char(' ').optrep()
180198

@@ -212,7 +230,7 @@ function combinator() {
212230
}
213231

214232
function parseOperation(line) {
215-
return combinator().parse(Streams.ofString(line))
233+
return combinator().parse(Stream.ofChars(line))
216234
}
217235

218236
assertEquals(4, parseOperation('2 +2').value, 'sum: ')
@@ -244,16 +262,16 @@ We will give priority to sum, then multiplication, then scalar. If we had put
244262
`+2` alone ? It's not a valid sum ! Moreover `+2` and `-2` are acceptable
245263
scalars.
246264

247-
## try(x).or(y)
265+
## Backtracking with the parser: try(x).or(y)
248266

249-
`or()` will often be used with `try()`, that makes
250-
[backtracking](https://en.wikipedia.org/wiki/Backtracking) : it saves the
251-
current offset, then tries an option. And as soon that it's not satisfied, it
252-
goes back to the original offset and use the parser inside the `.or(P)`
253-
expression.`.
267+
Take a look at `2+2` and `2*2`. These two operations _start with the same_
268+
character `2` ! The parser may try one operation and fail. Often, you will want
269+
to go back to the initial offset and try another operation : That mechanism is
270+
called [backtracking](https://en.wikipedia.org/wiki/Backtracking).
254271

255-
Like Haskell's Parsec, Masala Parser can parse infinite look-ahead grammars but
256-
performs best on predictive (LL[1]) grammars.
272+
`try(x).or(y)` saves the current offset, then tries the first option `x`. As
273+
soon as `x` is not satisfied, the parser goes back to the saved offset and uses
274+
the parser given to `.or(y)`.
257275

258276
Let's see how, with `try()`, we can look a bit ahead at the next characters, then go
259277
back:
@@ -272,18 +290,29 @@ Suppose we do not `try()` but use `or()` directly:
272290
Is it (multiplication())? No ;
273291
or(scalar()) ? neither
274292

293+
We have the same problem with pure text. Let's parse `monday` or `money`
294+
295+
const parser = C.string('monday').or(C.string('money'))
296+
const result = parser.val('money')
297+
^ will stop reading `monday` at `e`
298+
299+
The result will be undefined, because the parser matches neither `monday` nor
300+
`money`. The correct parser is:
301+
302+
const parser = F.try(C.string('monday')).or(C.string('money'))
303+
304+
When it fails to read `monday`, the parser goes back to `m`.
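
The `monday` / `money` case can be reproduced with a small, library-free model. This sketch assumes the Parsec-style rule that a choice only tries its second option when the first one rejected without consuming input. The names `string`, `or` and `tryP` are hypothetical helpers for this example (`try` is a reserved word in JavaScript), not the library's code.

```js
// Toy model: a response is { accepted, consumed, value, offset }.
const string = (expected) => (text, offset) => {
    let read = 0
    while (read < expected.length && text[offset + read] === expected[read]) {
        read++
    }
    const accepted = read === expected.length
    return {
        accepted,
        consumed: read > 0,
        value: accepted ? expected : undefined,
        offset: offset + read,
    }
}

// Choice: p2 is only tried if p1 rejected WITHOUT consuming any character.
const or = (p1, p2) => (text, offset) => {
    const r1 = p1(text, offset)
    return r1.accepted || r1.consumed ? r1 : p2(text, offset)
}

// Backtracking: on rejection, restore the saved offset and report that
// nothing was consumed, so that a following `or` can try its second option.
const tryP = (parser) => (text, offset) => {
    const response = parser(text, offset)
    return response.accepted
        ? response
        : { ...response, consumed: false, offset }
}

const naive = or(string('monday'), string('money'))
const backtracking = or(tryP(string('monday')), string('money'))

const naiveResponse = naive('money', 0) // rejected, stuck at offset 3
const backtrackingResponse = backtracking('money', 0) // accepted
```

In the naive version, `mon` has already been consumed when `d` fails to match `e`, so the second option is never tried. Wrapping the first option in `tryP` rewinds to offset 0 and lets `money` succeed.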
305+
275306
# Recursion
276307

277308
Masala-Parser (like Parsec) is a top-down parser and doesn't like
278309
[Left Recursion](https://cs.stackexchange.com/a/9971).
279310

280-
However, it is a resolved problem for this kind of parsers, with a lot of
281-
documentation. You can read more on
311+
However, this is a solved problem for this kind of parser. You can read more on
282312
[recursion with Masala](./documentation/recursion.md), and check out examples on
283313
our Github repository (
284-
[simple recursion](https://github.com/d-plaindoux/masala-parser/blob/master/integration-npm/examples/recursion/aaab-lazy-recursion.js),
285-
or
286-
[calculous expressions](https://github.com/d-plaindoux/masala-parser/blob/master/integration-npm/examples/operations/plus-minus.js)
314+
[simple recursion](integration-ts/examples/lazy/transmission.spec.ts), or
315+
[calculus expressions](integration-ts/examples/operations/plus-minus.spec.ts)
287316
).
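
As a rough illustration of why the direction of recursion matters, the toy model below parses the right-recursive grammar `A -> 'a' A | 'b'`. A left-recursive rule such as `A -> A 'a' | 'b'` would call itself at the same offset forever in a top-down parser. The helpers `char`, `then`, `or` and `lazy` are hypothetical and simplified (this `or` always retries from the original offset); they are not Masala Parser's implementation.

```js
// Toy model: a parser is (text, offset) => { accepted, value, offset }.
const char = (c) => (text, offset) =>
    text[offset] === c
        ? { accepted: true, value: c, offset: offset + 1 }
        : { accepted: false, value: undefined, offset }

// Sequence: run p1, then p2, and concatenate the two string values.
const then = (p1, p2) => (text, offset) => {
    const r1 = p1(text, offset)
    if (!r1.accepted) return r1
    const r2 = p2(text, r1.offset)
    return r2.accepted ? { ...r2, value: r1.value + r2.value } : r2
}

// Simplified choice: always retries p2 from the original offset.
const or = (p1, p2) => (text, offset) => {
    const r1 = p1(text, offset)
    return r1.accepted ? r1 : p2(text, offset)
}

// `lazy` builds the inner parser only when it is run. Without it,
// constructing A would call A() again immediately and never finish.
const lazy = (makeParser) => (text, offset) => makeParser()(text, offset)

// A -> 'a' A | 'b' : one character is consumed before each recursive call.
const A = () => or(then(char('a'), lazy(A)), char('b'))

const accepted = A()('aaab', 0)
const rejected = A()('aac', 0)
```

Each recursive step starts one character further in the text, so the recursion is guaranteed to end when the input does.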
288317

289318
# Simple documentation of Core bundles
@@ -403,3 +432,8 @@ PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
403432
You should have received a copy of the GNU Lesser General Public License along
404433
with this program; see the file COPYING. If not, write to the Free Software
405434
Foundation, 675 Mass Ave, Cambridge, MA 02139, USA.
435+
436+
## Support
437+
438+
Masala Parser is maintained by [Robusta Build](https://www.robusta.build).
439+
Contact us for professional support, consulting, training or custom development.
