Commit 33063e2

doc:imporeve readme
1 parent 2bdbde5 commit 33063e2

3 files changed

Lines changed: 155 additions & 30 deletions

File tree

README.md

Lines changed: 153 additions & 28 deletions
Original file line number | Diff line number | Diff line change
@@ -19,35 +19,160 @@ A lexical analysis / regular expression engine written in TypeScript
1919

2020
### Quick starting example
2121

22-
```ts
23-
import { stateOps, epsilon, DFA, NFA, concatMultipleStates, unionMultipleStates } from 'lxa';
24-
const { SingleInputState } = stateOps;
25-
26-
// .jpe?g
27-
test('.jpe?g work', () => {
28-
const final = concatMultipleStates(
29-
new SingleInputState('.'),
30-
new SingleInputState('j'),
31-
new SingleInputState('p'),
32-
unionMultipleStates({states: [
33-
new SingleInputState('e'),
22+
Let's get started by building a regular expression checker with *lxa* that tests whether a string belongs to the language of `/(a|b)*cd?/`.
23+
24+
> Tip: The example code mentions *NFAs* and *DFAs*. Don't worry: using *lxa* does not require prior knowledge of [NFAs (Non-deterministic Finite Automata)](https://en.wikipedia.org/wiki/Nondeterministic_finite_automaton) or [DFAs (Deterministic Finite Automata)](https://en.wikipedia.org/wiki/Deterministic_finite_automaton). You can build your own lexical analyzer or regular expression tools by following this guide, although understanding those concepts will give you a deeper understanding of how *lxa* works.
25+
26+
The expression `(a|b)*cd?` consists of three parts, each of which is built from smaller units. The breakdown below describes every part of the expression.
27+
28+
The entire expression is the concatenation of the following three expressions:
29+
30+
- `(a|b)*`
31+
32+
  - which is the closure of `(a|b)`
33+
34+
    - which is the union of `a` and `b`
35+
36+
- A single character of `c`
37+
38+
- `d?`
39+
40+
  - The union of the single character `d` and the empty string (we write the empty string as `ε`, epsilon)
41+
42+
1. First, create *states* for each part of the expression and combine them.
43+
44+
```ts
45+
import { stateOps, epsilon } from 'lxa';
46+
const { SingleInputState, UnionState, ClosureState, ConcatState } = stateOps;
47+
48+
// states for the single characters 'a' and 'b'
49+
const state_for_a = new SingleInputState('a');
50+
const state_for_b = new SingleInputState('b');
51+
52+
// and generate the union of 'a' and 'b', (a|b)
53+
const union_of_a_and_b = new UnionState(state_for_a, state_for_b);
54+
55+
// and then the closure `(a|b)*`
56+
const union_of_a_and_b_closure = new ClosureState(union_of_a_and_b);
57+
58+
// and concatenate `(a|b)*` with c
59+
const concat_with_c = new ConcatState(union_of_a_and_b_closure, new SingleInputState('c'));
60+
61+
// Before we generate the final expression,
62+
// we generate the union of 'd' and empty string,
63+
// representing `d?` or `d|ε`
64+
const d_or_empty = new UnionState(
65+
  new SingleInputState('d'),
3466
  new SingleInputState(epsilon),
35-
]}),
36-
new SingleInputState('g', true)
37-
);
38-
const dfa: DFA = new NFA(final).toDFA();
39-
40-
expect(dfa.test('.jpg')).toBe(true);
41-
expect(dfa.test('.jpeg')).toBe(true);
42-
expect(dfa.test('')).toBe(false);
43-
expect(dfa.test('jpg')).toBe(false);
44-
expect(dfa.test('jpeg')).toBe(false);
45-
expect(dfa.test('jp')).toBe(false);
46-
expect(dfa.test('jpgg')).toBe(false);
47-
expect(dfa.test('png')).toBe(false);
48-
})
67+
  // `true` means this is the final accepted state.
68+
  // Refer to the API docs for more details.
69+
  true,
70+
);
71+
72+
// Finally, we concatenate them all
73+
const final = new ConcatState(concat_with_c, d_or_empty);
74+
```
75+
2. Generate a DFA for testing.
76+
77+
```ts
78+
import { NFA } from 'lxa';
79+
const dfa = new NFA(final).toDFA();
80+
81+
dfa.test('aaac'); // true
82+
dfa.test('abcd'); // true
83+
dfa.test('bbbcd'); // true
84+
dfa.test('ad'); // false
85+
```
86+
Unioning or concatenating many states by hand is verbose because the states must be nested deeply, especially when the expression is complicated. Two utility functions, [`concatMultipleStates()`](#concatmultiplestates) and [`unionMultipleStates()`](#unionmultiplestates), combine multiple states without the nesting.
87+
88+
```ts
89+
import { concatMultipleStates } from 'lxa';
90+
91+
// This is much more concise
92+
const final = concatMultipleStates(
93+
  union_of_a_and_b_closure,
94+
  new SingleInputState('c'),
95+
  d_or_empty
96+
);
4997
```
5098

51-
### APIs
99+
## APIs
100+
101+
### `epsilon`
102+
103+
`epsilon` is a singleton object representing the empty string. It can be passed as the `input` argument of the `SingleInputState` constructor.
104+
105+
### `stateOps`
106+
107+
#### `stateOps.StateOp`
108+
109+
This is the base class. Do not instantiate it directly. You can use it as a type annotation in TypeScript. The following classes are subclasses of `StateOp`.
110+
111+
#### `stateOps.SingleInputState`
112+
113+
`constructor SingleInputState(input: InputType, accepted?: boolean): SingleInputState`
114+
115+
- `input` has type `InputType`: either a `string` or the [`epsilon`](#epsilon) object
116+
- `accepted` indicates whether the current state is an accepted state. If the current state is accepted and no input remains, the whole regular expression is accepted. Refer to the explanation of *NFAs* and *DFAs* for more details about the *accepted* state. Defaults to `false`.
117+
118+
#### `stateOps.ConcatState`
119+
120+
`constructor ConcatState(a: StateOp, b: StateOp): ConcatState`
121+
122+
Concatenates two states. Use [`concatMultipleStates()`](#concatmultiplestates) as a shorthand for concatenating more than two states.
123+
124+
#### `stateOps.UnionState`
125+
126+
`constructor UnionState(a: StateOp, b: StateOp, accepted?: boolean): UnionState`
127+
128+
Unions two states. Use [`unionMultipleStates()`](#unionmultiplestates) as a shorthand for uniting more than two states.
129+
130+
- `accepted`: same meaning as for `SingleInputState`
131+
132+
#### `stateOps.ClosureState`
133+
134+
`constructor ClosureState(a: StateOp, accepted?: boolean): ClosureState`
135+
136+
Generates the closure of a state.
137+
138+
- `a` is the state whose closure is generated
139+
- `accepted`: same meaning as for `SingleInputState`
140+
141+
### `concatMultipleStates`
142+
143+
`function concatMultipleStates(...states: StateOp[]): StateOp`
144+
145+
Concatenates multiple states together. Shorthand for nesting `stateOps.ConcatState` constructors.
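
The "shorthand for nesting" idea can be illustrated with a fold over a binary concatenation. This is only a sketch under stated assumptions: `Expr`, `single`, `concat`, `concatAll`, and `show` are hypothetical stand-ins, not lxa's classes or functions, and the left-to-right nesting order is an assumption, since the README does not say how the library nests the states.

```typescript
// Hypothetical stand-ins for illustration; these are NOT lxa's classes.
type Expr =
  | { kind: 'single'; value: string }
  | { kind: 'concat'; left: Expr; right: Expr };

const single = (value: string): Expr => ({ kind: 'single', value });
const concat = (left: Expr, right: Expr): Expr => ({ kind: 'concat', left, right });

// Fold a list of expressions into left-nested binary concatenations.
// Assumption: left-to-right nesting; the library's actual order is not documented here.
function concatAll(first: Expr, ...rest: Expr[]): Expr {
  return rest.reduce((acc, next) => concat(acc, next), first);
}

// Render the tree with parentheses so the nesting is visible.
function show(expr: Expr): string {
  return expr.kind === 'single' ? expr.value : `(${show(expr.left)}${show(expr.right)})`;
}

// Nesting by hand and folding produce the same structure.
const nestedByHand = concat(concat(single('a'), single('b')), single('c'));
const nestedByFold = concatAll(single('a'), single('b'), single('c'));
```

Both `show(nestedByHand)` and `show(nestedByFold)` render the same parenthesized tree, which is the sense in which a variadic helper can replace manual nesting.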
146+
147+
### `unionMultipleStates`
148+
149+
`function unionMultipleStates({states, accepted}): StateOp`
150+
151+
Unites multiple states together. Shorthand for nesting `stateOps.UnionState` constructors.
152+
153+
- `states` is an array of `StateOp` instances
154+
- `accepted`: same meaning as for `SingleInputState`
155+
156+
### `NFA`
157+
158+
#### `NFA` constructor
159+
160+
`constructor NFA(state: StateOp): NFA`
161+
162+
#### `NFA.prototype.toDFA`
163+
164+
`NFA.prototype.toDFA(): DFA`
165+
166+
Returns a `DFA` instance generated from the calling `NFA` instance.
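
The README does not describe how the conversion is performed. For readers curious about the idea, the textbook technique for turning an NFA into a DFA is the subset (powerset) construction, sketched below. Everything here is hypothetical: `SketchNFA`, `SketchDFA`, `epsilonClosure`, `moveOn`, `keyOf`, `toDFA`, and `runDFA` are illustrative names, the data model is not lxa's internal representation, and the NFA for `(a|b)*cd?` is built by hand rather than through lxa's state classes.

```typescript
// Hypothetical data model for illustration; NOT lxa's internal representation.
const EPS = ''; // marks an epsilon (empty-string) transition

interface SketchNFA {
  start: number;
  accepting: Set<number>;
  // For each state, a list of [symbol, target] pairs.
  transitions: Map<number, Array<[string, number]>>;
}

interface SketchDFA {
  start: string;
  accepting: Set<string>;
  table: Map<string, Map<string, string>>;
}

// All NFA states reachable from `states` using epsilon transitions only.
function epsilonClosure(nfa: SketchNFA, states: number[]): Set<number> {
  const closure = new Set<number>(states);
  const stack = [...states];
  while (stack.length > 0) {
    const state = stack.pop()!;
    for (const [symbol, target] of nfa.transitions.get(state) ?? []) {
      if (symbol === EPS && !closure.has(target)) {
        closure.add(target);
        stack.push(target);
      }
    }
  }
  return closure;
}

// NFA states reachable from `states` by consuming exactly one `symbol`.
function moveOn(nfa: SketchNFA, states: Set<number>, symbol: string): number[] {
  const targets: number[] = [];
  states.forEach((state) => {
    for (const [label, target] of nfa.transitions.get(state) ?? []) {
      if (label === symbol) targets.push(target);
    }
  });
  return targets;
}

// A DFA state is a set of NFA states; a sorted, comma-joined key names it.
const keyOf = (states: Set<number>): string =>
  Array.from(states).sort((a, b) => a - b).join(',');

function toDFA(nfa: SketchNFA, alphabet: string[]): SketchDFA {
  const startSet = epsilonClosure(nfa, [nfa.start]);
  const dfa: SketchDFA = {
    start: keyOf(startSet),
    accepting: new Set<string>(),
    table: new Map<string, Map<string, string>>(),
  };
  const pending: Array<Set<number>> = [startSet];
  while (pending.length > 0) {
    const current = pending.pop()!;
    const key = keyOf(current);
    if (dfa.table.has(key)) continue; // already expanded
    const row = new Map<string, string>();
    dfa.table.set(key, row);
    // A DFA state accepts if any NFA state inside it accepts.
    if (Array.from(current).some((s) => nfa.accepting.has(s))) dfa.accepting.add(key);
    for (const symbol of alphabet) {
      const next = epsilonClosure(nfa, moveOn(nfa, current, symbol));
      if (next.size === 0) continue; // a missing entry acts as the dead state
      row.set(symbol, keyOf(next));
      pending.push(next);
    }
  }
  return dfa;
}

function runDFA(dfa: SketchDFA, input: string): boolean {
  let state = dfa.start;
  for (const ch of input) {
    const next = dfa.table.get(state)?.get(ch);
    if (next === undefined) return false;
    state = next;
  }
  return dfa.accepting.has(state);
}

// Hand-built NFA for (a|b)*cd?: state 0 loops on a/b, c leads to 1,
// then d or epsilon leads to the accepting state 2.
const sketchNfa: SketchNFA = {
  start: 0,
  accepting: new Set([2]),
  transitions: new Map<number, Array<[string, number]>>([
    [0, [['a', 0], ['b', 0], ['c', 1]]],
    [1, [['d', 2], [EPS, 2]]],
  ]),
};
const sketchDfa = toDFA(sketchNfa, ['a', 'b', 'c', 'd']);
```

With this sketch, `runDFA(sketchDfa, 'aaac')`, `runDFA(sketchDfa, 'abcd')`, and `runDFA(sketchDfa, 'bbbcd')` accept, while `runDFA(sketchDfa, 'ad')` rejects, matching the results listed in the quick start example.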
167+
168+
### `DFA`
169+
170+
#### `DFA.prototype.test`
171+
172+
`DFA.prototype.test(input: string): boolean`
173+
174+
Checks whether the input string belongs to the language of the expression.
175+
176+
## License
52177

53-
API documentation is under working. Feel free to check out the source code.
178+
Under the [MIT License](https://github.com/yuqingc/lxa/blob/master/LICENSE).

src/fas/stateOps.ts

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -1,4 +1,4 @@
1-
import { State, InputType} from './state';
1+
import { State, InputType } from './state';
22
import epsilon from './epsilon';
33

44
export class StateOp {

test/dfa.test.ts

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -59,7 +59,7 @@ test('https? should work', () => {
5959
})
6060

6161
// .jpe?g
62-
test('.jpe?g work', () => {
62+
test('.jpe?g should work', () => {
6363
const final = concatMultipleStates(
6464
new SingleInputState('.'),
6565
new SingleInputState('j'),
