Commit 4e6f19e

Added doc blocks and updated readme
1 parent 8066075 commit 4e6f19e

19 files changed: +2033 -191 lines changed

.github/workflows/ci.yaml

+1-1
@@ -43,7 +43,7 @@ jobs:
4343
run: npm install
4444

4545
- name: Build and test
46-
run: npm run build && npm run test
46+
run: npm run build && npm run coverage
4747

4848
- name: Cache Node.js modules
4949
uses: actions/cache@v2

ENCODING_SPEC.md

+46
@@ -0,0 +1,46 @@
1+
# Custom Encoding Specification for FFI Boundary Crossing
2+
## Overview
3+
This document describes the custom encoding format used for serializing the Tag struct in Rust to a `Vec<u8>` for crossing FFI (Foreign Function Interface) boundaries. This encoding format ensures that the data can be efficiently transferred and reconstructed on the other side of the FFI boundary.
4+
5+
## Tag Struct
6+
The Tag struct contains the following fields:
7+
8+
- open_start: `[u32; 2]`
9+
- open_end: `[u32; 2]`
10+
- close_start: `[u32; 2]`
11+
- close_end: `[u32; 2]`
12+
- self_closing: `bool`
13+
- name: `Vec<u8>`
14+
- attributes: `Vec<Attribute>`
15+
- text_nodes: `Vec<Text>`
16+
### Encoding Format
17+
The encoding format is a binary representation of the Tag struct, with the following layout:
18+
19+
1. ### Header (8 bytes):
20+
- attributes_start: `u32` (4 bytes) - The starting byte offset of the attributes section.
21+
- text_nodes_start: `u32` (4 bytes) - The starting byte offset of the text nodes section.
22+
23+
2. ### Tag Data:
24+
- open_start: `[u32; 2]` (8 bytes)
25+
- open_end: `[u32; 2]` (8 bytes)
26+
- close_start: `[u32; 2]` (8 bytes)
27+
- close_end: `[u32; 2]` (8 bytes)
28+
- self_closing: `u8` (1 byte)
29+
- name_length: `u32` (4 bytes) - The length of the name field.
30+
- name: `Vec<u8>` (variable length) - The UTF-8 encoded bytes of the tag name.
31+
32+
3. ### Attributes Section:
33+
- attributes_count: `u32` (4 bytes) - The number of attributes.
34+
- For each attribute:
35+
- attribute_length: `u32` (4 bytes) - The length of the encoded attribute.
36+
- attribute_data: `Vec<u8>` (variable length) - The encoded attribute data.
37+
38+
39+
4. ### Text Nodes Section:
40+
41+
- text_nodes_count: `u32` (4 bytes) - The number of text nodes.
42+
- For each text node:
43+
- text_length: `u32` (4 bytes) - The length of the encoded text node.
44+
- text_data: `Vec<u8>` (variable length) - The encoded text node data.
45+
### Encoding Process
46+
The encoding process involves serializing each field of the Tag struct into a Vec<u8> in the specified order. The following Rust code demonstrates the encoding process:
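
The Rust listing is not part of this hunk. Separately, as a reading aid for the layout described above, here is a minimal TypeScript sketch of the receiving side that decodes the header and the tag data. It is not the project's decoder: `DecodedTag` and `decodeTagHead` are hypothetical names, and the sketch assumes little-endian integers and header offsets measured from the start of the buffer, neither of which the spec text states.

```typescript
// Sketch only. Assumptions not stated in the spec: little-endian integers,
// and header offsets counted from the start of the buffer.
interface DecodedTag {
  attributesStart: number;
  textNodesStart: number;
  openStart: [number, number];
  openEnd: [number, number];
  closeStart: [number, number];
  closeEnd: [number, number];
  selfClosing: boolean;
  name: string;
}

function decodeTagHead(bytes: Uint8Array): DecodedTag {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  let ptr = 0;

  // Read one u32 and advance the cursor by 4 bytes.
  const u32 = (): number => {
    const value = view.getUint32(ptr, true); // true = little-endian (assumed)
    ptr += 4;
    return value;
  };
  // Read a [u32; 2] pair (8 bytes).
  const pair = (): [number, number] => [u32(), u32()];

  // Header (8 bytes)
  const attributesStart = u32();
  const textNodesStart = u32();

  // Tag data: four [u32; 2] pairs, one flag byte, then the length-prefixed name.
  const openStart = pair();
  const openEnd = pair();
  const closeStart = pair();
  const closeEnd = pair();
  const selfClosing = view.getUint8(ptr) !== 0;
  ptr += 1;
  const nameLength = u32();
  const name = new TextDecoder().decode(bytes.subarray(ptr, ptr + nameLength));

  return {
    attributesStart,
    textNodesStart,
    openStart,
    openEnd,
    closeStart,
    closeEnd,
    selfClosing,
    name,
  };
}
```

Under these assumptions the fixed-size portion occupies 45 bytes (8 + 32 + 1 + 4), so the name bytes begin at offset 45.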

lib/cjs/saxWasm.d.ts

+245-11
@@ -1,52 +1,142 @@
1-
export declare class SaxEventType {
2-
static Text: number;
3-
static ProcessingInstruction: number;
4-
static SGMLDeclaration: number;
5-
static Doctype: number;
6-
static Comment: number;
7-
static OpenTagStart: number;
8-
static Attribute: number;
9-
static OpenTag: number;
10-
static CloseTag: number;
11-
static Cdata: number;
1+
/**
2+
* An enum representing the events that can be
3+
* subscribed to on the parser. Multiple events
4+
* are subscribed to by using the bitwise OR operator (`|`).
5+
*
6+
* @example
7+
* ```ts
8+
* // Subscribe to both the Text and OpenTag events.
9+
* const parser = new SAXParser(SaxEventType.Text | SaxEventType.OpenTag);
10+
* ```
11+
* Event subscriptions can be updated between write operations.
12+
*
13+
* Note that subscribing to fewer events yields a
14+
* slight performance improvement that becomes more noticeable
15+
* on very large documents.
16+
*/
17+
export declare enum SaxEventType {
18+
Text = 1,
19+
ProcessingInstruction = 2,
20+
SGMLDeclaration = 4,
21+
Doctype = 8,
22+
Comment = 16,
23+
OpenTagStart = 32,
24+
Attribute = 64,
25+
OpenTag = 128,
26+
CloseTag = 256,
27+
Cdata = 512
1228
}
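
Because each member of the new enum is a distinct power of two, a set of subscriptions fits in a single number. The sketch below illustrates combining, testing, and removing flags. It declares a local copy of the enum so that it runs without importing `sax-wasm`; `isSubscribed` is a hypothetical helper, not part of the library.

```typescript
// Local copy of the values shown in the diff, so the sketch is self-contained.
enum SaxEventType {
  Text = 1,
  ProcessingInstruction = 2,
  SGMLDeclaration = 4,
  Doctype = 8,
  Comment = 16,
  OpenTagStart = 32,
  Attribute = 64,
  OpenTag = 128,
  CloseTag = 256,
  Cdata = 512,
}

// Combine subscriptions with bitwise OR: 1 | 128 === 129.
const subscribed: number = SaxEventType.Text | SaxEventType.OpenTag;

// Hypothetical helper: test membership with bitwise AND.
const isSubscribed = (mask: number, event: SaxEventType): boolean =>
  (mask & event) !== 0;

// Add a flag with OR; remove one with AND NOT.
const withComments: number = subscribed | SaxEventType.Comment;
const withoutText: number = withComments & ~SaxEventType.Text;
```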
29+
/**
30+
* Represents the detail of a SAX event.
31+
*/
1332
export type Detail = Position | Attribute | Text | Tag | ProcInst;
33+
/**
34+
* Abstract class for reading SAX event data.
35+
*
36+
* @template T - The type of detail to be read.
37+
*/
1438
export declare abstract class Reader<T = Detail> {
1539
protected data: Uint8Array;
1640
protected cache: {
1741
[prop: string]: T;
1842
};
1943
protected ptr: number;
44+
/**
45+
* Creates a new Reader instance.
46+
*
47+
* @param data - The data to be read.
48+
* @param ptr - The initial pointer position.
49+
*/
2050
constructor(data: Uint8Array, ptr?: number);
51+
/**
52+
* Converts the reader data to a JSON object.
53+
*
54+
* @returns A JSON object representing the reader data.
55+
*/
2156
abstract toJSON(): {
2257
[prop: string]: T;
2358
};
2459
}
60+
/**
61+
* Class representing the line and character
62+
* integers for entities that are encountered
63+
* in the document.
64+
*/
2565
export declare class Position {
2666
line: number;
2767
character: number;
68+
/**
69+
* Creates a new Position instance.
70+
*
71+
* @param line - The line number.
72+
* @param character - The character position.
73+
*/
2874
constructor(line: number, character: number);
2975
}
76+
/**
77+
* Represents the different types of attributes.
78+
*/
3079
export declare enum AttributeType {
3180
Normal = 0,
3281
JSX = 1
3382
}
83+
/**
84+
* Represents an attribute in the XML data.
85+
*/
3486
export declare class Attribute extends Reader<Text | AttributeType> {
3587
type: AttributeType;
3688
name: Text;
3789
value: Text;
90+
/**
91+
* Creates a new Attribute instance.
92+
*
93+
* @param buffer - The buffer containing the attribute data.
94+
* @param ptr - The initial pointer position.
95+
*/
3896
constructor(buffer: Uint8Array, ptr?: number);
97+
/**
98+
* @inheritDoc
99+
*/
39100
toJSON(): {
40101
[prop: string]: Text | AttributeType;
41102
};
103+
/**
104+
* Converts the attribute to a string representation.
105+
*
106+
* @returns A string representing the attribute.
107+
*/
42108
toString(): string;
43109
}
110+
/**
111+
* Represents a processing instruction in the XML data.
112+
*/
44113
export declare class ProcInst extends Reader<Position | Text> {
45114
target: Text;
46115
content: Text;
116+
/**
117+
* Creates a new ProcInst instance.
118+
*
119+
* @param buffer - The buffer containing the processing instruction data.
120+
* @param ptr - The initial pointer position.
121+
*/
47122
constructor(buffer: Uint8Array, ptr?: number);
123+
/**
124+
* Gets the start position of the processing instruction.
125+
*
126+
* @returns The start position of the processing instruction.
127+
*/
48128
get start(): Position;
129+
/**
130+
* Gets the end position of the processing instruction.
131+
*
132+
* @returns The end position of the processing instruction.
133+
*/
49134
get end(): Position;
135+
/**
136+
* Converts the processing instruction to a JSON object.
137+
*
138+
* @returns A JSON object representing the processing instruction.
139+
*/
50140
toJSON(): {
51141
[p: string]: Position | Text;
52142
};
@@ -93,11 +183,155 @@ export declare class SAXParser {
93183
eventHandler?: (type: SaxEventType, detail: Detail) => void;
94184
private writeBuffer?;
95185
constructor(events?: number);
186+
/**
187+
* Parses the XML data from a readable stream.
188+
*
189+
* This function takes a reader for a stream of `Uint8Array` chunks and processes them using the SAX parser.
190+
* It yields events and their details as they are parsed.
191+
*
192+
* # Arguments
193+
*
194+
* * `reader` - A readable stream reader for `Uint8Array` chunks.
195+
*
196+
* # Returns
197+
*
198+
* * An async generator yielding tuples of `SaxEventType` and `Detail`.
199+
*
200+
* # Examples
201+
*
202+
* ```ts
203+
* // Node.js example
204+
* import { createReadStream } from 'fs';
205+
* import { resolve as pathResolve } from 'path';
206+
* import { Readable } from 'stream';
207+
* import { SAXParser, SaxEventType, Detail } from 'sax-wasm';
208+
*
209+
* (async () => {
210+
* const parser = new SAXParser(SaxEventType.Text | SaxEventType.OpenTag);
211+
* const options = { encoding: 'utf8' };
212+
* const readable = createReadStream(pathResolve('path/to/your.xml'), options);
213+
* const webReadable = Readable.toWeb(readable);
214+
*
215+
* for await (const [event, detail] of parser.parse(webReadable.getReader())) {
216+
* // Do something with these
217+
* }
218+
* })();
219+
*
220+
* // Browser example
221+
* import { SAXParser, SaxEventType, Detail } from 'sax-wasm';
222+
*
223+
* (async () => {
224+
* const parser = new SAXParser(SaxEventType.Text | SaxEventType.OpenTag);
225+
* const response = await fetch('path/to/your.xml');
226+
* const reader = response.body.getReader();
227+
*
228+
* for await (const [event, detail] of parser.parse(reader)) {
229+
* // Do something with these
230+
* }
231+
* })();
232+
* ```
233+
*/
96234
parse(reader: ReadableStreamDefaultReader<Uint8Array>): AsyncGenerator<[SaxEventType, Detail]>;
235+
/**
236+
* Writes a chunk of data to the parser.
237+
*
238+
* This function takes a `Uint8Array` chunk and processes it using the SAX parser.
239+
*
240+
* # Arguments
241+
*
242+
* * `chunk` - A `Uint8Array` chunk representing the data to be parsed.
243+
*
244+
* # Examples
245+
*
246+
* ```ts
247+
* // Node.js example
248+
* import { createReadStream } from 'node:fs';
249+
* import { resolve as pathResolve } from 'node:path';
250+
* import { Readable } from 'stream';
251+
* import { SAXParser, SaxEventType } from 'sax-wasm';
252+
*
253+
* (async () => {
254+
* const parser = new SAXParser(SaxEventType.Text | SaxEventType.OpenTag);
255+
* await parser.prepareWasm(fetch('path/to/your.wasm'));
256+
* const options = { encoding: 'utf8' };
257+
* const readable = createReadStream(pathResolve(__dirname + '/xml.xml'), options);
258+
* const webReadable = Readable.toWeb(readable);
259+
*
260+
* for await (const chunk of webReadable) {
261+
* parser.write(chunk);
262+
* }
263+
* parser.end();
264+
* })();
265+
*
266+
* // Browser example
267+
* import { SAXParser, SaxEventType } from 'sax-wasm';
268+
*
269+
* (async () => {
270+
* const parser = new SAXParser(SaxEventType.Text | SaxEventType.OpenTag);
271+
* await parser.prepareWasm(fetch('path/to/your.wasm'));
272+
* const response = await fetch('path/to/your.xml');
273+
* const reader = response.body.getReader();
274+
*
275+
* while (true) {
276+
* const { done, value } = await reader.read();
277+
* if (done) break;
278+
* parser.write(value);
279+
* }
280+
* parser.end();
281+
* })();
282+
* ```
283+
*/
97284
write(chunk: Uint8Array): void;
285+
/**
286+
* Ends the parsing process.
287+
*
288+
* This function signals the end of the parsing process and performs any necessary cleanup.
289+
*/
98290
end(): void;
291+
/**
292+
* Prepares the WebAssembly module for the SAX parser.
293+
*
294+
* This function takes a WebAssembly module source (a `Response`, a `Promise<Response>`, or a `Uint8Array`)
295+
* and instantiates it for use with the SAX parser.
296+
*
297+
* # Arguments
298+
*
299+
* * `source` - A `Response`, `Promise<Response>`, or `Uint8Array` representing the WebAssembly module source.
300+
*
301+
* # Returns
302+
*
303+
* * A `Promise<boolean>` that resolves to `true` if the WebAssembly module was successfully instantiated.
304+
*
305+
* # Examples
306+
*
307+
* ```ts
308+
* // Node.js example
309+
* import { SAXParser, SaxEventType } from 'sax-wasm';
310+
* import { readFileSync } from 'fs';
311+
* import { resolve as pathResolve } from 'path';
312+
*
313+
* (async () => {
314+
* const parser = new SAXParser(SaxEventType.Text | SaxEventType.OpenTag);
315+
* const wasmBuffer = readFileSync(pathResolve(__dirname + '/sax-wasm.wasm'));
316+
* const success = await parser.prepareWasm(wasmBuffer);
317+
* console.log('WASM prepared:', success);
318+
* })();
319+
*
320+
* // Browser example
321+
* import { SAXParser, SaxEventType } from 'sax-wasm';
322+
*
323+
* (async () => {
324+
* const parser = new SAXParser(SaxEventType.Text | SaxEventType.OpenTag);
325+
* const success = await parser.prepareWasm(fetch('path/to/your.wasm'));
326+
* console.log('WASM prepared:', success);
327+
* })();
328+
* ```
329+
*/
99330
prepareWasm(source: Response | Promise<Response>): Promise<boolean>;
100331
prepareWasm(saxWasm: Uint8Array): Promise<boolean>;
101332
eventTrap: (event: number, ptr: number, len: number) => void;
102333
}
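
The `parse` signature above pairs a stream reader with an async generator. How the library implements this is not shown in the diff. As a rough illustration of the general pattern only, a reader can be drained with a generator like the one below. `ChunkReader` and `readChunks` are hypothetical names introduced for this sketch, and the structural interface stands in for `ReadableStreamDefaultReader<Uint8Array>`, which should satisfy it.

```typescript
// Minimal structural type covering the single method the sketch needs.
interface ChunkReader {
  read(): Promise<{ done: boolean; value?: Uint8Array }>;
}

// Hypothetical helper: yield each chunk until the reader reports it is done.
async function* readChunks(reader: ChunkReader): AsyncGenerator<Uint8Array> {
  while (true) {
    const { done, value } = await reader.read();
    if (done) {
      return;
    }
    if (value) {
      yield value;
    }
  }
}
```

A consumer would then write `for await (const chunk of readChunks(reader)) { ... }` and hand each chunk to whatever processes it.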
334+
export declare const readString: (data: Uint8Array, offset: number, length: number) => string;
335+
export declare const readU32: (uint8Array: Uint8Array, ptr: number) => number;
336+
export declare const readPosition: (uint8Array: Uint8Array, ptr?: number) => Position;
103337
export {};
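
The three helpers exported at the end of this file are declared without bodies. To make their signatures concrete, here is a sketch of what such helpers might look like. The `Sketch`-suffixed names are hypothetical, and the sketch assumes little-endian `u32` values, UTF-8 text, and a position stored as `line` followed by `character`; none of this is confirmed by the declarations.

```typescript
const utf8Decoder = new TextDecoder();

// Assemble four bytes, least significant first (little-endian is assumed).
// The final `>>> 0` keeps the result unsigned when the top bit is set.
function readU32Sketch(bytes: Uint8Array, ptr: number): number {
  return (
    (bytes[ptr] |
      (bytes[ptr + 1] << 8) |
      (bytes[ptr + 2] << 16) |
      (bytes[ptr + 3] << 24)) >>>
    0
  );
}

// Decode `length` bytes starting at `offset` as UTF-8 text.
function readStringSketch(bytes: Uint8Array, offset: number, length: number): string {
  return utf8Decoder.decode(bytes.subarray(offset, offset + length));
}

// Read two consecutive u32 values as a position (field order is assumed).
function readPositionSketch(
  bytes: Uint8Array,
  ptr = 0,
): { line: number; character: number } {
  return {
    line: readU32Sketch(bytes, ptr),
    character: readU32Sketch(bytes, ptr + 4),
  };
}
```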
