Skip to content

pgaskin/marisa.js

Repository files navigation

marisa-trie-wasm

NPM Version Test Attest marisa build

Modern JavaScript browser/node bindings for marisa-trie.

MARISA is a read-only space-efficient trie data structure optimized for lookup, reverse lookup, commmon prefix search (keys which are prefixes of the query), and predictive search (keys starting with the query).

This library wraps the WebAssembly build of marisa-trie from go-marisa.

Getting started

npm install --save marisa-trie-wasm
<script type="module">
    import { Trie } from 'https://cdn.jsdelivr.net/npm/marisa-trie-wasm@0.0.5/dist/index.min.js'
</script>
import { Trie } from 'marisa-trie-wasm'

function *letters() {
    for (let a = 0; a < 26; a++) {
        for (let b = 0; b < 26; b++) {
            for (let c = 0; c < 26; c++) {
                yield String.fromCharCode(...[a, b, c].map(x => x + 97))
            }
        }
    }
}

const trie = await Trie.build(letters())

let id = trie.lookup('aaa')
if (id == null) {
    console.log('not found')
}

let key = trie.reverseLookup(id)
if (key == null) {
    console.log('not found')
}

console.log('lookup', id, key)

for (const [id, key] of trie.predictiveSearch('cb')) {
    console.log('predictiveSearch', id, key)
}

for (const [id, key] of trie.commonPrefixSearch('abcdefg')) {
    console.log('commonPrefixSearch', id, key)
}

console.log(trie.save())
console.log((await Trie.load(trie.save())).save())

Limitations

This library supports little-endian MARISA dictionaries up to 4 GiB. On 32-bit systems, the size is limited to around 2 GiB. These are limitations of MARISA itself.

Big-endian dictionaries (i.e., ones generated with the native tools on big-endian hosts) are not supported.

Memory-mapped dictionaries are not supported.

Asynchronous streaming I/O isn't supported yet.

Performance

This depends on the runtime, but with JIT, it should be 1.5-3x slower than the native library. With an interpreter, it should be 50-150x slower.

The memory usage should be around the same other than a ~115K overhead per trie. Keys are copied to/from JavaScript strings when used.

Design

The API is stable, typed, and does not leak implementation details of the marisa-trie library. All C++ exceptions are converted to JS exceptions.

The query interface uses generators for ergonomics and efficiency. It is okay to nest them. If you're calling them manually (rather than in a for loop or another API), remember to call return() so it can free or reuse the associated memory.

Tries are garbage-collected along with the associated JS object.

Testing

The wasm blob built and verified as part of go-marisa, and the Go module hash used to fetch it is pinned.

There are comprehensive tests of the wasm blob in go-marisa, and some minimal API tests here.

About

Modern JavaScript browser/node bindings for marisa-trie.

Topics

Resources

License

MIT, Unknown licenses found

Licenses found

MIT
LICENSE
Unknown
LICENSE.marisa

Stars

Watchers

Forks

Contributors