daac-tools

Pinned

  1. daachorse Public

    🐎 A fast implementation of the Aho-Corasick algorithm using the compact double-array data structure in Rust.

    Rust · 210 stars · 14 forks

  2. vaporetto Public

    🛥 Vaporetto: Very accelerated pointwise prediction based tokenizer

    Rust · 235 stars · 10 forks

  3. crawdad Public

    🦞 Rust library of natural language dictionaries using character-wise double-array tries.

    Rust · 30 stars · 2 forks

  4. vibrato Public

    🎤 vibrato: Viterbi-based accelerated tokenizer

    Rust · 352 stars · 15 forks

  5. rucrf Public

    Conditional Random Fields implemented in pure Rust

    Rust · 8 stars · 3 forks

  6. trie-match Public

    Fast match expression optimized for string comparison

    Rust · 38 stars

Repositories

Showing 10 of 13 repositories
  • python-daachorse Public

    🐎 A fast implementation of the Aho-Corasick algorithm using the compact double-array data structure. (Python wrapper for daachorse)

    Rust · 16 stars · Apache-2.0 · 1 fork · 0 open issues · 1 open pull request · Updated Mar 13, 2025
  • python-vaporetto Public

    🛥 Vaporetto is a fast and lightweight pointwise prediction based tokenizer. This is a Python wrapper for Vaporetto.

    Rust · 20 stars · Apache-2.0 · 1 fork · 0 open issues · 0 open pull requests · Updated Mar 13, 2025
  • find-simdoc Public

    Finding all pairs of similar documents time- and memory-efficiently

    Rust · 60 stars · Apache-2.0 · 3 forks · 1 open issue · 0 open pull requests · Updated Mar 13, 2025
  • vaporetto Public

    🛥 Vaporetto: Very accelerated pointwise prediction based tokenizer

    Rust · 235 stars · Apache-2.0 · 10 forks · 0 open issues · 1 open pull request · Updated Mar 10, 2025
  • vibrato Public

    🎤 vibrato: Viterbi-based accelerated tokenizer

    Rust · 352 stars · Apache-2.0 · 15 forks · 6 open issues · 0 open pull requests · Updated Feb 21, 2025
  • crawdad Public

    🦞 Rust library of natural language dictionaries using character-wise double-array tries.

    Rust · 30 stars · Apache-2.0 · 2 forks · 0 open issues · 0 open pull requests · Updated Jan 13, 2025
  • daachorse Public

    🐎 A fast implementation of the Aho-Corasick algorithm using the compact double-array data structure in Rust.

    Rust · 210 stars · Apache-2.0 · 14 forks · 2 open issues · 2 open pull requests · Updated Dec 29, 2024
  • python-vibrato Public

    Viterbi-based accelerated tokenizer (Python wrapper)

    Rust · 41 stars · Apache-2.0 · 1 fork · 0 open issues · 0 open pull requests · Updated Sep 4, 2024
  • trie-match Public

    Fast match expression optimized for string comparison

    Rust · 38 stars · Apache-2.0 · 0 forks · 0 open issues · 0 open pull requests · Updated Jan 29, 2024
  • vaporetto-models Public

    Tokenization models and training scripts for Vaporetto fast tokenizer

    Rust · 1 star · Apache-2.0 · 0 forks · 0 open issues · 0 open pull requests · Updated May 30, 2023
