Skip to content

vuldin/trie-rs

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

trie-rs

A Rust implementation of a trie data structure with a focus on matching phrases using word stems.

This is the Rust port of the JavaScript @vuldin/trie package, providing the same phrase-matching functionality with Rust's performance and safety guarantees.

Features

  • Phrase-based matching: Nodes represent word stems rather than individual characters
  • Intelligent text processing: Filters common words and uses stemming for fuzzy matching
  • Flexible API: Support for method chaining and multiple phrase formats
  • Zero dependencies: All text processing implemented with standard library only
  • Performance: Built with Rust for speed and memory efficiency

Usage

Add to your Cargo.toml:

[dependencies]
trie-rs = "0.1.0"

Basic Example

use trie_rs::Trie;

fn main() {
    let mut trie = Trie::new();
    
    // Add phrases to the trie
    trie.add("Hello world!")
        .add("This is a test sentence that contains some common words.")
        .add("Method calls can be chained");
    
    // Search for phrases
    if let Some(result) = trie.find("test sentence") {
        println!("Found {} matches, exact: {}", result.count, result.exact);
        // Output: Found 2 matches, exact: true
    }
    
    // Common words are automatically filtered
    if let Some(result) = trie.find("test a sentence") {
        println!("Found {} matches, exact: {}", result.count, result.exact);
        // Output: Found 2 matches, exact: true (same as above)
    }
}

How It Works

Text Processing Pipeline

  1. Phrase Extraction: Input text is split into phrases using punctuation
  2. Word Filtering: Common English words are removed from each phrase
  3. Stemming: Remaining words are converted to their stem forms (e.g., "running" → "run")
  4. Trie Construction: Stems are inserted into the trie structure in reverse order

Example Transformation

The text:

"Here is a test sentence that contains some common words in English."

Becomes the stems:

["english", "word", "common", "contain", "sentenc", "test", "here"]

Data Structure

Unlike traditional tries that store individual characters, this implementation:

  • Stores word stems at each node
  • Tracks which phrases contain each stem
  • Supports partial phrase matching
  • Enables fuzzy matching through stemming

API Reference

Trie::new()

Creates a new empty trie.

trie.add(text: &str) -> &mut Self

Adds text to the trie. Supports method chaining.

trie.find(phrase: &str) -> Option<FindResult>

Searches for a phrase in the trie. Returns None if no matches found.

FindResult

pub struct FindResult {
    pub count: usize,  // Number of matching stems
    pub exact: bool,   // Whether all stems in the phrase were found
}

Testing

Run the test suite:

cargo test

Performance

The Rust implementation provides significant performance improvements over the JavaScript version:

  • 2.8-3.5x faster execution for insertion and search operations
  • Zero dependencies for smaller bundle size (137KB vs 193KB with dependencies)
  • Zero-copy string operations where possible
  • Efficient HashMap-based lookups
  • Memory-safe operations without garbage collection overhead
  • Custom text processing algorithms optimized for performance

License

MIT License - see the LICENSE file for details.

About

Rank text by best match to seeded words and phrases

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages