High-performance IP and pattern matching for Zeek using memory-mapped databases. 7M+ queries/sec, shared memory across workers, hot-reloadable, no libmaxminddb dependency.

matchylabs/zeek-matchy-plugin

Zeek Matchy Plugin

A Zeek plugin for high-performance IP address and string pattern matching using Matchy databases.


Why Matchy?

Matchy brings several advantages over traditional threat intelligence approaches in Zeek:

Memory Efficiency on Clusters

  • Shared memory across workers: Databases are memory-mapped, so all Zeek workers on a host share the same physical memory
  • No per-process heap copy: Unlike the Intel Framework, which loads data into each worker's heap, Matchy serves lookups from the OS page cache
  • Massive scale: On a 32-core host, this can save gigabytes of RAM compared to per-worker copies
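
The savings claim is simple arithmetic, sketched below in Python. The database size and worker count are hypothetical, and the sketch assumes the whole database is resident in memory, which is a worst case for both approaches.

```python
def ram_saved_bytes(db_size_bytes: int, workers: int) -> int:
    """RAM saved when one shared mapping replaces a private copy per worker."""
    if db_size_bytes < 0 or workers < 1:
        raise ValueError("db_size_bytes must be >= 0 and workers >= 1")
    per_worker_total = db_size_bytes * workers  # every worker holds its own copy
    shared_total = db_size_bytes                # one set of pages shared by all workers
    return per_worker_total - shared_total


# Hypothetical numbers: a 256 MiB database and 32 workers on one host.
saved = ram_saved_bytes(256 * 1024**2, 32)
print(f"{saved / 1024**3:.2f} GiB saved")  # prints "7.75 GiB saved"
```

Sharing happens per host, so the saving applies to each machine in a cluster separately.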

Operational Flexibility

  • Hot-reloadable: Databases open in <1ms, so you can close and reopen them at runtime during updates—no Zeek restart needed
  • No libmaxminddb dependency: Load and query MaxMind GeoIP databases directly—one less C library to manage
  • Build databases offline: Use the matchy CLI in CI/CD pipelines to build databases from any source (CSV, JSON, APIs)
  • Simple distribution: Just copy .mxy files to your cluster—no Broker setup or Intel Framework synchronization

Performance

  • 7M+ IP queries/second: Memory-mapped lookups with zero-copy access
  • 3M+ pattern queries/second: Efficient glob matching (*.evil.com)
  • Deterministic performance: No GC pauses or unpredictable slowdowns (Rust + mmap)
  • Single unified API: Query IPs, CIDRs, exact strings, and wildcards through one interface

Developer Experience

  • Easy debugging: Query .mxy files directly with the matchy CLI—no need to inspect Zeek's internal state
  • Type-safe with metadata: Queries return structured JSON with arbitrary fields, not just boolean matches
  • Version control friendly: Keep source CSVs in git, build binary databases in CI
  • Cross-platform: Same .mxy file works on Linux, macOS, and BSD

Matchy excels at read-heavy workloads with infrequent updates (typical threat intel scenarios). For dynamic, frequently-changing data with complex sharing across clusters, Zeek's Intel Framework is still the better choice.

Installation

Requirements

Build

git clone https://github.com/sethhall/zeek-matchy-plugin.git
cd zeek-matchy-plugin
mkdir build && cd build
cmake ..
make

CMake automatically:

  • Finds Zeek via zeek-config (if in PATH)
  • Installs cargo-c (if needed)
  • Clones and builds Matchy from GitHub
  • Links everything together

Install (optional)

sudo make install

Verify Installation

Check that Zeek can see the plugin:

# If using ZEEK_PLUGIN_PATH
export ZEEK_PLUGIN_PATH=/path/to/zeek-matchy-plugin/build
zeek -N Matchy::DB

# If installed system-wide
zeek -N Matchy::DB

Expected output:

Matchy::DB - Fast IP and pattern matching using Matchy databases (dynamic, version 0.1.0)

Functions are automatically available in the Matchy:: namespace:

  • Matchy::load_database(file) - Returns database handle
  • Matchy::is_valid(db) - Check if handle is valid
  • Matchy::query_ip(db, ip) - Query by IP address
  • Matchy::query_string(db, string) - Query by string/pattern

Usage

Creating a Matchy Database

First, install the Matchy CLI tool:

cargo install matchy

Then create a database:

# Create a CSV file with threat indicators
cat > threats.csv << EOF
entry,threat_level,category,description
1.2.3.4,high,malware,Known C2 server
10.0.0.0/8,low,internal,RFC1918 private network
*.evil.com,critical,phishing,Phishing domain pattern
malware.example.com,high,malware,Malware distribution site
EOF

# Build the database
matchy build threats.csv -o threats.mxy --format csv
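
If your indicators come from a JSON feed rather than a hand-written file, the CSV can be generated in a pipeline step before running matchy build. The following is a minimal Python sketch. It assumes the feed is a JSON array of objects whose keys match the column names in the example above; the feed_to_csv helper and the sample feed are hypothetical.

```python
import csv
import io
import json

# Column names taken from the threats.csv example above.
FIELDS = ["entry", "threat_level", "category", "description"]


def feed_to_csv(feed_json: str) -> str:
    """Convert a JSON array of indicator objects into CSV text."""
    records = json.loads(feed_json)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        if not record.get("entry"):
            continue  # skip records that carry no indicator
        # Keep only the known columns; missing ones become empty cells.
        writer.writerow({name: record.get(name, "") for name in FIELDS})
    return out.getvalue()


# Hypothetical feed; a real one would come from a file or an HTTP API.
sample = (
    '[{"entry": "1.2.3.4", "threat_level": "high", '
    '"category": "malware", "description": "Known C2 server"}]'
)
print(feed_to_csv(sample), end="")
```

Write the returned text to threats.csv and build it with the matchy build command shown above.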

MatchyIntel Framework (Intel Framework Replacement)

The plugin includes MatchyIntel, a drop-in replacement for Zeek's Intel Framework that uses Matchy for high-performance matching. It automatically observes DNS queries, connection IPs, HTTP URLs, SSL/TLS SNI, and more.

Quick Start

@load Matchy/DB/intel

# Point to your threat intelligence database
redef MatchyIntel::db_path = "/opt/threat-intel/threats.mxy";

# React to matches
event MatchyIntel::match(s: MatchyIntel::Seen, metadata: string) {
    print fmt("THREAT: %s (%s) -> %s", s$indicator, s$where, metadata);
}

That's it! The framework will automatically check all DNS queries, connection IPs, HTTP hosts/URLs, and SSL SNI against your database.

Runtime Database Switching

You can change the database at runtime without restarting Zeek:

# Switch to a different database
Config::set_value("MatchyIntel::db_path", "/opt/threat-intel/updated.mxy");

# Unload the database (stop matching)
Config::set_value("MatchyIntel::db_path", "");

If the new path is invalid, the change is rejected and the current database stays loaded.
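
Because an invalid path is rejected, it helps to make sure a new database file is complete before Zeek is pointed at it. One common way is to copy to a temporary name and rename it into place. The Python sketch below shows that pattern. The publish_database helper is hypothetical and not part of the plugin, and the rename is only atomic when the temporary file and the destination are on the same filesystem.

```python
import os
import shutil
import tempfile


def publish_database(src: str, dest: str) -> str:
    """Copy src to dest so that dest is never visible half-written."""
    dest_dir = os.path.dirname(os.path.abspath(dest))
    # Create the temporary file next to dest so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp, open(src, "rb") as source:
            shutil.copyfileobj(source, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, dest)  # atomic rename on POSIX filesystems
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)  # do not leave partial files behind
        raise
    return dest
```

After the file is in place, pass its path to Config::set_value as shown above.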

Manual Observation

You can also manually check indicators:

# Check an IP
MatchyIntel::seen(MatchyIntel::Seen($host=1.2.3.4,
                                    $where=MatchyIntel::IN_ANYWHERE));

# Check a domain
MatchyIntel::seen(MatchyIntel::Seen($indicator="evil.example.com",
                                    $indicator_type=MatchyIntel::DOMAIN,
                                    $where=MatchyIntel::IN_ANYWHERE));

Hooks and Customization

# Filter matches before they fire
hook MatchyIntel::seen_policy(s: MatchyIntel::Seen, found: bool) {
    # Suppress matches for internal IPs
    if (s?$host && Site::is_local_addr(s$host))
        break;
}

# Customize logging
hook MatchyIntel::extend_match(info: MatchyIntel::Info, s: MatchyIntel::Seen, metadata: string) {
    # Add custom fields, modify info record, etc.
}

Log Output

Matches are logged to matchy_intel.log with fields including:

  • ts, uid, id - Connection context
  • seen.indicator, seen.indicator_type, seen.where - What was seen
  • metadata - JSON blob from your database
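
The metadata blob can be decoded with any JSON parser when you process the log offline. The Python sketch below assumes the blob carries the columns from the CSV example (threat_level, category, description). The severity ordering and both helper functions are hypothetical.

```python
import json

# Hypothetical ordering of the threat_level values used in the CSV example.
SEVERITY = {"low": 0, "high": 1, "critical": 2}


def parse_metadata(blob: str) -> dict:
    """Decode a metadata JSON blob; return {} for empty or malformed input."""
    if not blob:
        return {}
    try:
        data = json.loads(blob)
    except json.JSONDecodeError:
        return {}
    return data if isinstance(data, dict) else {}


def is_at_least(blob: str, minimum: str) -> bool:
    """True if the blob's threat_level ranks at or above the given minimum."""
    level = parse_metadata(blob).get("threat_level")
    rank = SEVERITY.get(level, -1) if isinstance(level, str) else -1
    return rank >= SEVERITY[minimum]


# Hypothetical blob shaped like the rows of threats.csv.
blob = '{"category": "malware", "threat_level": "high", "description": "Known C2 server"}'
print(parse_metadata(blob)["category"], is_at_least(blob, "high"))
```

Unknown or missing threat_level values rank below every known level, so they never pass the filter.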

Low-Level API

For more control, use the raw BiF functions directly:

global threats_db: opaque of MatchyDB;

event zeek_init() {
    # Load the database - returns an opaque handle
    threats_db = Matchy::load_database("/path/to/threats.mxy");
    
    if (!Matchy::is_valid(threats_db)) {
        print "Failed to load database!";
        return;
    }
    
    print "Database loaded successfully";
}

event connection_new(c: connection) {
    # Query the originator IP using the database handle
    local result = Matchy::query_ip(threats_db, c$id$orig_h);
    
    if (result != "") {
        print fmt("Threat detected from %s: %s", c$id$orig_h, result);
        # Result is JSON - parse with from_json() 
    }
}

event dns_request(c: connection, msg: dns_msg, query: string, qtype: count, qclass: count) {
    # Query domain name
    local result = Matchy::query_string(threats_db, query);
    
    if (result != "") {
        print fmt("Malicious domain queried: %s - %s", query, result);
    }
}

# Database is automatically cleaned up when Zeek terminates

Advanced Example with JSON Parsing (Low-Level API)

@load base/frameworks/notice

module ThreatIntel;

export {
    redef enum Notice::Type += {
        Threat_Detected
    };
    
    # Define structure matching your database fields
    type ThreatData: record {
        category: string &optional;
        threat_level: string &optional;
        description: string &optional;
    };
    
    global threats_db: opaque of MatchyDB;
}

event zeek_init() {
    threats_db = Matchy::load_database("/opt/threat-intel/threats.mxy");
    
    if (!Matchy::is_valid(threats_db)) {
        print "ERROR: Failed to load threat database";
    }
}

event connection_new(c: connection) {
    local result = Matchy::query_ip(threats_db, c$id$orig_h);
    
    if (result != "") {
        # Parse JSON result into typed record
        local parsed = from_json(result, ThreatData);
        
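        if (parsed$valid) {
            local threat: ThreatData = parsed$v;

            # Fields are &optional, so guard against values missing from the entry
            local category = threat?$category ? threat$category : "unknown";
            local level = threat?$threat_level ? threat$threat_level : "unknown";

            NOTICE([$note=Threat_Detected,
                    $conn=c,
                    $msg=fmt("Threat: %s (%s)", category, level),
                    $sub=fmt("IP: %s", c$id$orig_h)]);
        }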
    }
}

API Reference

load_database(filename: string): opaque of MatchyDB

Load a Matchy database from file and return an opaque handle.

  • filename: Path to the .mxy database file
  • Returns: Opaque database handle, or nullptr on failure; check the result with Matchy::is_valid() before querying

Note: The database is automatically closed when the handle goes out of scope or Zeek terminates. No manual cleanup needed.

is_valid(db: opaque of MatchyDB): bool

Check if a database handle is valid and the database is open.

  • db: Database handle from load_database()
  • Returns: T if valid and open, F otherwise

query_ip(db: opaque of MatchyDB, ip: addr): string

Query the database by IP address.

  • db: Database handle from load_database()
  • ip: IP address to query
  • Returns: JSON string with match data, or empty string if no match

Example: Matchy::query_ip(db, 1.2.3.4)

query_string(db: opaque of MatchyDB, query: string): string

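Query the database by string. The query is matched against the exact strings and glob patterns stored in the database.

  • db: Database handle from load_database()
  • query: String to look up, such as a domain name; a query like sub.evil.com matches a stored pattern like *.evil.com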
  • Returns: JSON string with match data, or empty string if no match

Example: Matchy::query_string(db, "malware.example.com")
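
To make the lookup model concrete, here is a small Python reference model of the two query functions. It is a conceptual sketch only and not how Matchy is implemented. It scans entries linearly, uses Python's ipaddress and fnmatch modules, and returns the first match, so Matchy's real glob semantics and match precedence may differ. The ENTRIES table is a hypothetical stand-in for the rows of threats.csv.

```python
import fnmatch
import ipaddress

# Hypothetical in-memory stand-in for the rows of threats.csv.
ENTRIES = {
    "1.2.3.4": {"threat_level": "high", "category": "malware"},
    "10.0.0.0/8": {"threat_level": "low", "category": "internal"},
    "*.evil.com": {"threat_level": "critical", "category": "phishing"},
    "malware.example.com": {"threat_level": "high", "category": "malware"},
}


def _as_network(key):
    """Return key as an IP network, or None if it is a string or pattern."""
    try:
        return ipaddress.ip_network(key, strict=False)
    except ValueError:
        return None


def lookup(query):
    """Return metadata for the first entry that covers query, else None."""
    try:
        addr = ipaddress.ip_address(query)
    except ValueError:
        addr = None
    for key, meta in ENTRIES.items():
        net = _as_network(key)
        if addr is not None:
            # IP queries are only compared with address and CIDR entries.
            if net is not None and addr in net:
                return meta
        elif net is None and fnmatch.fnmatchcase(query, key):
            # String queries are compared with exact strings and glob patterns.
            return meta
    return None


print(lookup("10.1.2.3"))
print(lookup("sub.evil.com"))
```

In this model the query is always a concrete value; the wildcards live in the database entries.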

Testing

The plugin includes comprehensive tests:

cd tests
ZEEK_PLUGIN_PATH=../build zeek simple-test.zeek

All tests should PASS. See tests/README.md for details.

Example test script:

event zeek_init() {
    local db = Matchy::load_database("test.mxy");
    
    if (Matchy::is_valid(db)) {
        # Test IP query
        local ip_result = Matchy::query_ip(db, 1.2.3.4);
        if (ip_result != "") {
            print "Match:", ip_result;
            # Output: {"category":"malware","threat_level":"high",...}
        }
        
        # Test pattern query  
        local pattern_result = Matchy::query_string(db, "sub.evil.com");
        if (pattern_result != "") {
            print "Match:", pattern_result;
            # Output: {"category":"phishing","threat_level":"critical",...}
        }
        
        # Database automatically cleaned up
    }
}

Troubleshooting

Plugin not found at runtime:

export ZEEK_PLUGIN_PATH=/path/to/zeek-matchy-plugin/build
zeek -N Matchy::DB

Advanced build options:

# Use existing Matchy installation
cmake -DBUILD_MATCHY=OFF -DMATCHY_ROOT=/path/to/matchy ..

# Specify Zeek location manually
cmake -DCMAKE_MODULE_PATH=/path/to/zeek/cmake ..

License

BSD-2-Clause License. See LICENSE file.

Contributing

Issues and pull requests welcome at https://github.com/sethhall/zeek-matchy-plugin

See Also
