infinyon/dedup-bloom-filter

Fluvio Deduplication SmartModule

The dedup-bloom-filter is a SmartModule that filters out records with duplicate keys within a specified window.
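
As a rough mental model of what filtering duplicate keys within a window means, the Python sketch below tracks keys in an exact ordered map instead of a bloom filter. The class `DedupWindow`, its `accept` method, and the rule that a key is forgotten only once both the count and age bounds (described under Deduplication bounds) are exceeded are illustrative assumptions, not the SmartModule's actual implementation.

```python
from collections import OrderedDict


class DedupWindow:
    """Toy exact-set model of a deduplication window (hypothetical, not the real filter)."""

    def __init__(self, count, age_seconds=None):
        self.count = count              # lower bound: remember at least this many records
        self.age_seconds = age_seconds  # lower bound: remember each record at least this long
        self._seen = OrderedDict()      # key -> arrival time, oldest first

    def _evict(self, now):
        # Assumption: a key may be forgotten only when BOTH lower bounds are exceeded.
        while len(self._seen) > self.count:
            oldest_key, arrived = next(iter(self._seen.items()))
            if self.age_seconds is not None and now - arrived <= self.age_seconds:
                break  # still inside the age bound, keep it
            del self._seen[oldest_key]

    def accept(self, key, now):
        """Return True if the record passes, False if it is dropped as a duplicate."""
        self._evict(now)
        if key in self._seen:
            return False
        self._seen[key] = now
        return True


window = DedupWindow(count=2, age_seconds=5)
arrivals = [("a", 0), ("b", 1), ("a", 2), ("c", 3), ("a", 4), ("a", 10)]
print([window.accept(key, now) for key, now in arrivals])
# → [True, True, False, True, False, True]
```

In this run the key "a" is still rejected at t=4 even though two newer records have arrived, because it is younger than 5 seconds; by t=10 both bounds are exceeded and it passes again. A real bloom filter may additionally report false positives, which an exact map never does.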

Usage

This section shows how to use the SmartModule on a topic and on a connector.

Using on a topic

Example topic configuration:

# topic.yaml
version: 0.1.0
meta:
  name: topic-with-dedup
deduplication:
  bounds:
    count: 5 # remember at least the last 5 records
    age: 5s # remember records for at least 5 seconds
  filter:
    transform:
      uses: fluvio/dedup-bloom-filter@<version>

Create a topic with this config:

fluvio topic create -c topic.yaml

Deduplication bounds

| Parameter | Default | Type | Optional | Description |
|---|---|---|---|---|
| count | - | Integer | false | Minimum number of records the filter remembers. A record that arrived more than count records ago is not guaranteed to be remembered. |
| age | - | Integer | true | Minimum amount of time the filter remembers a record. Specify it in a format such as 15days 2min 2s, 2min 5s, or 15ms. |
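
The age format above can be illustrated with a small parser. This is only a sketch: `parse_age` is a hypothetical helper written for this example, and its unit table covers just the units that appear in the examples above (days, min, s, ms); the SmartModule itself may accept more.

```python
import re
from datetime import timedelta

# Assumption: only the units shown in the README examples are supported here.
_UNITS = {
    "ms": timedelta(milliseconds=1),
    "s": timedelta(seconds=1),
    "min": timedelta(minutes=1),
    "days": timedelta(days=1),
}
_COMPONENT = re.compile(r"(\d+)([a-z]+)")


def parse_age(text):
    """Parse a string such as '15days 2min 2s' into a timedelta."""
    parts = text.split()
    if not parts:
        raise ValueError("empty duration")
    total = timedelta()
    for part in parts:
        match = _COMPONENT.fullmatch(part)
        if match is None or match.group(2) not in _UNITS:
            raise ValueError(f"unrecognised duration component: {part!r}")
        # Each component is a whole number followed by a unit suffix.
        total += int(match.group(1)) * _UNITS[match.group(2)]
    return total


print(parse_age("2min 5s").total_seconds())  # → 125.0
```

Components are summed, so "15days 2min 2s" is read as 15 days plus 2 minutes plus 2 seconds.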

Using on a connector

Configuration is similar to the topic case, except that it goes under the connector's topic/deduplication field.

# connector-config.yaml
meta:
  version: 0.2.3
  name: cat-facts
  type: http-source
  topic: 
    meta:
      name: cat-facts
    deduplication:
      bounds: ...
      filter: ...
http:
  endpoint: "https://catfact.ninja/fact"
  interval: 10s  

Parameters

Configuration

| Parameter | Default | Type | Optional | Description |
|---|---|---|---|---|
| num_frames | 8 | Integer | true | Number of internal frames the filter uses. Adjust it to tune the trade-off between memory usage and execution time. More information can be found in the docs. |
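
To give a feel for why a frame count might trade memory against execution time, here is a sketch of one common way to build a sliding bloom filter from several frames. Everything in it is an assumption made for illustration: the classes `BloomFrame` and `FramedDedup`, the per-frame capacity formula, and the rotation policy are not taken from this SmartModule's source, which may work differently.

```python
import hashlib
import math
from collections import deque


class BloomFrame:
    """One small bloom filter; the bit array is stored in a Python int."""

    def __init__(self, num_bits=4096, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0
        self.inserted = 0

    def _positions(self, key):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos
        self.inserted += 1

    def __contains__(self, key):
        # May return a false positive, never a false negative.
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class FramedDedup:
    """Hypothetical sliding window made of num_frames rotating bloom frames."""

    def __init__(self, count, num_frames=8):
        if num_frames < 2:
            raise ValueError("this sketch needs at least 2 frames")
        # The num_frames - 1 full frames together must cover `count` records.
        self.capacity = math.ceil(count / (num_frames - 1))
        self.frames = deque([BloomFrame()], maxlen=num_frames)

    def accept(self, key):
        """Return True if the key (bytes) looks new, False if it looks like a duplicate."""
        # Lookup cost grows with the number of frames.
        if any(key in frame for frame in self.frames):
            return False
        if self.frames[-1].inserted >= self.capacity:
            # Start a new frame; the deque drops the oldest one when full.
            self.frames.append(BloomFrame())
        self.frames[-1].add(key)
        return True
```

Under these assumptions, with count=4 a filter with num_frames=2 has a frame capacity of 4 and holds up to 8 keys while checking 2 frames per lookup, whereas num_frames=5 has a frame capacity of 1 and holds at most 5 keys while checking 5 frames per lookup. More frames keep retention closer to the requested bound at the price of more work per record.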

License

Apache-2.0
