The dedup-bloom-filter is a SmartModule that filters out records with duplicate keys within a specified window.

This guide shows how to use the SmartModule.

Example topic configuration:

    # topic.yaml
    version: 0.1.0
    meta:
      name: topic-with-dedup
    deduplication:
      bounds:
        count: 5 # remember at least the last 5 records
        age: 5s # remember records for at least 5 seconds
      filter:
        transform:
          uses: fluvio/[email protected]

Create a topic with this config:

    fluvio topic create -c topic.yaml

| Parameter | default | type | optional | description |
|---|---|---|---|---|
| count | - | Integer | false | Minimum number of records the filter will remember. Records that arrived more than `count` records ago are not guaranteed to be remembered. |
| age | - | Integer | true | Minimum amount of time the filter will remember a record for. It can be specified in formats such as `15days 2min 2s`, `2min 5s`, or `15ms`. |
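
To make the two bounds concrete, here is a minimal Python sketch of how `count` and `age` might interact. It is not the SmartModule's implementation: it uses an exact queue instead of a Bloom filter, the names `parse_age_ms` and `BoundedDedup` are hypothetical, and it assumes a key may be forgotten only once it falls outside both bounds. The parser handles only the units shown in the table's examples.

```python
import re
from collections import deque

# Milliseconds per unit; only the units from the table's examples are covered.
_UNIT_MS = {"ms": 1, "s": 1_000, "min": 60_000, "days": 86_400_000}
_PART = re.compile(r"(\d+)(ms|s|min|days)")


def parse_age_ms(text):
    """Convert a string such as '2min 5s' into milliseconds."""
    total = 0
    for part in text.split():
        match = _PART.fullmatch(part)
        if match is None:
            raise ValueError(f"unsupported age component: {part!r}")
        total += int(match.group(1)) * _UNIT_MS[match.group(2)]
    return total


class BoundedDedup:
    """Exact stand-in for the filter (assumption: a key is dropped only when
    it is outside the last `count` records AND older than `age_ms`)."""

    def __init__(self, count, age_ms=0):
        self.count = count
        self.age_ms = age_ms
        self.window = deque()  # (key, arrival time in ms), oldest first

    def is_new(self, key, now_ms):
        # Evict from the oldest end while both minimum bounds are exceeded.
        while (len(self.window) > self.count
               and now_ms - self.window[0][1] > self.age_ms):
            self.window.popleft()
        if any(seen == key for seen, _ in self.window):
            return False  # duplicate: the record would be filtered out
        self.window.append((key, now_ms))
        return True


dedup = BoundedDedup(count=2, age_ms=parse_age_ms("5s"))
for key, now_ms in [("a", 0), ("a", 1000), ("b", 2000), ("c", 3000),
                    ("a", 4000), ("a", 6000)]:
    print(key, now_ms, dedup.is_new(key, now_ms))
```

In this run, `a` at 4000 ms is still treated as a duplicate even though two other keys have arrived since, because it is younger than `age`. At 6000 ms it exceeds both bounds and is accepted again.
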
In a connector, the configuration is the same as for a topic, but it is placed under the topic/deduplication field of the connector config:

    # connector-config.yaml
    meta:
      version: 0.2.3
      name: cat-facts
      type: http-source
      topic:
        meta:
          name: cat-facts
        deduplication:
          bounds: ...
          filter: ...
    http:
      endpoint: "https://catfact.ninja/fact"
      interval: 10s

| Parameter | default | type | optional | description |
|---|---|---|---|---|
| num_frames | 8 | Integer | true | Number of internal frames the filter will use. It can be changed to tune the memory/execution-time tradeoff. More info can be found in the docs. |
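
The following Python sketch shows one common way a Bloom filter can be split into frames so that old keys are eventually forgotten. It is an assumption made for illustration, not a description of this SmartModule's internals: the classes `BloomFrame` and `FramedBloom`, the frame-sizing rule, and the bit-array and hash settings are all hypothetical. In this design, more frames mean less over-retention (each frame is smaller) but more lookups per record.

```python
import hashlib
import math
from collections import deque


class BloomFrame:
    """A plain Bloom filter over a fixed-size bit array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)
        self.inserted = 0

    def _positions(self, key):
        # Derive several bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)
        self.inserted += 1

    def __contains__(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


class FramedBloom:
    """Rotating frames: new keys go to the newest frame, and the oldest
    frame is discarded whenever a new one is opened (hypothetical design)."""

    def __init__(self, count, num_frames=8):
        if num_frames < 2:
            raise ValueError("this sketch needs at least 2 frames")
        # Size frames so the full frames alone cover at least `count` records.
        self.per_frame = math.ceil(count / (num_frames - 1))
        self.frames = deque([BloomFrame()], maxlen=num_frames)

    def check_and_add(self, key):
        """Return True if key is probably a duplicate; otherwise record it."""
        if any(key in frame for frame in self.frames):
            return True
        if self.frames[-1].inserted >= self.per_frame:
            self.frames.append(BloomFrame())  # maxlen drops the oldest frame
        self.frames[-1].add(key)
        return False


filt = FramedBloom(count=4, num_frames=3)
for i in range(7):
    filt.check_and_add(f"k{i}")
print(filt.check_and_add("k3"))  # recent key, still inside a retained frame
print(filt.check_and_add("k0"))  # its frame was rotated out
```

As with any Bloom filter, a lookup can report a false positive, so a small fraction of unique keys may be treated as duplicates.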