Computes how much each item in a ranking differs from the other items in the same ranking. Numeric and string fields are supported.
Consider that all items in your inventory have a numeric price field:
{
"event": "item",
"id": "81f46c34-a4bb-469c-8708-f8127cd67d27",
"item": "item1",
"timestamp": "1599391467000",
"fields": [{"name": "price", "value": 69.0}]
}
Then for the ranking below:
{
"event": "ranking",
"id": "81f46c34-a4bb-469c-8708-f8127cd67d27",
"timestamp": "1599391467000",
"user": "user1",
"session": "session1",
"items": [
{"id": "item1"},
{"id": "item2"},
{"id": "item3"}
]
}
we can compute how much each item's price differs from the median price across the whole ranking with the following configuration snippet:
- name: price_diff
  type: diversity
  source: item.price # only item.* fields are accepted
  ttl: 90d # optional, when to expire tracked fields
  top: 10 # optional, take only top-N items to compute the median
For example, given the following item prices:
- p1: price=100
- p2: price=200
- p3: price=250
- p4: price=300
- p5: price=220
So for the ranking [p1, p2, p3, p4, p5] we compute a median value of 220, and then the difference between each item's price and that median:
- p1: price_diff=-120
- p2: price_diff=-20
- p3: price_diff=30
- p4: price_diff=80
- p5: price_diff=0
When you have a very long ranking, it's worth limiting the number of items taken into account when computing the median. With top=3, for the same items in the ranking above, the median is taken over p1, p2 and p3 only and equals 200:
- p1: price_diff=-100
- p2: price_diff=0
- p3: price_diff=50
- p4: price_diff=100
- p5: price_diff=20
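The two worked examples above can be reproduced with a minimal Python sketch. It is only an illustration of the arithmetic, not the actual implementation: the function name `price_diff` and its `top` argument are hypothetical, and the handling of even-sized windows (Python's `statistics.median` averages the two middle values) is an assumption that the text above does not confirm.

```python
from statistics import median

def price_diff(prices, top=None):
    """Return (value - median) for every item in ranking order.

    The median is taken over the first `top` items only; with top=None
    the whole ranking is used. Hypothetical helper for illustration.
    """
    window = prices if top is None else prices[:top]
    m = median(window)
    return [p - m for p in prices]

# Prices of p1..p5, in ranking order.
prices = [100, 200, 250, 300, 220]

print(price_diff(prices))         # [-120, -20, 30, 80, 0]
print(price_diff(prices, top=3))  # [-100, 0, 50, 100, 20]
```

Note that items outside the top-N window (p4 and p5 in the second call) still get a value; they are just not used to compute the median.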
For string fields, diversity can be useful to see how different your items are across low-cardinality fields like tags, colors, sizes and categories. Both string and string[] field types are supported.
Suppose all your inventory items have a color field, as in the example below:
{
"event": "item",
"id": "81f46c34-a4bb-469c-8708-f8127cd67d27",
"item": "item1",
"timestamp": "1599391467000",
"fields": [{"name": "color", "value": "red"}]
}
Then for the ranking below:
{
"event": "ranking",
"id": "81f46c34-a4bb-469c-8708-f8127cd67d27",
"timestamp": "1599391467000",
"user": "user1",
"session": "session1",
"items": [
{"id": "item1"},
{"id": "item2"},
{"id": "item3"}
]
}
we can compute how frequently each color appears in the result set with the following configuration snippet:
- name: color_diff
  type: diversity
  source: item.color # only item.* fields are accepted
  ttl: 90d # optional, when to expire tracked fields
  top: 10 # optional, take only top-N items to compute the histogram
The algorithm builds tag frequencies over the ranking (color -> count in the example above), and then computes the relative intersection between an item's tags and those frequencies.
An example:
- given a frequency of {red: 50%, green: 30%, blue: 20%}
- for an item having only red color, the score will be 50%.
- for a red-blue item, the score will be 50%+20%=70%
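The frequency example above can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the actual implementation: the helper names `tag_frequencies` and `diversity_score` are hypothetical, and the sketch assumes that frequencies are each tag's share of all tag occurrences in the (optionally top-N) ranking, and that an item's score is the sum of the frequencies of its tags.

```python
from collections import Counter

def tag_frequencies(ranking_tags, top=None):
    """Share of each tag among all tag occurrences in the first `top` items.

    `ranking_tags` is a list with one list of tags per ranked item.
    Hypothetical helper; the normalization is an assumption.
    """
    window = ranking_tags if top is None else ranking_tags[:top]
    counts = Counter(tag for tags in window for tag in tags)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: n / total for tag, n in counts.items()}

def diversity_score(item_tags, freqs):
    """Sum the ranking-wide frequencies of the tags this item carries."""
    return sum(freqs.get(tag, 0.0) for tag in set(item_tags))

# A ranking of 10 single-color items: 5 red, 3 green, 2 blue,
# which yields the frequencies {red: 50%, green: 30%, blue: 20%}.
ranking = [["red"]] * 5 + [["green"]] * 3 + [["blue"]] * 2
freqs = tag_frequencies(ranking)

print(f"{diversity_score(['red'], freqs):.0%}")          # 50%
print(f"{diversity_score(['red', 'blue'], freqs):.0%}")  # 70%
```

Under this reading, a higher score means the item shares its tags with a larger part of the ranking, and an item whose tags do not occur in the ranking at all scores 0.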