Skip to content

Commit dec8883

Browse files
committed
Added another dataset
Signed-off-by: Owais <[email protected]>
1 parent 5e7ed63 commit dec8883

File tree

1 file changed

+9
-7
lines changed

1 file changed

+9
-7
lines changed

_posts/2025-03-31-zscore-hybrid-search.md

+9-7
Original file line numberDiff line numberDiff line change
@@ -83,7 +83,7 @@ POST my_index/_search?search_pipeline=z_score-pipeline
8383

8484
## Benchmarking Z Score performance
8585

86-
Benchmark experiments were conducted using an OpenSearch cluster consisting of a single r6g.8xlarge instance as the coordinator node, along with three r6g.8xlarge instances as data nodes. To assess Z Score’s performance comprehensively, we measured two key metrics across four distinct datasets. For information about the datasets used, see [Datasets](https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/).
86+
Benchmark experiments were conducted using an OpenSearch cluster consisting of a single r6g.8xlarge instance as the coordinator node, along with three r6g.8xlarge instances as data nodes. To assess Z Score’s performance comprehensively, we measured two key metrics across five distinct datasets. For information about the datasets used, see [Datasets](https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/).
8787

8888
### Sample queries and passages
8989

@@ -95,6 +95,7 @@ The following table provides sample queries and passages for each dataset.
9595
|FiQA |“Business day” and “due date” for bills |`I don't believe Saturday is a business day either. When I deposit a check at a bank's drive-in after 4pm Friday, the receipt tells me it will credit as if I deposited on Monday. If a business' computer doesn't adjust their billing to have a weekday due date ... `|
9696
|nq |what is non controlling interest on balance sheet |`In accounting, minority interest (or non-controlling interest) is the portion of a subsidiary corporation's stock that is not owned by the parent corporation. The magnitude of the minority interest in the subsidiary company is generally less than 50% of outstanding shares, or the corporation would generally cease to be a subsidiary of the parent`|
9797
|ArguAna |Poaching is becoming more advanced A stronger, militarised approach is needed as poaching is becoming ... |`Tougher protection of Africa\u2019s nature reserves will only result in more bloodshed. Every time the military upgrade their weaponry, tactics and logistic, the poachers improve their own methods to counter ...` |
98+
|touche2020 |Is a college education worth it? |`The resolution used by Pro *assumes* that Australia isn't already a 'significant' country - however, in actual reality, it is. Firstly we should clarify what significance means: 1.a the state or quality of being significant1.b of consequence or..` |
9899

99100

100101
Search relevance was quantified using the industry-standard Normalized Discounted Cumulative Gain at rank 10 (NDCG@10). We also tracked system performance using search latency measurements. This setup provided a strong foundation for evaluating both search quality and operational efficiency.
@@ -108,14 +109,15 @@ Search relevance was quantified using the industry-standard Normalized Discounte
108109
|fiqa |0.2747 |0.2768 |+0.77% |
109110
|nq |0.3665 |0.374 |+2.05% |
110111
|arguana |0.4507 |0.467 |+3.62% |
111-
| | |Average |2.22% |
112+
|touche2020 |0.841 |0.8542 |+1.54% |
113+
| | |Average |2.08% |
112114

113115
### Search latency
114116

115117

116118
The following table presents search latency measurements in milliseconds at different percentiles (p50, p90, and p99) for both the Hybrid with min max and z score approaches. The *Percent difference* columns show the relative performance impact between these methods.
117119

118-
<table> <tr> <th></th> <th colspan="3"><b>p50</b></th> <th colspan="3"><b>p90</b></th> <th colspan="3"><b>p99</b></th> </tr> <tr> <td></td> <td><b>Hybrid (min max)</b></td> <td><b>Hybrid (z score)</b></td> <td><b>Percent difference</b></td> <td><b>Hybrid (min max)</b></td> <td><b>Hybrid (z score)</b></td> <td><b>Percent difference</b></td> <td><b>Hybrid (min max)</b></td> <td><b>Hybrid (z score)</b></td> <td><b>Percent difference</b></td> </tr> <tr> <td>scidocs</td> <td>76.25</td> <td>77.5</td> <td>1.64%</td> <td>99</td> <td>100.5</td> <td>1.52%</td> <td>129.54</td> <td>133.04</td> <td>2.70%</td> </tr> <tr> <td>fiqa</td> <td>80</td> <td>81</td> <td>1.25%</td> <td>104.5</td> <td>105</td> <td>0.48%</td> <td>123.236</td> <td>124</td> <td>0.62%</td> </tr> <tr> <td>nq</td> <td>117</td> <td>117</td> <td>0%</td> <td>140</td> <td>140</td> <td>0%</td> <td>166.74</td> <td>165.24</td> <td>-0.90%</td> </tr> <tr> <td>arguana</td> <td>349</td> <td>349</td> <td>0%</td> <td>382</td> <td>382</td> <td>0%</td> <td>417.975</td> <td>418.475</td> <td>0.12%</td> </tr> <tr> <td></td> <td></td> <td><b>Average:</b></td> <td>0.72%</td> <td></td> <td><b>Average:</b></td> <td>0.50%</td> <td></td> <td><b>Average:</b></td> <td>0.64%</td> </tr> </table>
120+
<table> <tr> <th></th> <th colspan="3"><b>p50</b></th> <th colspan="3"><b>p90</b></th> <th colspan="3"><b>p99</b></th> </tr> <tr> <td></td> <td><b>Hybrid (min max)</b></td> <td><b>Hybrid (z score)</b></td> <td><b>Percent difference</b></td> <td><b>Hybrid (min max)</b></td> <td><b>Hybrid (z score)</b></td> <td><b>Percent difference</b></td> <td><b>Hybrid (min max)</b></td> <td><b>Hybrid (z score)</b></td> <td><b>Percent difference</b></td> </tr> <tr> <td>scidocs</td> <td>76.25</td> <td>77.5</td> <td>1.64%</td> <td>99</td> <td>100.5</td> <td>1.52%</td> <td>129.54</td> <td>133.04</td> <td>2.70%</td> </tr> <tr> <td>fiqa</td> <td>80</td> <td>81</td> <td>1.25%</td> <td>104.5</td> <td>105</td> <td>0.48%</td> <td>123.236</td> <td>124</td> <td>0.62%</td> </tr> <tr> <td>nq</td> <td>117</td> <td>117</td> <td>0%</td> <td>140</td> <td>140</td> <td>0%</td> <td>166.74</td> <td>165.24</td> <td>-0.90%</td> </tr> <tr> <td>arguana</td> <td>349</td> <td>349</td> <td>0%</td> <td>382</td> <td>382</td> <td>0%</td> <td>417.975</td> <td>418.475</td> <td>0.12%</td> </tr> <tr> <td>touche2020</td> <td>77</td> <td>77.5</td> <td>0.64%</td> <td>100</td> <td>100.5</td> <td>0.50%</td> <td>140</td> <td>140</td> <td>0%</td> </tr> <tr> <td></td> <td></td> <td><b>Average:</b></td> <td>0.70%</td> <td></td> <td><b>Average:</b></td> <td>0.50%</td> <td></td> <td><b>Average:</b></td> <td>0.50%</td> </tr> </table>
119121

120122
### Conclusions
121123

@@ -124,7 +126,7 @@ Our benchmark experiments highlight the following advantages and trade-offs of Z
124126

125127
**Search quality (measured using NDCG@10 across four datasets)**:
126128

127-
* Z-score normalization shows a modest improvement in search quality, with an average increase of 2.2% in NDCG@10 scores.
129+
* Z-score normalization shows a modest improvement in search quality, with an average increase of 2.08% in NDCG@10 scores.
128130
* This suggests that Z-score normalization may provide slightly better relevance in search results compared to the default normalization technique min-max.
129131

130132

@@ -134,15 +136,15 @@ Our benchmark experiments highlight the following advantages and trade-offs of Z
134136

135137
|Latency percentile |Percent difference |
136138
|--- |--- |
137-
|p50 |0.72% |
139+
|p50 |0.70% |
138140
|p90 |0.50% |
139-
|p99 |0.64% |
141+
|p99 |0.50% |
140142

141143
* The positive percentages indicate that Z-score normalization has slightly higher latency compared to min-max normalization, but the differences are minimal (less than 1% on average).
142144

143145
**Trade-offs**:
144146

145-
* There's a slight trade-off between search quality and latency. Z-score normalization offers a improvement in search relevance (2.2% increase in NDCG@10) at the cost of a marginal increase in latency (0.50% to 0.72% across different percentiles).
147+
* There's a slight trade-off between search quality and latency. Z-score normalization offers a improvement in search relevance (2.08% increase in NDCG@10) at the cost of a marginal increase in latency (0.50% to 0.72% across different percentiles).
146148

147149
**Consistency**:
148150

0 commit comments

Comments
 (0)