Reorder by XKTZ · Pull Request #143 · castorini/rank_llm

XKTZ · 2024-09-15T20:09:46Z

This is a superset of issue Top Down. This PR reorganized the various reordering methods including sliding window, top down, as well as ListT5's tournament sort methodology. Now it is allowed to use command like --reorder_policy="top_down:{\"top_k\": 10, \"pivot\": ${PIVOT}, \"shuffle\": true, \"r\": 1}" to specify a reorderer.

What changes

Incompatibility

In RankListwiseOSLLM, the create prompt now returns a JSON string, then in run llm batched, it would parse it into a passage with type { prompt: string, num_passages: int }. This is for backward compatibility with the old data types of prompt (str | list[dict[str,str]]) while we need num passages in a single trial.

Reorder Update

Now, the reorder and reranker are splitted for listwise_rankllm. That means, sliding_window(_batched) and permutation_pipeline shall be deprecated. Instead, we can define a class called ReorderPolicy to perform the reranking. So this is a small (simplified) definition on what an reorder policy is

class ReorderPolicy where
    reorder :: ([Requests], LLMReranker) -> [Result]

For detailed definition, please check src/rank_llm/rerank/listwise/reorder package

Now, we have three reorder policies, Sliding Window (sliding_window), tournament sort (tournament_sort) and top down (top_down). To add a reorder policy in future, create such a class extends Reorder Policy with a name, and put your class into listwise_rankllm.py's SUPPORT_REORDER_POLICIES.

To pass a parameter into reorder policy, append a JSON, separated by : when passing into the command line arguments reorder_policy, an example is following:

--reorder_policy="top_down:{\"pivot\": 11, \"top_k\": 10}"

Notice that to convenience, the JSON string behind : will be processed with json-repair library. So the string might not neet to be very JSON. (A javascript version should work, so you don't need to add a lot of \", using single quote ' should also work, so you don't need to put \" instead only need \')

So in above case, you actually only need to pass

--reorder_policy="top_down:{pivot: 11, top_k: 10}"

The JSON will be parsed as **json_dict to pass into the policy's class

Pointwise LLM

Adjust Pointwise LLM to compatible with the type usage right now (Not MonoT5, MonoT5's type def is still not corresponding with base class')

Experiments

Transformers

python src/rank_llm/scripts/run_rank_llm.py  --model_path=castorini/rank_zephyr_7b_v1_full --top_k_candidates=100 --dataset=dl20 --retrieval_method=SPLADE++_EnsembleDistil_ONNX --prompt_mode=rank_GPT  --context_size=4096 --variable_passages
Running command: ['java', '-jar', '/home/yidic/.cache/pyserini/eval/jtreceval-0.0.5-jar-with-dependencies.jar', '-m', 'ndcg_cut.10', '/tmp/tmpr21z6o83', 'rerank_results/SPLADE_P_P_ENSEMBLE_DISTIL/rank_zephyr_7b_v1_full_4096_100_rank_GPT_dl20_slidingwindow_stp10_2025-01-03T22:33:59.620645_window_20.txt']
Results:
ndcg_cut_10           	all	0.8211

VLLM

python src/rank_llm/scripts/run_rank_llm.py  --model_path=castorini/rank_zephyr_7b_v1_full --top_k_candidates=100 --dataset=dl20 --retrieval_method=SPLADE++_EnsembleDistil_ONNX --prompt_mode=rank_GPT  --context_size=4096 --variable_passages --vllm_batched
Running command: ['java', '-jar', '/home/yidic/.cache/pyserini/eval/jtreceval-0.0.5-jar-with-dependencies.jar', '-m', 'ndcg_cut.10', '/tmp/tmpx7i37yy_', 'rerank_results/SPLADE_P_P_ENSEMBLE_DISTIL/rank_zephyr_7b_v1_full_4096_100_rank_GPT_dl20_slidingwindow_stp10_2025-01-03T21:59:23.872518_window_20_vllm.txt']
Results:
ndcg_cut_10             all     0.8065

SGLang

python src/rank_llm/scripts/run_rank_llm.py  --model_path=castorini/rank_zephyr_7b_v1_full --top_k_candidates=100 --dataset=dl20 --retrieval_method=SPLADE++_EnsembleDistil_ONNX --prompt_mode=rank_GPT  --context_size=4096 --variable_passages --sglang_batched
Running command: ['java', '-jar', '/home/yidic/.cache/pyserini/eval/jtreceval-0.0.5-jar-with-dependencies.jar', '-m', 'ndcg_cut.10', '/tmp/tmpacfdy6ub', 'rerank_results/SPLADE_P_P_ENSEMBLE_DISTIL/rank_zephyr_7b_v1_full_4096_100_rank_GPT_dl20_slidingwindow_stp10_2025-01-03T21:54:34.718335_window_20_sglang.txt']
Results:
ndcg_cut_10             all     0.8085

TensorRT

python src/rank_llm/scripts/run_rank_llm.py  --model_path=castorini/rank_zephyr_7b_v1_full --top_k_candidates=100 --dataset=dl20 --retrieval_method=SPLADE++_EnsembleDistil_ONNX --prompt_mode=rank_GPT  --context_size=4096 --variable_passages --tensorrt_batched
Running command: ['java', '-jar', '/home/yidic/.cache/pyserini/eval/jtreceval-0.0.5-jar-with-dependencies.jar', '-m', 'ndcg_cut.10', '/tmp/tmpf0t2uee6', 'rerank_results/SPLADE_P_P_ENSEMBLE_DISTIL/rank_zephyr_7b_v1_full_4096_100_rank_GPT_dl20_slidingwindow_stp10_2025-01-04T20:28:13.770028_window_20_tensorrt.txt']
Results:
ndcg_cut_10             all     0.8113

VLLM, with top down policy

python src/rank_llm/scripts/run_rank_llm.py  --model_path=castorini/rank_zephyr_7b_v1_full --top_k_candidates=100 --dataset=dl20 --retrieval_method=SPLADE++_EnsembleDistil_ONNX --prompt_mode=rank_GPT  --context_size=4096 --variable_passages --vllm_batched --reorder_policy="top_down:{pivot: 11, top_k: 10}"
Running command: ['java', '-jar', '/home/yidic/.cache/pyserini/eval/jtreceval-0.0.5-jar-with-dependencies.jar', '-m', 'ndcg_cut.10', '/tmp/tmp48wzevjb', 'rerank_results/SPLADE_P_P_ENSEMBLE_DISTIL/rank_zephyr_7b_v1_full_4096_100_rank_GPT_dl20_topdown_tpk10_pvt11_2025-01-04T00:38:58.718882_window_20_vllm.txt']
Results:
ndcg_cut_10             all     0.7968

VLLM, with tournament sort policy

python src/rank_llm/scripts/run_rank_llm.py  --model_path=castorini/rank_zephyr_7b_v1_full --top_k_candidates=100 --dataset=dl20 --retrieval_method=SPLADE++_EnsembleDistil_ONNX --prompt_mode=rank_GPT  --context_size=4096 --variable_passages --vllm_batched --reorder_policy="tournament_sort:{top_k: 10, r: 1}"
Running command: ['java', '-jar', '/home/yidic/.cache/pyserini/eval/jtreceval-0.0.5-jar-with-dependencies.jar', '-m', 'ndcg_cut.10', '/tmp/tmpcznevsch', 'rerank_results/SPLADE_P_P_ENSEMBLE_DISTIL/rank_zephyr_7b_v1_full_4096_100_rank_GPT_dl20_tournamentsort_tpk10_r1_2025-01-04T00:52:37.214645_window_20_vllm.txt']
Results:
ndcg_cut_10             all     0.8012

VLLM, with FIRST logit's method, and sliding window

python src/rank_llm/scripts/run_rank_llm.py  --model_path=castorini/first_mistral --top_k_candidates=100 --dataset=dl20 --retrieval_method=SPLADE++_EnsembleDistil_ONNX --prompt_mode=rank_GPT  --context_size=4096 --variable_passages --use_logits --use_alpha --vllm_batched --num_gpus 1
Running command: ['java', '-jar', '/home/yidic/.cache/pyserini/eval/jtreceval-0.0.5-jar-with-dependencies.jar', '-m', 'ndcg_cut.10', '/tmp/tmp6tzqimq3', 'rerank_results/SPLADE_P_P_ENSEMBLE_DISTIL/first_mistral_4096_100_rank_GPT_dl20_slidingwindow_stp10_2025-01-04T01:02:51.749029_window_20_vllm.txt']
Results:
ndcg_cut_10             all     0.7885

VLLM, with FIRST logit's, and top down

python src/rank_llm/scripts/run_rank_llm.py  --model_path=castorini/first_mistral --top_k_candidates=100 --dataset=dl20 --retrieval_method=SPLADE++_EnsembleDistil_ONNX --prompt_mode=rank_GPT  --context_size=4096 --variable_passages --use_logits --use_alpha --vllm_batched --num_gpus 1 --reorder_policy="top_down:{top_k: 10, pivot: 11}"
Running command: ['java', '-jar', '/home/yidic/.cache/pyserini/eval/jtreceval-0.0.5-jar-with-dependencies.jar', '-m', 'ndcg_cut.10', '/tmp/tmpttqpc9ew', 'rerank_results/SPLADE_P_P_ENSEMBLE_DISTIL/first_mistral_4096_100_rank_GPT_dl20_topdown_tpk10_pvt11_2025-01-04T01:35:06.172132_window_20_vllm.txt']
Results:
ndcg_cut_10             all     0.7856

MonoT5

python src/rank_llm/scripts/run_rank_llm.py --model_path=castorini/monot5-3b-msmarco-10k --top_k_candidates=100 --dataset=dl19   --retrieval_method=bm25 --prompt_mode=monot5 --context_size=512 --batch_size=16
Running command: ['java', '-jar', '/home/yidic/.cache/pyserini/eval/jtreceval-0.0.5-jar-with-dependencies.jar', '-m', 'ndcg_cut.10', '/tmp/tmpxcjujitq', 'rerank_results/BM25/monot5-3b-msmarco-10k_512_100_monot5_dl19_2025-01-04T21:09:53.071527_window_20.txt']
Results:
ndcg_cut_10             all     0.7174

…itude rank_start and rank_end

…a permutation but now it is returning a string. Will need the cleanup pipeline in next step

Update

upd

merge

ronakice · 2024-09-16T14:29:51Z

README.md

+
+

…eate_prompt

ronakice · 2024-10-19T01:31:08Z

src/rank_llm/scripts/run_rank_llm.py

        "--step_size",
        type=int,
-        default=10,
-        help="step size for the sliding window approach",
+        default=20,
+        help="window size for the LLM",
    )


step and window are different?

ronakice · 2025-01-05T14:02:27Z

README.md

+    --reorder_policy="tournament_sort:{top_k: 10, r: 1}"
+```
+
+### Run end to end - Rank Zephyr with [Top Down](https://)


@XKTZ what is this?

ronakice · 2025-01-05T14:03:12Z

README.md

 }
 ```

+If you woud like to cite the ListT5's tournament sort methodology, please consider citing


I don't think this is the first instance of tournament sort, something has probably been done before, maybe we drop it for now

XKTZ and others added 26 commits August 13, 2024 00:33

First step update - during create_prompt, use selected_index to subst…

60d5e7f

…itude rank_start and rank_end

Allow selected index to be variable on each step

74118c4

Added basic _get_model_function - it is wrong now, we want to return …

36bbcdc

…a permutation but now it is returning a string. Will need the cleanup pipeline in next step

Renamed indexes to indices, [indices] is renamed to indices_batch

6d2094a

Renamed ReorderExecutor to Reorder Policy

cc78522

Moved LiT5Distill to policy

4936c8c

Transition rank listwise os llm to using reorder policy

95f065a

Added reorder policy for rank listwise os and fid score

438c8bd

Revised bug in OS LLM, add Rank GPT, deprecated old functions

2aa8644

Finish the tournament sort node

7a19b87

Finish tournament sort

c8734f0

Finish reorganize of parameters, move window_size to ListwiseRankLLM

7981c06

Added window size back

bbbdc31

Fix Rerankers

bb0ad2b

Merge pull request #4 from castorini/main

62785bc

Update

Merge branch 'reorder' into main

9b1972a

Merge pull request #5 from XKTZ/main

06b9f1f

Update

Added r parameter

28e3a23

Some bug fix

0e9edce

Reformatted

3005455

Added top down

3795308

Merge pull request #6 from castorini/main

edd1ff9

upd

Merge pull request #7 from XKTZ/main

89bab7c

merge

Added final some stuff

65d3508

Updated readme

69e3d65

Clean the README

8f4f729

ronakice reviewed Sep 16, 2024

View reviewed changes

README.md

Comment on lines 206 to 207

Copy link

Member

ronakice Sep 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove

XKTZ added 3 commits September 16, 2024 16:41

Changed filename

cc22c95

Updated silence

26beeed

Fixed bug when batch size > 32, which is in rank_listwise_os_llm's cr…

36beed1

…eate_prompt

ronakice reviewed Oct 19, 2024

View reviewed changes

XKTZ added 9 commits October 19, 2024 04:04

Put back step size for maintaining backward compatibility

9caa5fb

Support early stop in topdown

d7ecc8b

Reformat

891ca8f

temp .gitignore

db3a509

Add parameter include padding,

743b5af

Merge remote-tracking branch 'origin/main' into reorder

cca845e

Format

66c8742

Add more fix

acf8012

Fix bug

f595054

XKTZ force-pushed the reorder branch from 2d03c8b to f595054 Compare January 3, 2025 23:00

XKTZ added 9 commits January 3, 2025 23:50

Merge remote-tracking branch 'rankllm/main' into reorder

94a5aa5

update

8d0ea17

Merge remote-tracking branch 'rankllm/main' into reorder

ba2c3ef

Fix TensorRT

ca0a1f0

Fix gitignore

2efdbd5

Fix import

dc9fdbf

Added Readme

4669a3f

MonoT5 type def

2a38464

Let RankFID able to fit the type def

75c4a74

ronakice reviewed Jan 5, 2025

View reviewed changes

XKTZ added 8 commits January 17, 2025 00:12

Remove Tournament sort method with ListT5

0b0426d

Merge remote-tracking branch 'rankllm/main' into reorder

fbb163c

Add a comment

e5842a2

lint

70f3914

Update rankfid

4944927

Add some comments

0d5c5e7

Merge branch 'reorder' of github.com:XKTZ/rank_llm into reorder

8b0f963

Updated

1115247

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reorder#143

Reorder#143
XKTZ wants to merge 60 commits intocastorini:mainfrom
XKTZ:reorder

XKTZ commented Sep 15, 2024 •

edited

Loading

Uh oh!

ronakice Sep 16, 2024

Uh oh!

ronakice Oct 19, 2024

Uh oh!

ronakice Jan 5, 2025

Uh oh!

ronakice Jan 5, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

XKTZ commented Sep 15, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What changes

Incompatibility

Reorder Update

Pointwise LLM

Experiments

Transformers

VLLM

SGLang

TensorRT

VLLM, with top down policy

VLLM, with tournament sort policy

VLLM, with ** FIRST ** logit's method, and sliding window

VLLM, with ** FIRST ** logit's, and top down

MonoT5

Uh oh!

ronakice Sep 16, 2024

Choose a reason for hiding this comment

Uh oh!

ronakice Oct 19, 2024

Choose a reason for hiding this comment

Uh oh!

ronakice Jan 5, 2025

Choose a reason for hiding this comment

Uh oh!

ronakice Jan 5, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

XKTZ commented Sep 15, 2024 •

edited

Loading

VLLM, with FIRST logit's method, and sliding window

VLLM, with FIRST logit's, and top down