
Commit 45d0002

DocSum Long Context add auto mode (opea-project#1046)
* docsum refine mode prompt update
* docsum vllm requirement update
* docsum add auto mode
* fix bug
* fix bug
* fix readme
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* refine
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)

---------

Signed-off-by: Xinyao Wang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent 717c3c1 commit 45d0002

7 files changed: +129 additions, -48 deletions

comps/cores/proto/docarray.py

Lines changed: 1 addition & 1 deletion

@@ -213,7 +213,7 @@ def chat_template_must_contain_variables(cls, v):
 
 
 class DocSumLLMParams(LLMParamsDoc):
-    summary_type: str = "stuff"  # can be "truncate", "map_reduce", "refine"
+    summary_type: str = "auto"  # can be "auto", "stuff", "truncate", "map_reduce", "refine"
     chunk_size: int = -1
     chunk_overlap: int = -1
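The schema change above only flips a default, but it changes behavior for every caller that omits summary_type. A minimal sketch of the effect, using a plain pydantic model as a stand-in for the real docarray-based LLMParamsDoc hierarchy (field set and model name are illustrative assumptions):

```python
from pydantic import BaseModel


class DocSumLLMParamsSketch(BaseModel):
    """Stand-in for DocSumLLMParams; the real class derives from LLMParamsDoc."""

    query: str
    summary_type: str = "auto"  # "auto", "stuff", "truncate", "map_reduce", "refine"
    chunk_size: int = -1
    chunk_overlap: int = -1


# A request that never mentions summary_type now resolves to "auto" instead of "stuff".
params = DocSumLLMParamsSketch(query="Text Embeddings Inference (TEI) is a toolkit ...")
print(params.summary_type)  # -> auto
```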

comps/llms/summarization/tgi/langchain/README.md

Lines changed: 7 additions & 3 deletions

@@ -98,7 +98,7 @@ In DocSum microservice, except for basic LLM parameters, we also support several
 
 If you want to deal with long context, can select suitable summary type, details in section 3.2.2.
 
-- "summary_type": can be "stuff", "truncate", "map_reduce", "refine", default is "stuff"
+- "summary_type": can be "auto", "stuff", "truncate", "map_reduce", "refine", default is "auto"
 - "chunk_size": max token length for each chunk. Set to be different default value according to "summary_type".
 - "chunk_overlap": overlap token length between each chunk, default is 0.1\*chunk_size

@@ -126,9 +126,13 @@ curl http://${your_ip}:9000/v1/chat/docsum \
 
 #### 3.2.2 Long context summarization with "summary_type"
 
-"summary_type" is set to be "stuff" by default, which will let LLM generate summary based on complete input text. In this case please carefully set `MAX_INPUT_TOKENS` and `MAX_TOTAL_TOKENS` according to your model and device memory, otherwise it may exceed LLM context limit and raise error when meet long context.
+**summary_type=auto**
 
-When deal with long context, you can set "summary_type" to one of "truncate", "map_reduce" and "refine" for better performance.
+"summary_type" is set to be "auto" by default, in this mode we will check input token length, if it exceed `MAX_INPUT_TOKENS`, `summary_type` will automatically be set to `refine` mode, otherwise will be set to `stuff` mode.
+
+**summary_type=stuff**
+
+In this mode LLM generate summary based on complete input text. In this case please carefully set `MAX_INPUT_TOKENS` and `MAX_TOTAL_TOKENS` according to your model and device memory, otherwise it may exceed LLM context limit and raise error when meet long context.
 
 **summary_type=truncate**
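Since "auto" is now the default, clients that want a specific strategy must ask for it explicitly. A hedged example request against the documented endpoint (fields other than summary_type, chunk_size and chunk_overlap follow the basic request in section 3.2.1; the values are placeholders):

```bash
curl http://${your_ip}:9000/v1/chat/docsum \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"query":"<your long document here>", "max_tokens":128, "language":"en", "summary_type":"refine", "chunk_size":2000, "chunk_overlap":200}'
```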

comps/llms/summarization/tgi/langchain/llm.py

Lines changed: 54 additions & 19 deletions

@@ -42,26 +42,43 @@
 概况:"""
 
 
-templ_refine_en = """\
-Your job is to produce a final summary.
-We have provided an existing summary up to a certain point: {existing_answer}
-We have the opportunity to refine the existing summary (only if needed) with some more context below.
-------------
-{text}
-------------
-Given the new context, refine the original summary.
-If the context isn't useful, return the original summary.\
+templ_refine_en = """Your job is to produce a final summary.
+We have provided an existing summary up to a certain point, then we will provide more context.
+You need to refine the existing summary (only if needed) with new context and generate a final summary.
+
+
+Existing Summary:
+"{existing_answer}"
+
+
+
+New Context:
+"{text}"
+
+
+
+Final Summary:
+
 """
 
 templ_refine_zh = """\
 你的任务是生成一个最终摘要。
-我们已经提供了部分摘要:{existing_answer}
-如果有需要的话,可以通过以下更多上下文来完善现有摘要。
-------------
-{text}
-------------
-根据新上下文,完善原始摘要。
-如果上下文无用,则返回原始摘要。\
+我们已经处理好部分文本并生成初始摘要, 并提供了新的未处理文本
+你需要根据新提供的文本,结合初始摘要,生成一个最终摘要。
+
+
+初始摘要:
+"{existing_answer}"
+
+
+
+新的文本:
+"{text}"
+
+
+
+最终摘要:
+
 """

@@ -76,6 +93,25 @@ async def llm_generate(input: DocSumLLMParams):
     if logflag:
         logger.info(input)
 
+    ### check summary type
+    summary_types = ["auto", "stuff", "truncate", "map_reduce", "refine"]
+    if input.summary_type not in summary_types:
+        raise NotImplementedError(f"Please specify the summary_type in {summary_types}")
+    if input.summary_type == "auto":  ### Check input token length in auto mode
+        token_len = len(tokenizer.encode(input.query))
+        if token_len > MAX_INPUT_TOKENS + 50:
+            input.summary_type = "refine"
+            if logflag:
+                logger.info(
+                    f"Input token length {token_len} exceed MAX_INPUT_TOKENS + 50 {MAX_INPUT_TOKENS+50}, auto switch to 'refine' mode."
+                )
+        else:
+            input.summary_type = "stuff"
+            if logflag:
+                logger.info(
+                    f"Input token length {token_len} not exceed MAX_INPUT_TOKENS + 50 {MAX_INPUT_TOKENS+50}, auto switch to 'stuff' mode."
+                )
+
     if input.language in ["en", "auto"]:
         templ = templ_en
         templ_refine = templ_refine_en

@@ -98,7 +134,7 @@ async def llm_generate(input: DocSumLLMParams):
     ## Split text
     if input.summary_type == "stuff":
         text_splitter = CharacterTextSplitter()
-    elif input.summary_type in ["truncate", "map_reduce", "refine"]:
+    else:
         if input.summary_type == "refine":
             if MAX_TOTAL_TOKENS <= 2 * input.max_tokens + 128:
                 raise RuntimeError("In Refine mode, Please set MAX_TOTAL_TOKENS larger than (max_tokens * 2 + 128)")

@@ -119,8 +155,7 @@ async def llm_generate(input: DocSumLLMParams):
         if logflag:
             logger.info(f"set chunk size to: {chunk_size}")
             logger.info(f"set chunk overlap to: {chunk_overlap}")
-    else:
-        raise NotImplementedError('Please specify the summary_type in "stuff", "truncate", "map_reduce", "refine"')
+
     texts = text_splitter.split_text(input.query)
     docs = [Document(page_content=t) for t in texts]
     if logflag:
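The new block in the second hunk is the heart of auto mode: count the query tokens and fall back to "refine" only when the input would not fit. A standalone sketch of that dispatch rule (the service does this inside llm_generate() with its own tokenizer and reads MAX_INPUT_TOKENS from the environment; the model name and limit below are assumptions for illustration):

```python
from transformers import AutoTokenizer

MAX_INPUT_TOKENS = 2048  # assumed value; the microservice reads this from its environment


def resolve_summary_type(query: str, summary_type: str, tokenizer) -> str:
    """Map 'auto' onto 'stuff' or 'refine' based on the input token length."""
    if summary_type != "auto":
        return summary_type
    token_len = len(tokenizer.encode(query))
    # the 50-token margin mirrors the committed check: long inputs go to iterative refine
    return "refine" if token_len > MAX_INPUT_TOKENS + 50 else "stuff"


tokenizer = AutoTokenizer.from_pretrained("Intel/neural-chat-7b-v3-3")  # any HF tokenizer works here
print(resolve_summary_type("A short paragraph to summarize.", "auto", tokenizer))  # -> stuff
```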

comps/llms/summarization/vllm/langchain/README.md

Lines changed: 7 additions & 3 deletions

@@ -97,7 +97,7 @@ In DocSum microservice, except for basic LLM parameters, we also support several
 
 If you want to deal with long context, can select suitable summary type, details in section 3.2.2.
 
-- "summary_type": can be "stuff", "truncate", "map_reduce", "refine", default is "stuff"
+- "summary_type": can be "auto", "stuff", "truncate", "map_reduce", "refine", default is "auto"
 - "chunk_size": max token length for each chunk. Set to be different default value according to "summary_type".
 - "chunk_overlap": overlap token length between each chunk, default is 0.1\*chunk_size

@@ -125,9 +125,13 @@ curl http://${your_ip}:9000/v1/chat/docsum \
 
 #### 3.2.2 Long context summarization with "summary_type"
 
-"summary_type" is set to be "stuff" by default, which will let LLM generate summary based on complete input text. In this case please carefully set `MAX_INPUT_TOKENS` and `MAX_TOTAL_TOKENS` according to your model and device memory, otherwise it may exceed LLM context limit and raise error when meet long context.
+**summary_type=auto**
 
-When deal with long context, you can set "summary_type" to one of "truncate", "map_reduce" and "refine" for better performance.
+"summary_type" is set to be "auto" by default, in this mode we will check input token length, if it exceed `MAX_INPUT_TOKENS`, `summary_type` will automatically be set to `refine` mode, otherwise will be set to `stuff` mode.
+
+**summary_type=stuff**
+
+In this mode LLM generate summary based on complete input text. In this case please carefully set `MAX_INPUT_TOKENS` and `MAX_TOTAL_TOKENS` according to your model and device memory, otherwise it may exceed LLM context limit and raise error when meet long context.
 
 **summary_type=truncate**

comps/llms/summarization/vllm/langchain/llm.py

Lines changed: 54 additions & 19 deletions

@@ -43,26 +43,43 @@
 概况:"""
 
 
-templ_refine_en = """\
-Your job is to produce a final summary.
-We have provided an existing summary up to a certain point: {existing_answer}
-We have the opportunity to refine the existing summary (only if needed) with some more context below.
-------------
-{text}
-------------
-Given the new context, refine the original summary.
-If the context isn't useful, return the original summary.\
+templ_refine_en = """Your job is to produce a final summary.
+We have provided an existing summary up to a certain point, then we will provide more context.
+You need to refine the existing summary (only if needed) with new context and generate a final summary.
+
+
+Existing Summary:
+"{existing_answer}"
+
+
+
+New Context:
+"{text}"
+
+
+
+Final Summary:
+
 """
 
 templ_refine_zh = """\
 你的任务是生成一个最终摘要。
-我们已经提供了部分摘要:{existing_answer}
-如果有需要的话,可以通过以下更多上下文来完善现有摘要。
-------------
-{text}
-------------
-根据新上下文,完善原始摘要。
-如果上下文无用,则返回原始摘要。\
+我们已经处理好部分文本并生成初始摘要, 并提供了新的未处理文本
+你需要根据新提供的文本,结合初始摘要,生成一个最终摘要。
+
+
+初始摘要:
+"{existing_answer}"
+
+
+
+新的文本:
+"{text}"
+
+
+
+最终摘要:
+
 """

@@ -77,6 +94,25 @@ async def llm_generate(input: DocSumLLMParams):
     if logflag:
         logger.info(input)
 
+    ### check summary type
+    summary_types = ["auto", "stuff", "truncate", "map_reduce", "refine"]
+    if input.summary_type not in summary_types:
+        raise NotImplementedError(f"Please specify the summary_type in {summary_types}")
+    if input.summary_type == "auto":  ### Check input token length in auto mode
+        token_len = len(tokenizer.encode(input.query))
+        if token_len > MAX_INPUT_TOKENS + 50:
+            input.summary_type = "refine"
+            if logflag:
+                logger.info(
+                    f"Input token length {token_len} exceed MAX_INPUT_TOKENS + 50 {MAX_INPUT_TOKENS+50}, auto switch to 'refine' mode."
+                )
+        else:
+            input.summary_type = "stuff"
+            if logflag:
+                logger.info(
+                    f"Input token length {token_len} not exceed MAX_INPUT_TOKENS + 50 {MAX_INPUT_TOKENS+50}, auto switch to 'stuff' mode."
+                )
+
     if input.language in ["en", "auto"]:
         templ = templ_en
         templ_refine = templ_refine_en

@@ -99,7 +135,7 @@ async def llm_generate(input: DocSumLLMParams):
     ## Split text
     if input.summary_type == "stuff":
         text_splitter = CharacterTextSplitter()
-    elif input.summary_type in ["truncate", "map_reduce", "refine"]:
+    else:
         if input.summary_type == "refine":
             if MAX_TOTAL_TOKENS <= 2 * input.max_tokens + 128:
                 raise RuntimeError("In Refine mode, Please set MAX_TOTAL_TOKENS larger than (max_tokens * 2 + 128)")

@@ -120,8 +156,7 @@ async def llm_generate(input: DocSumLLMParams):
         if logflag:
             logger.info(f"set chunk size to: {chunk_size}")
             logger.info(f"set chunk overlap to: {chunk_overlap}")
-    else:
-        raise NotImplementedError('Please specify the summary_type in "stuff", "truncate", "map_reduce", "refine"')
+
     texts = text_splitter.split_text(input.query)
    docs = [Document(page_content=t) for t in texts]
     if logflag:
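The refine-mode guard kept in the third hunk (MAX_TOTAL_TOKENS must exceed 2 * max_tokens + 128) caps how much of the context window a request may spend on generation. A small arithmetic sketch of that rule, assuming a deployment with MAX_TOTAL_TOKENS set to 4096:

```python
MAX_TOTAL_TOKENS = 4096  # assumed deployment value


def refine_mode_allowed(max_tokens: int) -> bool:
    # mirrors the guard: RuntimeError is raised when MAX_TOTAL_TOKENS <= 2 * max_tokens + 128
    return MAX_TOTAL_TOKENS > 2 * max_tokens + 128


largest_ok = (MAX_TOTAL_TOKENS - 129) // 2  # largest request max_tokens that still passes
print(largest_ok, refine_mode_allowed(largest_ok), refine_mode_allowed(largest_ok + 1))
# 1983 True False
```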

comps/llms/summarization/vllm/langchain/requirements.txt

Lines changed: 1 addition & 0 deletions

@@ -1,5 +1,6 @@
 docarray[full]
 fastapi
+httpx==0.27.2
 huggingface_hub
 langchain #==0.1.12
 langchain-huggingface

tests/llms/test_llms_summarization_tgi_langchain.sh

Lines changed: 5 additions & 3 deletions

@@ -2,7 +2,7 @@
 # Copyright (C) 2024 Intel Corporation
 # SPDX-License-Identifier: Apache-2.0
 
-set -x
+set -xe
 
 WORKPATH=$(dirname "$PWD")
 ip_address=$(hostname -I | awk '{print $1}')

@@ -30,7 +30,7 @@ function start_service() {
     export TGI_LLM_ENDPOINT="http://${ip_address}:${tgi_endpoint_port}"
 
     sum_port=5076
-    docker run -d --name="test-comps-llm-sum-tgi-server" -p ${sum_port}:9000 --ipc=host -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e TGI_LLM_ENDPOINT=$TGI_LLM_ENDPOINT -e LLM_MODEL_ID=$LLM_MODEL_ID -e MAX_INPUT_TOKENS=$MAX_INPUT_TOKENS -e MAX_TOTAL_TOKENS=$MAX_TOTAL_TOKENS -e HUGGINGFACEHUB_API_TOKEN=$HF_TOKEN opea/llm-sum-tgi:comps
+    docker run -d --name="test-comps-llm-sum-tgi-server" -p ${sum_port}:9000 --ipc=host -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e TGI_LLM_ENDPOINT=$TGI_LLM_ENDPOINT -e LLM_MODEL_ID=$LLM_MODEL_ID -e MAX_INPUT_TOKENS=$MAX_INPUT_TOKENS -e MAX_TOTAL_TOKENS=$MAX_TOTAL_TOKENS -e HUGGINGFACEHUB_API_TOKEN=$HF_TOKEN -e LOGFLAG=True opea/llm-sum-tgi:comps
 
     # check whether tgi is fully ready
     n=0

@@ -61,10 +61,12 @@ function validate_services() {
 
     local CONTENT=$(curl -s -X POST -d "$INPUT_DATA" -H 'Content-Type: application/json' "$URL" | tee ${LOG_PATH}/${SERVICE_NAME}.log)
 
+    echo $CONTENT
+
     if echo "$CONTENT" | grep -q "$EXPECTED_RESULT"; then
         echo "[ $SERVICE_NAME ] Content is as expected."
     else
-        echo "[ $SERVICE_NAME ] Content does not match the expected result: $CONTENT"
+        echo "[ $SERVICE_NAME ] Content does not match the expected result"
         docker logs ${DOCKER_NAME} >> ${LOG_PATH}/${SERVICE_NAME}.log
         exit 1
     fi
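The echoed $CONTENT plus grep check above is the whole pass/fail criterion of the test. A hedged standalone version of the same pattern, handy for poking a running service manually (the query, port and expected keyword are placeholders; the real test supplies its own INPUT_DATA and EXPECTED_RESULT):

```bash
ip_address=$(hostname -I | awk '{print $1}')
sum_port=5076
INPUT_DATA='{"query":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings models.", "max_tokens":32, "language":"en"}'
CONTENT=$(curl -s -X POST -d "$INPUT_DATA" -H 'Content-Type: application/json' "http://${ip_address}:${sum_port}/v1/chat/docsum")
echo "$CONTENT"
echo "$CONTENT" | grep -q "TEI" && echo "Content is as expected." || { echo "Content does not match the expected result"; exit 1; }
```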
