fix(custom): escape pipes in r1_searcher_query_extract tag regex #439

Open
Chen17-sq wants to merge 1 commit into OpenBMB:main from Chen17-sq:fix/r1-query-extract-regex

@Chen17-sq

Bug

r1_searcher_query_extract extracts the wrong text because its tag regex has unescaped pipes:

pattern = re.compile(r"<|begin_of_query|>([^<]*)", re.DOTALL)

In a regex, | is alternation, so this pattern matches <, or begin_of_query, or >([^<]*), not the literal <|begin_of_query|> tag that the function intends and that its docstring documents.

Reproduction

text = "Reasoning... <|begin_of_query|>capital of France<|end_of_query|> done"
r1_searcher_query_extract([text])
# before: {'extract_query_list': ['done?']}              <- trailing text, wrong
# after:  {'extract_query_list': ['capital of France?']}

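findall on the buggy pattern returns ['', '', 'capital of France', '', ' done']. The empty strings come from matches of the < and begin_of_query alternatives, which sit outside the capture group. get_query takes matches[-1], which is ' done', so the extracted "query" is whatever trails the last > in the text.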
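
The list above can be reproduced with a short script. This is a minimal sketch using only the standard re module and the pattern and text quoted in this PR; the variable names are illustrative. It pairs each full match with its captured group to show which alternative fired.

```python
import re

text = "Reasoning... <|begin_of_query|>capital of France<|end_of_query|> done"
buggy = re.compile(r"<|begin_of_query|>([^<]*)", re.DOTALL)

# Pair each full match with its capture group. The group is None whenever
# the alternative that matched ("<" or "begin_of_query") lies outside the
# parentheses; findall reports those cases as empty strings.
steps = [(m.group(0), m.group(1)) for m in buggy.finditer(text)]
for full, captured in steps:
    print(repr(full), "->", repr(captured))

matches = buggy.findall(text)
print(repr(matches[-1]))  # the element a "take the last match" rule picks
```

Only the third alternative, >([^<]*), ever captures anything, and it fires on every > in the text, including the one that closes <|end_of_query|>.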

Fix

Escape the pipes so the literal tag is matched:

pattern = re.compile(r"<\|begin_of_query\|>([^<]*)", re.DOTALL)

The sibling <search> extractor is unaffected (no pipes).
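
For comparison, the same literal match can be built with re.escape instead of hand-written backslashes. This is not what the PR does; it is a sketch of an alternative, and BEGIN_TAG is a hypothetical name introduced here for illustration.

```python
import re

BEGIN_TAG = "<|begin_of_query|>"  # literal tag as documented in this PR

# re.escape backslash-escapes regex metacharacters in the literal tag,
# so the pipes can no longer be read as alternation.
pattern = re.compile(re.escape(BEGIN_TAG) + r"([^<]*)", re.DOTALL)

text = "Reasoning... <|begin_of_query|>capital of France<|end_of_query|> done"
found = pattern.findall(text)
print(found)
```

Hand-escaping keeps the pattern readable as a single raw string, while re.escape avoids this class of mistake if more tags containing metacharacters are added later.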

Tests

Adds tests/servers/custom/test_query_extract.py covering correct extraction, last-query selection, the ? suffix, and the no-tag fallback. pytest is already declared as a dev dependency; these are the first tests in the repo. All 4 pass locally.
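
The test file itself is not reproduced in this description. As a rough illustration of what the extraction and last-query cases might look like, the sketch below tests a hypothetical stand-in, extract_last_query, written from the behavior described above (last match, stripped, ? appended). It is not the repository's r1_searcher_query_extract or get_query, and the no-tag fallback is left out because its expected value is not stated here.

```python
import re

PATTERN = re.compile(r"<\|begin_of_query\|>([^<]*)", re.DOTALL)


def extract_last_query(text: str) -> str:
    """Hypothetical stand-in: last tagged query, stripped, with '?' appended."""
    matches = PATTERN.findall(text)
    # Assumes at least one tag is present; the real fallback is not shown here.
    return matches[-1].strip() + "?"


def test_extracts_tagged_query():
    text = "Reasoning... <|begin_of_query|>capital of France<|end_of_query|> done"
    assert extract_last_query(text) == "capital of France?"


def test_selects_last_query():
    text = (
        "<|begin_of_query|>first<|end_of_query|> more "
        "<|begin_of_query|>second<|end_of_query|>"
    )
    assert extract_last_query(text) == "second?"
```

Saved as a file whose name starts with test_, these functions would be collected by pytest without further configuration.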

The tag regex used unescaped pipes: re.compile(r"<|begin_of_query|>([^<]*)").
In a regex `|` is alternation, so the pattern matched `<` OR `begin_of_query`
OR `>([^<]*)` instead of the literal tag `<|begin_of_query|>`. The extractor
therefore returned trailing text after the query rather than the query itself.

For "... <|begin_of_query|>capital of France<|end_of_query|> done" get_query()
returned "done?" instead of "capital of France?".

Escape the pipes so the literal tag is matched, and add the first test suite for
the custom server covering this extractor.