Summary
AnswerBuilder currently only handles individual document references (e.g. [1], [3]). It should also support ranges like [6-10], expanding them to the full set of indices {6, 7, 8, 9, 10}.
Current behaviour
_extract_reference_idxs uses re.findall with the user-supplied reference_pattern and calls int(idx) on each match. A range like [6-10] either doesn't match the pattern at all, or is captured as the string "6-10" and then fails to parse as an integer (int() raises ValueError).
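The two failure modes can be reproduced with the standard re module alone. This is a minimal sketch; both patterns below are assumed examples of what a user might pass as reference_pattern, not values taken from the library.

```python
import re

reply = "See [1] and [6-10]."

# Assumed single-index pattern: the range is simply not matched.
strict = re.findall(r"\[(\d+)\]", reply)
print(strict)  # ['1']

# Assumed looser pattern: the range is captured as one string ...
loose = re.findall(r"\[([\d-]+)\]", reply)
print(loose)  # ['1', '6-10']

# ... which int() cannot parse.
try:
    parsed = [int(m) for m in loose]
except ValueError as exc:
    parsed = None
    print(f"ValueError: {exc}")
```

In the first case the reference is dropped silently; in the second the call fails outright.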
Expected behaviour
A reference of [6-10] should be treated as referencing documents 6, 7, 8, 9, and 10, equivalent to writing [6][7][8][9][10].
Suggested implementation
Add an expand_reference_ranges: bool = False parameter to both __init__ and run(). It defaults to False for backwards compatibility: users with custom reference_pattern values that legitimately capture strings containing - (e.g. "fig-3", "section-2") would otherwise silently get incorrect results.
_extract_reference_idxs gains the flag and handles both comma-separated parts and - ranges within each match:
    @staticmethod
    def _extract_reference_idxs(reply: str, reference_pattern: str, expand_ranges: bool = False) -> set[int]:
        # Requires `import re` at module level.
        matches = re.findall(reference_pattern, reply)
        idxs: set[int] = set()
        for match in matches:
            if expand_ranges:
                # Split on commas to handle cases like `[1-3,7-9]`.
                for part in match.split(","):
                    part = part.strip()
                    if "-" in part:
                        # Inclusive 1-based range -> 0-based indices.
                        start, end = part.split("-", 1)
                        idxs.update(range(int(start) - 1, int(end)))
                    else:
                        idxs.add(int(part) - 1)
            else:
                # Without expansion, each match must be a single 1-based index.
                idxs.add(int(match) - 1)
        return idxs
When expand_reference_ranges=True, the reference_pattern should also be updated to capture the broader form, e.g. \\[(\\d+(?:[,-]\\d+)*)\\].
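As a quick check of how the flag and the broader pattern work together, here is a standalone sketch. The module-level function extract_reference_idxs is a hypothetical copy of the proposed static method, pulled out of the class so it can run on its own; the pattern is the one suggested above, written as a raw string.

```python
import re


def extract_reference_idxs(reply: str, reference_pattern: str, expand_ranges: bool = False) -> set[int]:
    """Hypothetical standalone copy of the proposed _extract_reference_idxs."""
    idxs: set[int] = set()
    for match in re.findall(reference_pattern, reply):
        if expand_ranges:
            for part in match.split(","):
                part = part.strip()
                if "-" in part:
                    # Inclusive 1-based range -> 0-based indices.
                    start, end = part.split("-", 1)
                    idxs.update(range(int(start) - 1, int(end)))
                else:
                    idxs.add(int(part) - 1)
        else:
            idxs.add(int(match) - 1)
    return idxs


# Same pattern as in the text, as a raw string instead of doubled backslashes.
pattern = r"\[(\d+(?:[,-]\d+)*)\]"
reply = "Supported by [2] and [6-10], see also [1-3,12]."

idxs = extract_reference_idxs(reply, pattern, expand_ranges=True)
print(sorted(idxs))  # [0, 1, 2, 5, 6, 7, 8, 9, 11]

# A range and the equivalent list of single references give the same result.
assert extract_reference_idxs("[6-10]", pattern, expand_ranges=True) == \
    extract_reference_idxs("[6][7][8][9][10]", pattern)
```

Note that this pattern does not allow whitespace inside the brackets, so a reference such as [1-3, 7-9] would not match even though the function itself strips whitespace around each comma-separated part.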
👋 Hello there! This issue will be handled internally and isn't open for external contributions. If you'd like to contribute, please take a look at issues labeled contributions welcome or good first issue. We'd really appreciate it!