feat: support reference ranges in AnswerBuilder (e.g. [6-10]) #11002

@sjrl

Summary

AnswerBuilder currently only handles individual document references (e.g. [1], [3]). It should also support ranges like [6-10], expanding them to the full set of indices {6, 7, 8, 9, 10}.

Current behaviour

_extract_reference_idxs uses re.findall with the user-supplied reference_pattern and calls int(idx) on each match. A range like [6-10] either doesn't match or fails to parse as an integer.
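Both failure modes can be reproduced with plain `re`, outside of AnswerBuilder. This is a minimal sketch: the two patterns and the sample reply are illustrative assumptions, not values taken from the component.

```python
import re

reply = "See [1] and [6-10]."

# Assumed digit-only pattern of the kind used for single references.
single_ref_pattern = r"\[(\d+)\]"
# Assumed broader pattern that also lets "-" through.
broad_pattern = r"\[([\d-]+)\]"

# Case 1: the digit-only pattern skips the range entirely.
single_matches = re.findall(single_ref_pattern, reply)
print(single_matches)  # ['1']

# Case 2: the broader pattern captures "6-10", but int() cannot parse it.
broad_matches = re.findall(broad_pattern, reply)
print(broad_matches)  # ['1', '6-10']

parse_failed = False
try:
    idxs = {int(m) - 1 for m in broad_matches}
except ValueError:
    parse_failed = True
print(parse_failed)  # True
```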

Expected behaviour

A reference of [6-10] should be treated as referencing documents 6, 7, 8, 9, and 10, equivalent to writing [6][7][8][9][10].
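The intended equivalence can be stated as a small check. `expand_range` is a hypothetical helper used only for illustration; it treats both ends of the range as inclusive and 1-based, as described above.

```python
def expand_range(start: int, end: int) -> set[int]:
    """Hypothetical helper: expand an inclusive, 1-based range into its members."""
    return set(range(start, end + 1))


# [6-10] should reference the same documents as [6][7][8][9][10].
expanded = expand_range(6, 10)
individual = {6, 7, 8, 9, 10}
print(expanded == individual)  # True

# Converted to 0-based positions in the documents list.
zero_based = {i - 1 for i in expanded}
print(sorted(zero_based))  # [5, 6, 7, 8, 9]
```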

Suggested implementation

Add an expand_reference_ranges: bool = False parameter to both __init__ and run(). It defaults to False for backwards compatibility: if range expansion were always on, users with custom reference_pattern values that legitimately capture strings containing - (e.g. "fig-3", "section-2") would silently get incorrect results.

_extract_reference_idxs gains the flag and handles both comma-separated parts and - ranges within each match:

import re

@staticmethod
def _extract_reference_idxs(reply: str, reference_pattern: str, expand_ranges: bool = False) -> set[int]:
    matches = re.findall(reference_pattern, reply)
    idxs: set[int] = set()
    for match in matches:
        if expand_ranges:
            # Split on commas to handle cases like `[1-3,7-9]`.
            for part in match.split(","):
                part = part.strip()
                if "-" in part:
                    # References are 1-based and inclusive; document indices are 0-based.
                    start, end = part.split("-", 1)
                    idxs.update(range(int(start) - 1, int(end)))
                else:
                    idxs.add(int(part) - 1)
        else:
            # Existing behaviour: each match is a single 1-based reference.
            idxs.add(int(match) - 1)
    return idxs

When expand_reference_ranges=True the reference_pattern should also be updated to capture the broader form, e.g. \\[(\\d+(?:[,-]\\d+)*)\\].
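To see the flag and the broader pattern working together, here is a standalone sketch of the same logic as a plain function rather than a method. The function name `extract_reference_idxs`, the constant `RANGE_PATTERN`, and the sample reply are illustrative assumptions; the pattern is the one suggested above, written as a raw string.

```python
import re

# Pattern from the issue, as a raw string (same regex as "\\[(\\d+(?:[,-]\\d+)*)\\]").
RANGE_PATTERN = r"\[(\d+(?:[,-]\d+)*)\]"


def extract_reference_idxs(reply: str, reference_pattern: str, expand_ranges: bool = False) -> set[int]:
    """Standalone copy of the proposed logic; returns 0-based document indices."""
    idxs: set[int] = set()
    for match in re.findall(reference_pattern, reply):
        if not expand_ranges:
            # Existing behaviour: one integer per match.
            idxs.add(int(match) - 1)
            continue
        for part in match.split(","):
            part = part.strip()
            if "-" in part:
                start, end = part.split("-", 1)
                # Inclusive 1-based range -> 0-based indices.
                idxs.update(range(int(start) - 1, int(end)))
            else:
                idxs.add(int(part) - 1)
    return idxs


reply = "Supported by [2] and [6-10], see also [1-3,12]."
result = extract_reference_idxs(reply, RANGE_PATTERN, expand_ranges=True)
print(sorted(result))  # [0, 1, 2, 5, 6, 7, 8, 9, 11]
```

Some edge cases follow from this sketch as written. The pattern does not allow whitespace inside the brackets, so `[1-3, 12]` is not matched. A reversed range such as `[10-6]` expands to nothing, without an error. A chained form such as `[1-3-5]` matches the pattern but raises `ValueError` when parsed. Whether any of these need explicit handling is left open here.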


👋 Hello there! This issue will be handled internally and isn't open for external contributions. If you'd like to contribute, please take a look at issues labeled contributions welcome or good first issue. We'd really appreciate it!

Metadata

Assignees

No one assigned

Labels

P3: Low priority, leave it in the backlog
