fix: TopPSampler silently skips filtering for integer scores and ignores top_p=0.0 passed to run() #11595

@Ayushhgit

Describe the bug
Two related bugs in TopPSampler cause it to silently skip filtering and return all documents, with no error:

1. Integer scores are treated as missing scores.

_get_doc_score only accepts float:

if not isinstance(score, float):
    score = None

A document with score=10 (an int, common when scores come from external rankers, APIs, or hand-built test data) is treated as if it had no score. When all scores are integers, the sampler logs "No documents with scores found" and returns the input unchanged — top-p filtering is silently disabled.
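One possible shape for a corrected check is sketched below. The helper name `coerce_score` is hypothetical and is not part of Haystack; the sketch only illustrates accepting `int` and `float` while still rejecting `bool`, and the real `_get_doc_score` may need to differ in its details.

```python
from typing import Any, Optional


def coerce_score(score: Any) -> Optional[float]:
    """Hypothetical helper: return the score as a float, or None if unusable."""
    # bool is a subclass of int, so it has to be excluded before the numeric check.
    if isinstance(score, bool):
        return None
    # Accept both ints and floats and normalise them to float.
    if isinstance(score, (int, float)):
        return float(score)
    # Anything else (None, str, ...) counts as a missing score.
    return None


print(coerce_score(10))    # 10.0
print(coerce_score(0.25))  # 0.25
print(coerce_score(True))  # None
print(coerce_score(None))  # None
```

Checking `bool` first keeps `True` and `False` from being counted as the scores 1 and 0.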

2. run(top_p=0.0) is silently ignored.

top_p = top_p or self.top_p

0.0 is falsy, so the per-call override falls back to the constructor value. Yet 0.0 passes the component's own 0 <= top_p <= 1 validation and is meaningful: it means "select only the highest-scoring document", which is exactly what TopPSampler(top_p=0.0) does when set in the constructor. With a constructor value of 1.0, the same value passed via run() returns all documents instead of one.
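The difference between the `or` fallback and an explicit `None` check can be seen in isolation. The class `SamplerSketch` and its methods below are hypothetical stand-ins for the override logic only; they are not the real component.

```python
from typing import Optional


class SamplerSketch:
    """Hypothetical stand-in that models only how the per-call top_p is resolved."""

    def __init__(self, top_p: float = 1.0) -> None:
        self.top_p = top_p

    def resolve_buggy(self, top_p: Optional[float] = None) -> float:
        # `or` treats 0.0 as "not provided" because 0.0 is falsy.
        return top_p or self.top_p

    def resolve_fixed(self, top_p: Optional[float] = None) -> float:
        # Only None means "not provided"; 0.0 is a legitimate override.
        resolved = self.top_p if top_p is None else top_p
        if not 0 <= resolved <= 1:
            raise ValueError(f"top_p must be between 0 and 1, got {resolved}")
        return resolved


sketch = SamplerSketch(top_p=1.0)
print(sketch.resolve_buggy(0.0))  # 1.0 (override lost)
print(sketch.resolve_fixed(0.0))  # 0.0 (override kept)
print(sketch.resolve_fixed())     # 1.0 (falls back to the constructor value)
```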

Error message
No error. Both bugs result in unfiltered passthrough (bug 1 logs a misleading "missing scores" warning).

Expected behavior

  • Integer scores participate in top-p sampling exactly like floats (booleans should still be rejected, since bool is a subclass of int).
  • run(top_p=0.0) behaves the same as TopPSampler(top_p=0.0): returns the single highest-scoring document with the "resulted in no documents being selected" warning.

To Reproduce

from haystack import Document
from haystack.components.samplers import TopPSampler

# Bug 1: integer scores -> no filtering
sampler = TopPSampler(top_p=0.5)
docs = [Document(content="a", score=10), Document(content="b", score=1)]
print(len(sampler.run(documents=docs)["documents"]))  # 2 (expected: filtered)

# Bug 2: top_p=0.0 override ignored
sampler = TopPSampler(top_p=1.0)
docs = [Document(content="a", score=10.0), Document(content="b", score=1.0)]
print(len(sampler.run(documents=docs, top_p=0.0)["documents"]))  # 2 (expected: 1)
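To make the expected counts concrete, here is a simplified, dependency-free model of top-p selection. It assumes a softmax over the scores, a cumulative-probability cutoff with a small tolerance, and a fallback to the single highest-scoring item when nothing passes the cutoff. The function `top_p_select` and the tolerance value are assumptions for illustration; the actual component may differ in its details.

```python
import math
from typing import List


def top_p_select(scores: List[float], top_p: float, tol: float = 1e-6) -> List[int]:
    """Simplified model: return indices of the selected scores, highest first."""
    if not 0 <= top_p <= 1:
        raise ValueError("top_p must be between 0 and 1")
    if not scores:
        return []

    # Rank indices by score, highest first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    # Numerically stable softmax over the ranked scores.
    max_score = scores[order[0]]
    exps = [math.exp(scores[i] - max_score) for i in order]
    total = sum(exps)

    # Keep items while the cumulative probability stays within top_p.
    selected: List[int] = []
    cumulative = 0.0
    for idx, e in zip(order, exps):
        cumulative += e / total
        if cumulative > top_p + tol:
            break
        selected.append(idx)

    # Fallback: if nothing was selected, keep the single highest-scoring item.
    if not selected:
        selected = [order[0]]
    return selected


print(len(top_p_select([10, 1], top_p=0.5)))      # 1
print(len(top_p_select([10.0, 1.0], top_p=0.0)))  # 1
print(len(top_p_select([10.0, 1.0], top_p=1.0)))  # 2
```

Under this model the top document in the first reproduction already holds almost all of the probability mass, so `top_p=0.5` selects nothing by the cutoff and the fallback returns one document.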

System:

  • OS: Windows 11
  • Haystack version: main (2.x)

Labels: P3 (Low priority, leave it in the backlog)