Describe the bug
Two related bugs in TopPSampler cause it to silently skip filtering and return all documents, with no error:
1. Integer scores are treated as missing scores.
_get_doc_score only accepts float:
if not isinstance(score, float):
    score = None
A document with score=10 (an int, common when scores come from external rankers, APIs, or hand-built test data) is treated as if it had no score. When all scores are integers, the sampler logs "No documents with scores found" and returns the input unchanged — top-p filtering is silently disabled.
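One possible shape for the fix, sketched as a standalone helper rather than the actual Haystack code (`coerce_score` is a hypothetical name used only for illustration): accept both int and float, but rule out bool first, since `isinstance(True, int)` is true in Python.

```python
from typing import Optional


def coerce_score(score: object) -> Optional[float]:
    """Return the score as a float, or None if it is not a usable number.

    Hypothetical helper for illustration; not part of Haystack.
    """
    # bool is a subclass of int, so it must be excluded explicitly.
    if isinstance(score, bool):
        return None
    # Accept both int and float as valid numeric scores.
    if isinstance(score, (int, float)):
        return float(score)
    # Anything else (None, str, ...) counts as a missing score.
    return None
```

With a check like this, `score=10` would be treated as `10.0`, while `score=True` and `score=None` would still be reported as missing.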
2. run(top_p=0.0) is silently ignored.
top_p = top_p or self.top_p
0.0 is falsy, so the per-call override falls back to the constructor value, even though 0.0 passes the component's own 0 <= top_p <= 1 validation and is meaningful ("select only the highest-scoring document", which is exactly what TopPSampler(top_p=0.0) does when set in the constructor). The same call via run() returns all documents instead of one.
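A minimal sketch of how the override could be resolved without relying on truthiness (`resolve_top_p` is a hypothetical name, and the range check simply mirrors the 0 <= top_p <= 1 validation described above): compare against None explicitly so that 0.0 survives.

```python
from typing import Optional


def resolve_top_p(override: Optional[float], default: float) -> float:
    """Pick the per-call value when one was given, else the constructor value.

    Hypothetical helper for illustration; not part of Haystack.
    """
    # `override or default` would discard 0.0 because 0.0 is falsy,
    # so test for None explicitly instead.
    top_p = default if override is None else override
    if not 0 <= top_p <= 1:
        raise ValueError(f"top_p must be between 0 and 1, got {top_p}")
    return top_p
```

Here `resolve_top_p(0.0, 1.0)` yields `0.0`, whereas the current expression `0.0 or 1.0` evaluates to `1.0`.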
Error message
No error. Both bugs result in unfiltered passthrough (bug 1 logs a misleading "missing scores" warning).
Expected behavior
- Integer scores participate in top-p sampling exactly like floats (booleans should still be rejected, since bool is a subclass of int).
- run(top_p=0.0) behaves the same as TopPSampler(top_p=0.0): returns the single highest-scoring document with the "resulted in no documents being selected" warning.
To Reproduce
from haystack import Document
from haystack.components.samplers import TopPSampler
# Bug 1: integer scores -> no filtering
sampler = TopPSampler(top_p=0.5)
docs = [Document(content="a", score=10), Document(content="b", score=1)]
print(len(sampler.run(documents=docs)["documents"])) # 2 (expected: filtered)
# Bug 2: top_p=0.0 override ignored
sampler = TopPSampler(top_p=1.0)
docs = [Document(content="a", score=10.0), Document(content="b", score=1.0)]
print(len(sampler.run(documents=docs, top_p=0.0)["documents"])) # 2 (expected: 1)
FAQ Check
System:
- OS: Windows 11
- Haystack version: main (2.x)