Skip to content

Commit 8210f32

Browse files
fix(search): parse RESP3 FT.SEARCH responses with bytes-typed keys (#4109)
* fix(search): parse RESP3 FT.SEARCH responses with bytes-typed keys Since the wire protocol default switched to RESP3 (#4052), the server returns FT.SEARCH responses as RESP3 maps. When a client is opened with ``decode_responses=False`` the map keys arrive as ``bytes`` rather than ``str``, but ``Result.from_resp3`` looked them up as plain strings: instance.total = res.get("total_results", 0) for result_item in res.get("results", []): ... Because ``b"total_results" != "total_results"``, every lookup missed and the search appeared to return ``Result{0 total, docs: []}`` even though the server had matched documents. Normalise the top-level map and each per-result map to string keys before reading them, mirroring the pattern already used by ``_parse_hybrid_search_resp3`` in ``redis/commands/search/commands.py`` ("Top-level keys are normalised to strings"). Adds ``tests/test_search_result.py`` with regression tests covering str-keyed, bytes-keyed, and mixed maps, plus the empty/None edge cases. The tests fail on the unfixed code for the bytes and mixed cases. Fixes #4107 * fix(search): extend bytes-key normalisation to AGGREGATE and SPELLCHECK The RESP3 callbacks for FT.SEARCH (`Result.from_resp3`) were taught to normalise top-level structural map keys to strings so that responses parsed correctly on connections opened with `decode_responses=False`. `_parse_aggregate_resp3` (FT.AGGREGATE / FT.CURSOR READ / FT.PROFILE AGGREGATE) and `_parse_spellcheck_resp3` (FT.SPELLCHECK) still read `"total_results"`, `"results"` and `"warning"` as plain strings, so a byte-keyed RESP3 response missed every lookup and silently parsed as an empty AggregateResult / `{}` even when the server had returned data. Apply the same `str_if_bytes` normalisation that `Result.from_resp3` and `_parse_hybrid_search_resp3` already use: - normalise the top-level map and (for aggregate) the per-result-item map; document data inside `extra_attributes` is left as-is so the caller still sees bytes when `decode_responses=False`, mirroring the RESP2 shape; - normalise the outer `results` key for spellcheck; the inner term keys match the RESP2 `decode_responses=False` shape and stay as bytes. Adds regression tests for both parsers in `tests/test_search_result.py`, plus integration tests in `tests/test_search.py` that exercise the three affected Search callbacks (FT.SEARCH, FT.AGGREGATE, FT.SPELLCHECK) against a real RESP3 wire with a `decode_responses=False` client. * fix(search): apply petyaslavova review feedback - _parse_spellcheck_resp3 now preserves the suggestion value as-is so it keeps the decode_responses shape RESP2 would produce (str when decoded, bytes otherwise) instead of wrapping bytes in str(). - waitForIndex now accepts both str and bytes structural keys in FT.INFO responses. execute_command bypasses the search module's callbacks, so the helper has to handle the raw RESP3 dict/RESP2 list shapes for decode_responses=False clients. This unblocks the previously failing fixed-clients CI matrix entry. - The bytes-keys integration tests are now parametrised over protocol=2 (anchors the legacy output shape) and the default protocol (the path that actually exercises the changed parsers in _RedisCallbacksRESP3toRESP2Legacy). Explicit protocol=3 was routing through _RedisCallbacksRESP3 and bypassing the fix. - Spellcheck assertion is stricter: it pins the term key to b"impornant" and the suggestion value to b"important". - Mirror the suggestion-bytes assertion in the test_search_result.py unit test. * test(search): tidy review nits in RESP3 bytes-key tests - Rephrase the parametrisation comment so it explains *why* the two protocol arms exist in terms of the parsers being exercised (_parse_search_resp3 / _parse_aggregate_resp3 / _parse_spellcheck_resp3) rather than "the changed methods", which was only meaningful relative to this PR's diff. - Reorder decorators on TestSearchResp3BytesKeys methods so the test-scope marks (redismod, fixed_client) stay grouped and @pytest.mark.parametrize sits last, matching the prevailing style for parametrised tests. --------- Co-authored-by: petyaslavova <petya.slavova@redis.com>
1 parent 2febde0 commit 8210f32

4 files changed

Lines changed: 405 additions & 5 deletions

File tree

redis/commands/search/commands.py

Lines changed: 16 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -494,11 +494,19 @@ def _parse_aggregate_resp3(self, res, **kwargs):
494494
else:
495495
data = res
496496

497+
if data is None:
498+
data = {}
499+
# On RESP3 connections with decode_responses=False the server's map
500+
# keys arrive as bytes, so normalise structural keys to strings
501+
# before lookup. Mirrors ``Result.from_resp3``.
502+
data = {str_if_bytes(k): v for k, v in data.items()}
503+
497504
warnings = [str_if_bytes(w) for w in data.get("warning", [])]
498505
total = data.get("total_results", 0)
499506

500507
rows = []
501508
for result_item in data.get("results", []):
509+
result_item = {str_if_bytes(k): v for k, v in result_item.items()}
502510
extra_attrs = result_item.get("extra_attributes", {})
503511
# Convert dict to flat list [key, value, key, value, ...]
504512
# to match RESP2 row format consumers expect.
@@ -640,6 +648,10 @@ def _parse_spellcheck_resp3(self, res, **kwargs):
640648
"""
641649
if not isinstance(res, dict):
642650
return self._parse_spellcheck(res, **kwargs)
651+
# On RESP3 connections with decode_responses=False the server's map
652+
# keys arrive as bytes, so normalise the structural ``results`` key
653+
# to a string before lookup. Mirrors ``Result.from_resp3``.
654+
res = {str_if_bytes(k): v for k, v in res.items()}
643655
corrections = {}
644656
results = res.get("results", {})
645657
for term, suggestions in results.items():
@@ -654,8 +666,11 @@ def _parse_spellcheck_resp3(self, res, **kwargs):
654666
score_str = str(score)
655667
if score_str.endswith(".0"):
656668
score_str = score_str[:-2]
669+
# Preserve ``suggestion`` as-is so it keeps the
670+
# ``decode_responses`` shape RESP2 would produce
671+
# (``str`` when decoded, ``bytes`` otherwise).
657672
term_corrections.append(
658-
{"score": score_str, "suggestion": str(suggestion)}
673+
{"score": score_str, "suggestion": suggestion}
659674
)
660675
if term_corrections:
661676
corrections[term] = term_corrections

redis/commands/search/result.py

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -111,12 +111,17 @@ def from_resp3(
111111
instance = cls.__new__(cls)
112112
if res is None:
113113
res = {}
114+
# On RESP3 connections with decode_responses=False the server's map
115+
# keys arrive as bytes, so normalise them to strings before lookup
116+
# to keep behaviour consistent with decode_responses=True.
117+
res = {str_if_bytes(k): v for k, v in res.items()}
114118
instance.total = res.get("total_results", 0)
115119
instance.duration = duration
116120
instance.docs = []
117121
instance.warnings = [str_if_bytes(w) for w in res.get("warning", [])]
118122

119123
for result_item in res.get("results", []):
124+
result_item = {str_if_bytes(k): v for k, v in result_item.items()}
120125
doc_id = str_if_bytes(result_item.get("id", ""))
121126
score = None
122127
if with_scores and "score" in result_item:

tests/test_search.py

Lines changed: 129 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -43,7 +43,7 @@
4343
)
4444
from redis.commands.search.result import Result
4545
from redis.commands.search.suggestion import Suggestion
46-
from redis.utils import safe_str
46+
from redis.utils import SENTINEL, safe_str
4747

4848
from .conftest import (
4949
_get_client,
@@ -84,15 +84,28 @@ def waitForIndex(env, idx, timeout=None):
8484
while True:
8585
try:
8686
res = env.execute_command("FT.INFO", idx)
87-
if int(res[res.index("indexing") + 1]) == 0:
87+
# ``execute_command`` bypasses the search module's
88+
# callbacks, so the response is the raw wire shape.
89+
# With ``decode_responses=False`` the structural keys
90+
# arrive as bytes; accept both forms.
91+
try:
92+
i = res.index("indexing")
93+
except ValueError:
94+
i = res.index(b"indexing")
95+
if int(res[i + 1]) == 0:
8896
break
8997
except ValueError:
9098
break
9199
except AttributeError:
100+
# RESP3 dict response. Keys may be ``str`` or ``bytes``
101+
# depending on ``decode_responses``.
102+
indexing = res.get("indexing")
103+
if indexing is None:
104+
indexing = res.get(b"indexing")
92105
try:
93-
if int(res["indexing"]) == 0:
106+
if int(indexing) == 0:
94107
break
95-
except ValueError:
108+
except (TypeError, ValueError):
96109
break
97110
except ResponseError:
98111
# index doesn't exist yet
@@ -5590,3 +5603,115 @@ def test_hybrid_search_query_with_multiple_loads_and_applies(self, client):
55905603
assert item["description"] is not None
55915604
assert item["discount_10_percents"] is not None
55925605
assert item["additional_discount"] is not None
5606+
5607+
5608+
# Parametrise the bytes-key regression tests over RESP2 and the
5609+
# default protocol. RESP2 uses ``_RedisCallbacksRESP2`` and anchors the
5610+
# expected legacy output shape with ``decode_responses=False``. The
5611+
# default protocol (``SENTINEL`` -> not specified) leaves the wire on
5612+
# RESP3 with the ``_RedisCallbacksRESP3toRESP2Legacy`` adapter selected,
5613+
# which is where the bytes-key normalisation in ``_parse_search_resp3``,
5614+
# ``_parse_aggregate_resp3`` and ``_parse_spellcheck_resp3`` lives. An
5615+
# explicit ``protocol=3`` would route through ``_RedisCallbacksRESP3``
5616+
# instead and bypass the methods we want to test.
5617+
_SEARCH_BYTES_PROTOCOLS = [
5618+
pytest.param(2, id="resp2"),
5619+
pytest.param(SENTINEL, id="default-resp3"),
5620+
]
5621+
5622+
5623+
def _make_bytes_search_client(request, stack_url, protocol):
5624+
kwargs = {
5625+
"decode_responses": False,
5626+
"from_url": stack_url,
5627+
}
5628+
if protocol is not SENTINEL:
5629+
kwargs["protocol"] = protocol
5630+
client = _get_client(redis.Redis, request, **kwargs)
5631+
client.flushdb()
5632+
return client
5633+
5634+
5635+
class TestSearchResp3BytesKeys(SearchTestsBase):
5636+
"""Regression tests for #4107.
5637+
5638+
With the default protocol (RESP3 on the wire) and
5639+
``decode_responses=False`` the server's structural map keys arrive
5640+
as ``bytes`` (e.g. ``b"results"`` rather than ``"results"``). The
5641+
RESP3->legacy-RESP2 search callbacks used to look those keys up as
5642+
plain strings, missed them, and silently produced empty results.
5643+
Each test is parametrised over ``protocol=2`` (anchors the legacy
5644+
output shape) and the default protocol (exercises the actual fixed
5645+
parsers in ``_RedisCallbacksRESP3toRESP2Legacy``).
5646+
"""
5647+
5648+
@pytest.mark.redismod
5649+
@pytest.mark.fixed_client
5650+
@pytest.mark.parametrize("protocol", _SEARCH_BYTES_PROTOCOLS)
5651+
def test_search_resp3_bytes_keys(self, request, stack_url, protocol):
5652+
client = _make_bytes_search_client(request, stack_url, protocol)
5653+
client.ft().create_index((TextField("title"), TextField("body")))
5654+
client.hset("doc1", mapping={"title": "hello", "body": "redis world"})
5655+
client.hset("doc2", mapping={"title": "hello", "body": "search world"})
5656+
self.waitForIndex(client, getattr(client.ft(), "index_name", "idx"))
5657+
5658+
res = client.ft().search(Query("hello"))
5659+
5660+
# Before the fix the default-RESP3 case returned
5661+
# ``Result{0 total, docs: []}`` because ``Result.from_resp3``
5662+
# looked up the bytes-keyed map by ``str`` keys. ``Result``
5663+
# always normalises the doc id with ``str_if_bytes`` so the id
5664+
# assertion stays in ``str`` form even with
5665+
# ``decode_responses=False``.
5666+
assert res.total == 2
5667+
assert {d.id for d in res.docs} == {"doc1", "doc2"}
5668+
5669+
@pytest.mark.redismod
5670+
@pytest.mark.fixed_client
5671+
@pytest.mark.parametrize("protocol", _SEARCH_BYTES_PROTOCOLS)
5672+
def test_aggregate_resp3_bytes_keys(self, request, stack_url, protocol):
5673+
client = _make_bytes_search_client(request, stack_url, protocol)
5674+
client.ft().create_index((TextField("title"), TextField("parent")))
5675+
client.hset("doc1", mapping={"title": "alpha", "parent": "redis"})
5676+
client.hset("doc2", mapping={"title": "beta", "parent": "redis"})
5677+
client.hset("doc3", mapping={"title": "gamma", "parent": "redis"})
5678+
self.waitForIndex(client, getattr(client.ft(), "index_name", "idx"))
5679+
5680+
req = aggregations.AggregateRequest("redis").group_by(
5681+
"@parent", reducers.count()
5682+
)
5683+
res = client.ft().aggregate(req)
5684+
5685+
# Before the fix the default-RESP3 case missed
5686+
# ``data.get("total_results")`` and ``data.get("results")``
5687+
# because the keys were ``bytes``, yielding ``total=0`` and
5688+
# ``rows=[]``.
5689+
assert len(res.rows) == 1
5690+
row = res.rows[0]
5691+
# Row content stays as bytes (matches RESP2 with
5692+
# decode_responses=False); only the structural map keys are
5693+
# normalised.
5694+
assert b"parent" in row
5695+
assert b"redis" in row
5696+
5697+
@pytest.mark.redismod
5698+
@pytest.mark.fixed_client
5699+
@pytest.mark.parametrize("protocol", _SEARCH_BYTES_PROTOCOLS)
5700+
def test_spellcheck_resp3_bytes_keys(self, request, stack_url, protocol):
5701+
client = _make_bytes_search_client(request, stack_url, protocol)
5702+
client.ft().create_index((TextField("f1"),))
5703+
client.hset("doc1", mapping={"f1": "some valid content"})
5704+
client.hset("doc2", mapping={"f1": "very important"})
5705+
self.waitForIndex(client, getattr(client.ft(), "index_name", "idx"))
5706+
5707+
res = client.ft().spellcheck("impornant")
5708+
5709+
# Before the fix the default-RESP3 case had
5710+
# ``res.get("results", {})`` miss the ``b"results"`` key and
5711+
# return an empty ``{}``. Both protocols carry the term and
5712+
# suggestion through as bytes, matching
5713+
# ``_parse_spellcheck`` with ``decode_responses=False``.
5714+
assert b"impornant" in res
5715+
suggestions = res[b"impornant"]
5716+
assert suggestions
5717+
assert suggestions[0]["suggestion"] == b"important"

0 commit comments

Comments
 (0)