Unexpected spaces in snippet around every character #35
Description
A web page containing `QUnit.test('add',` shows up in search result snippets as `QUnit . test ( ' add ' , assert`. Note the unexpected spaces around virtually every symbol. I believe this is most likely a side effect of the characters in question being wrapped in `<span>` elements in the source code. However, the source code has no spaces around most of these characters.
Steps to reproduce
<code><span class="nx">QUnit</span><span class="p">.</span><span class="nx">test</span><span class="p">(</span><span class="dl">'</span><span class="s1">add</span><span class="dl">'</span><span class="p">,</span> <span class="nx">assert</span> <span class="o">=></span> <span class="p">{</span></code>
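To make the suspected mechanism concrete, here is a minimal sketch using only Python's standard library. It collects the text nodes of the snippet above (truncated after `assert`) and joins them two ways. It assumes, without confirmation from this report, that the indexer joins adjacent text nodes with a space and then collapses whitespace. The helper names are hypothetical, and this is not the typesense/docsearch-scraper code.

```python
from html.parser import HTMLParser

# The reproduction snippet, truncated after "assert".
SNIPPET = (
    '<code><span class="nx">QUnit</span><span class="p">.</span>'
    '<span class="nx">test</span><span class="p">(</span>'
    '<span class="dl">\'</span><span class="s1">add</span>'
    '<span class="dl">\'</span><span class="p">,</span> '
    '<span class="nx">assert</span></code>'
)


class TextNodeCollector(HTMLParser):
    """Records every text node in document order (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.nodes = []

    def handle_data(self, data):
        self.nodes.append(data)


def collect_text_nodes(html):
    collector = TextNodeCollector()
    collector.feed(html)
    collector.close()
    return collector.nodes


def normalize_whitespace(text):
    # Collapse runs of whitespace into single spaces and trim the ends.
    return " ".join(text.split())


nodes = collect_text_nodes(SNIPPET)

# Assumed current behavior: a space between every text node.
naive = normalize_whitespace(" ".join(nodes))
# Desired behavior: text nodes concatenated as they appear in the source.
concatenated = normalize_whitespace("".join(nodes))

print(naive)         # QUnit . test ( ' add ' , assert
print(concatenated)  # QUnit.test('add', assert
```

Joining with a space reproduces the reported snippet exactly, while plain concatenation keeps the one real space (between `,` and `assert`) and nothing else.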
I'm evaluating Typesense for use on https://api.jquery.com, https://qunitjs.com and other OpenJS sites. I've used typesense/docsearch-scraper via GitHub Actions, and docsearch is configured with "text": "p,li,tr,pre" among the selectors. The above code is part of a regular paragraph or PRE tag.
source: typense.yaml
source: /docsearch.config.json
Expected Behavior
Inline elements such as `<span>`, `<em>`, `<code>`, and `<strong>` should not cause additional spaces to be injected into the indexed text. It is not uncommon for prose to emphasize, underline, strike through, superscript, or otherwise wrap only part of a word in markup. This is probably most common in content with syntax-highlighted source code.
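One way to get the expected behavior is to insert a separator only at block-level element boundaries and to concatenate everything inside inline elements directly. The Python sketch below illustrates that idea with the standard library's `HTMLParser`. The `BLOCK_TAGS` set and the helper names are hypothetical assumptions for illustration; this is not how Typesense or typesense/docsearch-scraper currently work.

```python
from html.parser import HTMLParser

# Hypothetical, non-exhaustive set of tags treated as word boundaries.
# Any tag not listed here (span, em, code, strong, ...) is treated as inline.
BLOCK_TAGS = frozenset({
    "address", "blockquote", "br", "dd", "div", "dl", "dt",
    "h1", "h2", "h3", "h4", "h5", "h6", "hr", "li", "ol",
    "p", "pre", "table", "td", "th", "tr", "ul",
})


class InlineAwareExtractor(HTMLParser):
    """Extracts text, adding a space only at block-level boundaries."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so a set lookup is enough.
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        # Inline markup adds nothing; its text is appended as-is.
        self.parts.append(data)

    def text(self):
        # Collapse whitespace (this also flattens <pre> indentation).
        return " ".join("".join(self.parts).split())


def extract_text(html):
    parser = InlineAwareExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


print(extract_text("<p>un<em>believ</em>able</p><p>Next paragraph</p>"))
# unbelievable Next paragraph
print(extract_text("<ul><li>alpha</li><li>beta</li></ul>"))
# alpha beta
```

Partial-word emphasis stays one word, while separate paragraphs and list items still get a separating space.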
Metadata
Typesense Version: 0.24.1
OS: Debian 11 Bullseye