Commit 2397bd1

chore: improve search
1 parent c3b9f2c commit 2397bd1

11 files changed

Lines changed: 243 additions & 19 deletions


.gitignore

Lines changed: 1 addition & 0 deletions
@@ -21,3 +21,4 @@ op*.txt
 b.sh
 .DS_Store
 .claude
+*.patch

README.md

Lines changed: 4 additions & 4 deletions
@@ -32,7 +32,7 @@ Several projects address MCP tool sprawl in different ways: [RAG-MCP](https://gi
 
 - **No infrastructure.** One Go binary, local SQLite. No Docker, no vector DB service, no cloud account.
 - **IDE auto-import.** Reads your Claude Desktop, Cursor, or VS Code MCP config. No manual YAML unless you want it.
-- **Three modes in one tool.** Search mode (5 meta-tools) for weak models, direct mode (transparent proxy) for strong models, hybrid for both. Switch with a flag.
+- **Three modes in one tool.** Direct mode (transparent proxy) for simple setups and smaller models, search mode (5 meta-tools) for large catalogs with strong models, hybrid for both. Switch with a flag.
 - **Provider-agnostic.** Not tied to Anthropic, OpenAI, or any specific client. Anything that speaks MCP over stdio or HTTP.
 - **Reliability built in.** Circuit breaking, caching, session reuse, and tracing handled at the proxy layer.

@@ -89,11 +89,11 @@ No IDE config? Write a YAML file manually — see [Configuration](#configuration
 ## When to use which mode
 
-- **Search mode** (default) — the agent sees 5 meta-tools and discovers capabilities through search. Reduces prompt size and improves tool selection for smaller/cheaper models (Haiku, GPT-4.1-mini, local Ollama).
+- **Direct mode** — every cataloged tool is exposed by name. The agent sees real schemas, lazy-tool routes transparently. Best for smaller/cheaper models (Haiku, GPT-4.1-mini, local Ollama) that struggle with multi-step reasoning. They get a simple tool list and call tools directly — one step, no search overhead. Also good for strong models that benefit from single-endpoint aggregation, circuit breaking, and caching.
 
-- **Direct mode** — every cataloged tool is exposed by name. The agent sees real schemas, lazy-tool routes transparently. For strong models that handle large tool lists fine but benefit from single-endpoint aggregation, circuit breaking, and caching.
+- **Search mode** (default) — the agent sees 5 meta-tools and discovers capabilities through search. Best for strong models (Claude, GPT-4, Llama 70B+) working with large tool catalogs (50+ tools) where dumping every schema into context wastes tokens and degrades selection accuracy. Requires the model to handle a two-step search→invoke pattern.
 
-- **Hybrid mode** — both search and direct tools available. Useful for gradual migration.
+- **Hybrid mode** — both search and direct tools available. Useful for gradual migration or mixed workloads.
 
 ```bash
 lazy-tool serve # search (default)

benchmark/README.md

Lines changed: 62 additions & 5 deletions
@@ -52,13 +52,54 @@ Publishing honest benchmark claims is part of the project's reputation.
 
 ## Environment
 
-Recommended local environment:
+### Prerequisites
 
-- MCPJungle running locally (for baseline mode)
-- sample local MCPs registered
-- `lazy-tool` built from the repo root
-- valid `benchmark/configs/mcpjungle-lazy-tool.yaml`
+- Go 1.25+ (to build lazy-tool)
+- Node.js / npx (for the `everything` and `filesystem` MCP servers)
+- Python 3.11+ (for the benchmark harnesses)
+- [uv](https://docs.astral.sh/uv/) (recommended, for `mcp-server-time` via `uvx`)
 - At least one of: `GROQ_API_KEY`, `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`
+- For weak-model benchmarks: [Ollama](https://ollama.com) running locally with at least one model pulled
+
+### Setting up MCPJungle
+
+The benchmarks use [MCPJungle](https://github.com/mcpjungle/MCPJungle) as the upstream MCP gateway that hosts the test tools. Baseline mode connects directly to MCPJungle; search and direct modes connect through lazy-tool which indexes MCPJungle's catalog.
+
+**1. Install MCPJungle:**
+
+```bash
+# See https://github.com/mcpjungle/MCPJungle for full install instructions
+go install github.com/mcpjungle/mcpjungle@latest
+```
+
+**2. Start MCPJungle:**
+
+```bash
+mcpjungle serve
+# Default: http://127.0.0.1:8080/mcp (strong model suite)
+# Or configure a different port and pass --jungle-url to the benchmark scripts
+```
+
+**3. Register the sample MCP servers:**
+
+```bash
+./benchmark/mcpjungle-dev/register-samples.sh
+```
+
+This registers three MCP servers into MCPJungle:
+
+| Server | Transport | What it provides | Requires |
+|--------|-----------|------------------|----------|
+| `everything` | stdio | echo tool, prompts, resources (MCP reference server) | npx |
+| `filesystem` | stdio | read/write/list tools scoped to `/tmp/lazy-tool-mcpjungle-fs` | npx |
+| `time` | stdio | time conversion tools | uvx |
+
+**4. Verify tools are registered:**
+
+```bash
+mcpjungle list tools
+# Should show tools from everything, filesystem, and time servers
+```
 
 ### Python dependencies
 

@@ -72,6 +113,14 @@ uv pip install --python benchmark/.venv/bin/python -r benchmark/requirements.txt
 pip install -r benchmark/requirements.txt
 ```
 
+### Weak-model setup (Ollama)
+
+```bash
+# Install Ollama: https://ollama.com
+ollama serve            # start the server
+ollama pull qwen2.5:3b  # pull at least one model
+```
+
 ## Quick reproducible flow
 
 ### 1. Build and reindex

@@ -326,10 +375,18 @@ Keep raw artifacts around when updating public benchmark claims.
 ### `search_tools_smoke` returns zero hits
 
 Usually:
+- MCPJungle is not running or sample MCPs are not registered (see [Setting up MCPJungle](#setting-up-mcpjungle))
 - you forgot `reindex`
 - your source config is wrong
 - the indexed catalog is stale or empty
 
+Verify with:
+```bash
+export LAZY_TOOL_CONFIG=$PWD/benchmark/configs/mcpjungle-lazy-tool.yaml
+./bin/lazy-tool reindex
+./bin/lazy-tool search "echo" --limit 5
+```
+
 ### routed task chooses the wrong wrapper
 
 This is usually:

benchmark/run_weak_model_suite.sh

Lines changed: 36 additions & 0 deletions
@@ -28,6 +28,7 @@ LAZY_CONFIG=""
 JUNGLE_URL="http://127.0.0.1:8080/mcp"
 OLLAMA_URL="http://localhost:11434"
 SKIP_BUILD="false"
+SKIP_PREFLIGHT="false"
 MODELS=""
 TIER=""
 

@@ -40,6 +41,7 @@ while [[ $# -gt 0 ]]; do
     --jungle-url) JUNGLE_URL="${2:?missing value}"; shift 2 ;;
     --ollama-url) OLLAMA_URL="${2:?missing value}"; shift 2 ;;
     --skip-build) SKIP_BUILD="true"; shift ;;
+    --skip-preflight) SKIP_PREFLIGHT="true"; shift ;;
     --models) MODELS="${2:?missing value}"; shift 2 ;;
     --tier) TIER="${2:?missing value}"; shift 2 ;;
     *)

@@ -125,6 +127,40 @@ LAZY_TOOL_CONFIG="$LAZY_CONFIG" "$LAZY_BINARY" reindex 2>&1 || {
   exit 1
 }
 
+# ── Preflight catalog check ──────────────────────────────────────────────
+# Verify the catalog has the expected tools before running benchmarks.
+# Without this, a broken MCPJungle setup silently produces meaningless results.
+
+if [[ "$SKIP_PREFLIGHT" == "true" ]]; then
+  echo "Preflight: skipped (--skip-preflight)"
+else
+
+  echo "Preflight: verifying catalog..."
+  PREFLIGHT_FAIL=""
+  for query in "echo" "time"; do
+    HITS=$(LAZY_TOOL_CONFIG="$LAZY_CONFIG" "$LAZY_BINARY" search "$query" --limit 3 2>/dev/null \
+      | "$PYTHON" -c "import sys,json; d=json.load(sys.stdin); print(len(d.get('results',[])))" 2>/dev/null || echo "0")
+    if [[ "$HITS" == "0" ]]; then
+      PREFLIGHT_FAIL="${PREFLIGHT_FAIL} - search '$query' returned 0 results\n"
+    else
+      echo "  search '$query': $HITS hit(s) — ok"
+    fi
+  done
+
+  if [[ -n "$PREFLIGHT_FAIL" ]]; then
+    echo "" >&2
+    echo "ERROR: Preflight catalog check failed:" >&2
+    echo -e "$PREFLIGHT_FAIL" >&2
+    echo "The catalog does not contain expected tools." >&2
+    echo "Check that MCPJungle is running and sample MCPs are registered:" >&2
+    echo "  benchmark/mcpjungle-dev/register-samples.sh" >&2
+    echo "Then re-run: LAZY_TOOL_CONFIG=$LAZY_CONFIG $LAZY_BINARY reindex" >&2
+    exit 1
+  fi
+  echo "Preflight passed."
+
+fi # end skip-preflight guard
+
 # ── Prepare filesystem fixture ───────────────────────────────────────────
 
 FS_ROOT="/tmp/lazy-tool-mcpjungle-fs"

internal/search/candidate_path_test.go

Lines changed: 42 additions & 2 deletions
@@ -21,9 +21,9 @@ func TestSearch_candidatePath_substringMatrix(t *testing.T) {
 	}
 	rows := []row{
 		{
-			name:  "fts_hit_skips_full_substring_scan",
+			name:  "fts_sparse_augments_with_substring",
 			query: "create github issue",
-			want:  models.SearchCandidatePathSubstringSkippedFTSHit,
+			want:  models.SearchCandidatePathSubstringAugmentedFTSSparse,
 			fixture: models.CapabilityRecord{
 				ID: "1", Kind: models.CapabilityKindTool, SourceID: "github-gateway", SourceType: "gateway",
 				CanonicalName: "github_gateway__create_issue", OriginalName: "create_issue",

@@ -57,6 +57,46 @@
 		},
 	}
 
+	// When limit=1 and FTS returns 1 hit, substring scan is skipped (FTS has enough).
+	t.Run("fts_sufficient_skips_substring", func(t *testing.T) {
+		var mode string
+		prev := metrics.SearchCandidateGeneration
+		metrics.SearchCandidateGeneration = func(m string) { mode = m }
+		defer func() { metrics.SearchCandidateGeneration = prev }()
+
+		p := filepath.Join(t.TempDir(), "substr.db")
+		st, err := storage.OpenSQLite(p)
+		if err != nil {
+			t.Fatal(err)
+		}
+		defer func() { _ = st.Close() }()
+		ctx := context.Background()
+		rec := models.CapabilityRecord{
+			ID: "s1", Kind: models.CapabilityKindTool, SourceID: "github-gateway", SourceType: "gateway",
+			CanonicalName: "github_gateway__create_issue", OriginalName: "create_issue",
+			OriginalDescription: "Create an issue in a repo",
+			GeneratedSummary:    "Creates GitHub issues with title and body.",
+			SearchText:          "github-gateway create_issue repo title body issue",
+			VersionHash: "h1", LastSeenAt: time.Now(),
+			InputSchemaJSON: "{}", MetadataJSON: "{}",
+		}
+		if err := st.UpsertCapability(ctx, rec); err != nil {
+			t.Fatal(err)
+		}
+		svc := NewService(st, nil, embeddings.Noop{}, ScoreWeights{}, false)
+		// limit=1: FTS returns 1 hit which is >= limit, so substring is skipped
+		ranked, err := svc.Search(ctx, models.SearchQuery{Text: "create github issue", Limit: 1})
+		if err != nil {
+			t.Fatal(err)
+		}
+		if mode != models.SearchCandidatePathSubstringSkippedFTSHit {
+			t.Fatalf("metrics path: got %q want %q", mode, models.SearchCandidatePathSubstringSkippedFTSHit)
+		}
+		if ranked.CandidatePath != models.SearchCandidatePathSubstringSkippedFTSHit {
+			t.Fatalf("CandidatePath: got %q want %q", ranked.CandidatePath, models.SearchCandidatePathSubstringSkippedFTSHit)
+		}
+	})
+
 	for _, tc := range rows {
 		t.Run(tc.name, func(t *testing.T) {
 			var mode string

internal/search/e2e_test.go

Lines changed: 59 additions & 0 deletions
@@ -3,6 +3,7 @@ package search
 import (
 	"context"
 	"path/filepath"
+	"strings"
 	"testing"
 	"time"
 

@@ -208,6 +209,64 @@ func TestService_Search_userSummaryBoost(t *testing.T) {
 	}
 }
 
+func TestService_Search_userSummaryContentMatchesLexical(t *testing.T) {
+	p := filepath.Join(t.TempDir(), "s.db")
+	st, err := storage.OpenSQLite(p)
+	if err != nil {
+		t.Fatal(err)
+	}
+	defer func() { _ = st.Close() }()
+	ctx := context.Background()
+
+	// Two tools with identical generated summaries. Only "b" has a user summary
+	// containing the search term "email". The lexical scorer should rank "b"
+	// higher because its effective summary matches the query.
+	a := models.CapabilityRecord{
+		ID: "a", Kind: models.CapabilityKindTool, SourceID: "s", SourceType: "gateway",
+		CanonicalName: "s__a", OriginalName: "a_tool",
+		GeneratedSummary: "generic helper utility",
+		SearchText: "s a_tool generic helper utility email", VersionHash: "1", LastSeenAt: time.Now(),
+		InputSchemaJSON: "{}", MetadataJSON: "{}",
+	}
+	b := models.CapabilityRecord{
+		ID: "b", Kind: models.CapabilityKindTool, SourceID: "s", SourceType: "gateway",
+		CanonicalName: "s__b", OriginalName: "b_tool",
+		GeneratedSummary: "generic helper utility",
+		UserSummary: "sends email notifications to users",
+		SearchText: "s b_tool generic helper utility sends email notifications", VersionHash: "2", LastSeenAt: time.Now(),
+		InputSchemaJSON: "{}", MetadataJSON: "{}",
+	}
+	if err := st.UpsertCapability(ctx, a); err != nil {
+		t.Fatal(err)
+	}
+	if err := st.UpsertCapability(ctx, b); err != nil {
+		t.Fatal(err)
+	}
+	svc := NewService(st, nil, embeddings.Noop{}, DefaultScoreWeights(), false)
+	out, err := svc.Search(ctx, models.SearchQuery{Text: "email", Limit: 5})
+	if err != nil {
+		t.Fatal(err)
+	}
+	if len(out.Results) < 1 {
+		t.Fatal("expected at least 1 result")
+	}
+	// "b" should rank first because its effective summary (user summary) contains "email"
+	if out.Results[0].CapabilityID != b.ID {
+		t.Fatalf("user summary content should boost relevance; want b first, got %+v", out.Results)
+	}
+	// Verify the summary match signal is present
+	found := false
+	for _, w := range out.Results[0].WhyMatched {
+		if strings.Contains(w, "summary:") {
+			found = true
+			break
+		}
+	}
+	if !found {
+		t.Fatalf("expected summary: signal in why_matched: %v", out.Results[0].WhyMatched)
+	}
+}
+
 func TestService_Search_noopEmbeddingsNoPanic(t *testing.T) {
 	p := filepath.Join(t.TempDir(), "s.db")
 	st, err := storage.OpenSQLite(p)

internal/search/scoring.go

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ func scoreLexical(needle string, tokens []string, rec *models.CapabilityRecord)
 	}
 
 	on := strings.ToLower(rec.OriginalName)
-	sum := strings.ToLower(rec.GeneratedSummary)
+	sum := strings.ToLower(rec.EffectiveSummary())
 	src := strings.ToLower(rec.SourceID)
 
 	if needle != "" {
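
The one-line change above makes the lexical scorer read `rec.EffectiveSummary()` instead of the raw generated summary. The method's semantics are implied by the new e2e test (a user summary containing the query term lifts that record to the top rank): prefer the user-provided summary, fall back to the generated one. A minimal sketch under that assumption — the struct here is a two-field stand-in, not the real `models.CapabilityRecord`:

```go
package main

import "fmt"

// CapabilityRecord is a two-field stand-in for the real struct in
// internal/models, which carries many more fields.
type CapabilityRecord struct {
	GeneratedSummary string
	UserSummary      string
}

// EffectiveSummary prefers the user-provided summary and falls back to the
// generated one. This behavior is inferred from the commit's e2e test, not
// copied from the real implementation.
func (r CapabilityRecord) EffectiveSummary() string {
	if r.UserSummary != "" {
		return r.UserSummary
	}
	return r.GeneratedSummary
}

func main() {
	a := CapabilityRecord{GeneratedSummary: "generic helper utility"}
	b := CapabilityRecord{GeneratedSummary: "generic helper utility", UserSummary: "sends email notifications to users"}
	fmt.Println(a.EffectiveSummary())
	fmt.Println(b.EffectiveSummary())
}
```

With this fallback in scoring, annotating a tool with a user summary directly changes its lexical relevance, which is what the new `userSummaryContentMatchesLexical` test pins down.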

internal/search/service.go

Lines changed: 11 additions & 5 deletions
@@ -376,9 +376,10 @@ func (s *Service) buildCandidates(ctx context.Context, q models.SearchQuery, nee
 		}
 	}
 
-	// Substring scan over the full catalog: only when FTS did not already return hits.
-	// If BM25 returned candidates, repeating a per-row substring pass is redundant for normal queries.
-	if match != "" && len(ftsIDs) > 0 {
+	// Substring scan over the full catalog: skip only when FTS returned enough candidates
+	// to fill the request. When FTS returns sparse results (fewer than limit), augment with
+	// substring scan so that near-matches are not lost to BM25 ranking gaps.
+	if match != "" && len(ftsIDs) >= q.Limit {
 		metrics.SearchCandidateGeneration(models.SearchCandidatePathSubstringSkippedFTSHit)
 		return out, models.SearchCandidatePathSubstringSkippedFTSHit, nil
 	}

@@ -388,9 +389,14 @@ func (s *Service) buildCandidates(ctx context.Context, q models.SearchQuery, nee
 		return out, models.SearchCandidatePathFullCatalogSubstringDisabled, nil
 	}
 
-	subPath := models.SearchCandidatePathSubstringFullCatalogFTSZeroRows
-	if match == "" {
+	var subPath string
+	switch {
+	case match == "":
 		subPath = models.SearchCandidatePathSubstringFullCatalogNoFTSMatch
+	case len(ftsIDs) == 0:
+		subPath = models.SearchCandidatePathSubstringFullCatalogFTSZeroRows
+	default:
+		subPath = models.SearchCandidatePathSubstringAugmentedFTSSparse
+	}
 	metrics.SearchCandidateGeneration(subPath)
 	subIDs, err := s.Store.ListIDsBySearchTextSubstring(ctx, needle, q.SourceIDs)
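
The rewritten control flow in `buildCandidates` condenses to a small decision function: run the substring scan unless FTS already filled the requested limit, and record why. This is a sketch of the logic only — the path string values are illustrative stand-ins for the `models.SearchCandidatePath*` constants, and `choosePath` is not a function in the real codebase:

```go
package main

import "fmt"

// Illustrative stand-ins for the models.SearchCandidatePath* constants
// named in the diff; the real values may differ.
const (
	pathSkippedFTSHit      = "substring_skipped_fts_hit"
	pathNoFTSMatch         = "substring_full_catalog_no_fts_match"
	pathFTSZeroRows        = "substring_full_catalog_fts_zero_rows"
	pathAugmentedFTSSparse = "substring_augmented_fts_sparse"
)

// choosePath mirrors the decision after this commit: the substring scan is
// skipped only when FTS produced enough candidates to fill the limit;
// otherwise the scan runs, and the recorded path explains why it ran.
func choosePath(match string, ftsHits, limit int) (runSubstring bool, path string) {
	if match != "" && ftsHits >= limit {
		// FTS alone can satisfy the request; substring scan is redundant.
		return false, pathSkippedFTSHit
	}
	switch {
	case match == "":
		// No usable FTS query (e.g. single-char input): substring only.
		return true, pathNoFTSMatch
	case ftsHits == 0:
		// FTS ran but matched nothing: full-catalog substring fallback.
		return true, pathFTSZeroRows
	default:
		// FTS matched, but fewer rows than requested: augment with substring.
		return true, pathAugmentedFTSSparse
	}
}

func main() {
	_, p := choosePath(`"create" AND "github"`, 1, 1)
	fmt.Println(p) // FTS hit count meets the limit, scan skipped
	_, p = choosePath(`"create" AND "github"`, 1, 5)
	fmt.Println(p) // sparse FTS, substring scan augments
}
```

The old code skipped the scan whenever FTS returned any hit at all; tying the skip to `q.Limit` is what turns the previous all-or-nothing behavior into the augmented sparse path the new tests exercise.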

internal/storage/fts.go

Lines changed: 6 additions & 1 deletion
@@ -40,7 +40,12 @@ func tagsJoined(rec models.CapabilityRecord) string {
 	return strings.Join(rec.Tags, " ")
 }
 
-// ftsTokenize splits a query into FTS-safe tokens (letters/digits runs, min length 2). Aligns with search tokenization.
+// ftsTokenize splits a query into FTS-safe tokens (letters/digits runs, min length 2).
+// Single-char tokens are dropped because they produce excessive FTS matches across the
+// entire catalog without adding discriminative value. The FTS5 porter unicode61 tokenizer
+// does index single-char tokens, but querying on them returns too many false positives.
+// When ftsTokenize returns no tokens (e.g. single-letter query), BuildFTSMatchQuery returns
+// "" and the search pipeline falls back to substring scan, which handles short queries fine.
 func ftsTokenize(s string) []string {
 	s = strings.ToLower(s)
 	var cur strings.Builder
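
The expanded comment describes behavior that can be sketched end to end. The following is a hypothetical reconstruction of the tokenize-then-build-match pipeline, written only to agree with the expectations in the new `fts_test.go` (`"a bc d ef"` yields `"bc" AND "ef"`, a single-char query yields `""`); the real `ftsTokenize` and `BuildFTSMatchQuery` may differ in detail:

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// tokenize lowercases the query and splits it into runs of letters/digits,
// dropping tokens shorter than 2 characters — a sketch of ftsTokenize's
// documented behavior. Len() counts bytes, which is fine for the ASCII
// examples used here.
func tokenize(s string) []string {
	s = strings.ToLower(s)
	var tokens []string
	var cur strings.Builder
	flush := func() {
		if cur.Len() >= 2 {
			tokens = append(tokens, cur.String())
		}
		cur.Reset()
	}
	for _, r := range s {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			cur.WriteRune(r)
		} else {
			flush()
		}
	}
	flush()
	return tokens
}

// buildMatch quotes each surviving token and joins with AND, matching the
// fts_test.go expectation. An empty result signals the caller to fall back
// to the substring scan.
func buildMatch(q string) string {
	toks := tokenize(q)
	quoted := make([]string, len(toks))
	for i, t := range toks {
		quoted[i] = `"` + t + `"`
	}
	return strings.Join(quoted, " AND ")
}

func main() {
	fmt.Println(buildMatch("a bc d ef")) // only 2+ char tokens survive
	fmt.Println(buildMatch("a") == "")   // single-char query: fall back to substring
}
```

Quoting each token keeps FTS5 from interpreting characters like `-` as query syntax, and the empty-string sentinel is exactly the condition the `buildCandidates` path logic branches on.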

internal/storage/fts_test.go

Lines changed: 19 additions & 0 deletions
@@ -170,3 +170,22 @@ func TestGetCapabilitiesByIDs(t *testing.T) {
 		t.Fatalf("%+v", m)
 	}
 }
+
+func TestFTS_singleCharQueryReturnsEmptyMatch(t *testing.T) {
+	// Single-char queries produce empty FTS MATCH strings by design.
+	// The search pipeline falls back to substring scan for these.
+	match := BuildFTSMatchQuery("a")
+	if match != "" {
+		t.Fatalf("single-char query should produce empty match, got %q", match)
+	}
+	// Two-char tokens should work normally.
+	match = BuildFTSMatchQuery("ab")
+	if match == "" {
+		t.Fatal("two-char query should produce non-empty match")
+	}
+	// Mixed: only 2+ char tokens survive.
+	match = BuildFTSMatchQuery("a bc d ef")
+	if match != `"bc" AND "ef"` {
+		t.Fatalf("want only 2+ char tokens, got %q", match)
+	}
+}
