feat: Zero-VDB GitHub-native + BM25 search backend #121

Merged
Sachindu-Nethmin merged 13 commits into main from feature/zero-vdb-github-native-search
May 9, 2026

Conversation

@Kavirubc Kavirubc commented May 9, 2026

Summary

Closes #120. Implements the zero-infrastructure search backend discussed in Discussion #112.

  • Replaces Qdrant with GitHub's hybrid search API (/search/issues?search_type=hybrid) + in-process BM25 Okapi re-ranking
  • No external VDB, no embedding API key — only GITHUB_TOKEN (automatic in Actions) is required
  • Qdrant backend preserved and fully backward-compatible via search.backend: qdrant

What changed

| Commit | Change |
| --- | --- |
| feat(config) | SearchConfig struct with backend/bm25_fallback; Validate() skips Qdrant checks for non-qdrant backends |
| feat(github) | Searcher wrapping /search/issues?search_type=hybrid with rate-limit detection |
| feat(pipeline) | GitHubSearcher in Dependencies; new issue-triage-github and similarity-only-github presets |
| feat(steps/bm25) | Inline BM25 Okapi scorer (~100 lines, zero new dependency) |
| feat(steps/github_similarity) | Two-tier step: GitHub hybrid search → BM25 re-rank; BM25-over-ListIssues fallback on rate-limit |
| feat(cmd) | Conditional embedder init; GitHubSearcher wiring in process.go and batch.go |
| fix(cmd) | --workflow flag no longer hardcodes "issue-triage", respects workflow: from config file |
| fix(search) | PR events use is:pr filter; issue events use is:issue |
| chore(config) | Repo configs switched to github_native; QDRANT_* secrets removed from triage.yml |

How it works

[New issue / PR arrives]
        │
        ▼
GitHub Hybrid Search API          ← tier 1, uses GITHUB_TOKEN
/search/issues?search_type=hybrid
is:issue (or is:pr for PR events)
        │  ordered candidates, score always = 1
        ▼
BM25 Okapi re-ranking (in-process) ← tier 2, pure Go, no deps
Score candidates against title+body
        │  normalized [0,1] similarity scores
        ▼
LLM Duplicate Verdict              ← unchanged, optional
ai.LLMClient.DetectDuplicate()

If GitHub search returns 0 results or hits the 10 req/min rate-limit, tier 2 runs BM25 directly over all open issues/PRs fetched from the list API (capped at 500).
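
The tier selection described above can be sketched roughly as follows. This is a minimal illustration, not the code in github_similarity.go: the `Hit` type, the `searchFn` and `listFn` function types, and the `candidates` helper are all hypothetical names, and only the 500-item cap and the fallback conditions come from the description.

```go
package main

// Hit is a hypothetical candidate record; the real step uses its own types.
type Hit struct {
	Number int
	Text   string
}

// searchFn stands in for the tier-1 hybrid search call (assumed shape).
// It reports rateLimited=true instead of returning an error on rate limits.
type searchFn func(query string) (hits []Hit, rateLimited bool, err error)

// listFn stands in for the list API used by the fallback (assumed shape).
type listFn func(limit int) ([]Hit, error)

// corpusCap is the fallback corpus cap stated in the PR description.
const corpusCap = 500

// candidates returns the documents BM25 should score and the tier that
// supplied them: "hybrid", "list", or "none".
func candidates(query string, search searchFn, list listFn, fallback bool) ([]Hit, string, error) {
	hits, limited, err := search(query)
	if err == nil && !limited && len(hits) > 0 {
		return hits, "hybrid", nil
	}
	if !fallback {
		// Fallback disabled: surface whatever the search returned.
		return nil, "none", err
	}
	all, err := list(corpusCap)
	if err != nil {
		return nil, "none", err
	}
	return all, "list", nil
}
```

Returning the tier name alongside the candidates is only a convenience for logging in this sketch; the actual step may report this differently.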

Config to use the new backend

search:
  backend: github_native
  bm25_fallback: true

workflow: issue-triage-github

llm:
  provider: gemini
  api_key: "${GEMINI_API_KEY}"   # optional — enables LLM verdict

defaults:
  similarity_threshold: 0.15    # BM25 scores; lower than cosine similarity
  max_similar_to_show: 3
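
For readers wiring this up, the `search` block might map onto a Go struct along these lines. This is a sketch under assumptions: the yaml tags, the `validateBackend` and `requiresQdrant` helper names, and the error wording are illustrative; only the field names and the three backend values are taken from this PR.

```go
package main

import "fmt"

// SearchConfig mirrors the two fields named in the PR; the yaml tags are assumed.
type SearchConfig struct {
	Backend      string `yaml:"backend"`
	BM25Fallback bool   `yaml:"bm25_fallback"`
}

// validateBackend rejects any backend outside the three values the PR lists.
// An empty value is accepted here and treated as the "qdrant" default.
func validateBackend(s SearchConfig) error {
	switch s.Backend {
	case "", "qdrant", "github_native", "bm25":
		return nil
	default:
		return fmt.Errorf("invalid search.backend: %q; must be one of: qdrant, github_native, bm25", s.Backend)
	}
}

// requiresQdrant reports whether Qdrant and embedding settings must be present.
func requiresQdrant(s SearchConfig) bool {
	return s.Backend == "" || s.Backend == "qdrant"
}
```

With the config above, `requiresQdrant` would be false, so missing QDRANT_* values would not fail validation.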

Test plan

  • go build ./... passes
  • go test ./... passes (all existing + new tests)
  • go vet ./... passes
  • Live dry-run against similigh/simili-bot with github_native backend confirmed github_similarity step fires
  • Live run against issue [0.2.0v][Feature] PR indexing with dedicated collection and pr-duplicate CLI #43 — pipeline completed, comment posted, no Qdrant/embedding key used
  • PR event type correctly uses is:pr filter
  • Existing Qdrant configs remain valid (backward-compatible default)

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • New Features

    • Added GitHub-native search backend with BM25 ranking for similarity detection.
    • Introduced new workflow presets optimized for GitHub-based issue triage.
  • Configuration

    • Simplified setup by removing Qdrant vector database requirement.
    • Made Gemini API key optional for LLM duplicate verdict.
    • Adjusted default similarity threshold to reflect new scoring semantics.
  • Documentation

    • Updated configuration examples for single-repo and multi-repo deployments.

coderabbitai Bot commented May 9, 2026

Warning

Rate limit exceeded

@Kavirubc has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 28 minutes and 51 seconds before requesting another review.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 13b01e4f-431e-4c71-80a0-a776a0a4990f

📥 Commits

Reviewing files that changed from the base of the PR and between d71844a and 6dbc52b.

📒 Files selected for processing (17)
  • .github/simili.yaml
  • .github/workflows/triage.yml
  • .simili.yaml
  • DOCS/examples/multi-repo/simili.yaml
  • DOCS/examples/single-repo/simili.yaml
  • cmd/simili/commands/batch.go
  • cmd/simili/commands/process.go
  • internal/core/config/config.go
  • internal/core/config/config_test.go
  • internal/core/pipeline/registry.go
  • internal/integrations/github/searcher.go
  • internal/integrations/github/searcher_test.go
  • internal/steps/bm25.go
  • internal/steps/bm25_test.go
  • internal/steps/github_similarity.go
  • internal/steps/github_similarity_test.go
  • internal/steps/register.go
📝 Walkthrough

Walkthrough

This PR implements a zero-infrastructure native search backend for issue/PR similarity detection, replacing mandatory Qdrant and embedding API dependencies with GitHub's hybrid search API plus local BM25 re-ranking. All operations use only the standard GitHub Token, eliminating external vector database and embedding service requirements.

Changes

GitHub Native Search Backend with BM25 Fallback

Layer / File(s) Summary
Search Configuration Schema
internal/core/config/config.go, internal/core/config/config_test.go
New SearchConfig struct with Backend ("qdrant"|"github_native"|"bm25") and BM25Fallback fields. Validation now conditionally enforces Qdrant/embedding requirements only when backend is unset or "qdrant". Defaults set backend to "qdrant" and BM25Fallback to true. Config merging allows child overrides.
GitHub Hybrid Search Client
internal/integrations/github/searcher.go, internal/integrations/github/searcher_test.go
New Searcher wrapper around go-github client. SearchHit struct normalizes GitHub search results. SearchIssues queries /search/issues?search_type=hybrid, filters by itemType, clamps limit to [1..100], and detects rate-limiting (403 with X-Ratelimit-Remaining=0) as fallback trigger returning rateLimited=true, err=nil.
BM25 Re-ranking Scorer
internal/steps/bm25.go, internal/steps/bm25_test.go
Okapi BM25 implementation: tokenizes documents/query (lowercase, alphanumeric, min-2-char), computes per-document term frequencies and BM25 scores with K1=1.5, B=0.75, normalizes to [0,1] range. Handles empty inputs and zero max-score gracefully.
GitHub Similarity Detection Step
internal/steps/github_similarity.go, internal/steps/github_similarity_test.go
Two-tier orchestration: Tier 1 queries GitHub hybrid search API; Tier 2 falls back to paginated ListIssues on rate-limit/error/empty results. Local BM25 re-ranks candidates, filters current item, applies threshold, truncates to MaxSimilarToShow, populates ctx.SimilarIssues. Skips on comment events, transfer-detected metadata, or DRY-run.
Pipeline Wiring
internal/core/pipeline/registry.go, internal/steps/register.go
Dependencies struct gains GitHubSearcher field (non-nil for github_native/bm25 backends). Two new workflow presets issue-triage-github and similarity-only-github using github_similarity step. Registry registers "github_similarity" step via NewGitHubSimilarity(deps).
Command Initialization
cmd/simili/commands/batch.go, cmd/simili/commands/process.go
Embedder initialization gated on backend=unset|"qdrant"; GitHubSearcher initialization gated on backend="github_native"|"bm25". process.go changes --workflow flag default to empty for runtime resolution: CLI flag takes precedence, falls back to cfg.Workflow, then resolves steps.
Configuration Files
.github/simili.yaml, .github/workflows/triage.yml, .simili.yaml, DOCS/examples/*
All files converted from Qdrant-centric to GitHub-native: remove qdrant and embedding sections; add search.backend: github_native and search.bm25_fallback: true; lower defaults.similarity_threshold from 0.70/0.65 to 0.15 (BM25 semantics); set workflow: issue-triage-github explicitly. Workflow removes Qdrant env vars. Examples include commented legacy Qdrant sections for opt-in backward compatibility.
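
To make the BM25 row above concrete, here is a compact sketch of an Okapi BM25 scorer with max-score normalization. It is not the contents of bm25.go: the names `tokenize`, `bm25Scores`, `bm25K1`, and `bm25B` are chosen for illustration, k1=1.5 and b=0.75 follow the commit message, and the IDF form ln(1 + (N - n + 0.5)/(n + 0.5)) is an assumption (a common non-negative variant).

```go
package main

import (
	"math"
	"strings"
	"unicode"
)

const (
	bm25K1 = 1.5  // term-frequency saturation
	bm25B  = 0.75 // document-length normalization
)

// tokenize lowercases s, splits on non-alphanumeric runes, and drops
// tokens shorter than two characters.
func tokenize(s string) []string {
	fields := strings.FieldsFunc(strings.ToLower(s), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	out := fields[:0]
	for _, f := range fields {
		if len([]rune(f)) >= 2 {
			out = append(out, f)
		}
	}
	return out
}

// bm25Scores scores each document against query and divides by the best
// score, so results fall in [0,1]. It returns a zero-filled slice when
// nothing matches.
func bm25Scores(query string, docs []string) []float64 {
	scores := make([]float64, len(docs))
	tfs := make([]map[string]int, len(docs))
	lens := make([]int, len(docs))
	df := map[string]int{} // number of documents containing each term
	total := 0
	for i, d := range docs {
		toks := tokenize(d)
		tf := map[string]int{}
		for _, t := range toks {
			tf[t]++
		}
		for t := range tf {
			df[t]++
		}
		tfs[i], lens[i] = tf, len(toks)
		total += len(toks)
	}
	if total == 0 {
		return scores
	}
	n := float64(len(docs))
	avgLen := float64(total) / n
	qToks := tokenize(query)
	maxScore := 0.0
	for i := range docs {
		for _, q := range qToks {
			f := float64(tfs[i][q])
			if f == 0 {
				continue
			}
			idf := math.Log(1 + (n-float64(df[q])+0.5)/(float64(df[q])+0.5))
			norm := f + bm25K1*(1-bm25B+bm25B*float64(lens[i])/avgLen)
			scores[i] += idf * f * (bm25K1 + 1) / norm
		}
		if scores[i] > maxScore {
			maxScore = scores[i]
		}
	}
	if maxScore == 0 {
		return scores
	}
	for i := range scores {
		scores[i] /= maxScore
	}
	return scores
}
```

Because scores are divided by the per-query maximum, the top candidate always scores 1 whenever anything matches. That is one reason a threshold tuned for cosine similarity does not carry over directly.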

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

Possibly related PRs

  • similigh/simili-bot#114: Modifies pipeline/config and startup logic around search backend defaults and conditional Qdrant/embedder initialization.
  • similigh/simili-bot#95: Updates integrations/dependency surface in cmd/simili/commands and pipeline/registry.go with changes to embedder/LLM initialization patterns.

Suggested labels

enhancement, feature

Poem

🐰 Hops through GitHub's hybrid lanes,
No vectors locked in Qdrant chains,
BM25 scores in memory bloom,
Zero infra, no extra room!
Token-only freedom—huzzah!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 39.29% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat: Zero-VDB GitHub-native + BM25 search backend' directly summarizes the main change: replacing the Qdrant vector database dependency with a GitHub-native hybrid search API and BM25 re-ranking backend. |
| Linked Issues check | ✅ Passed | The PR implements all primary objectives from issue #120: GitHub hybrid search integration [searcher.go], in-process BM25 re-ranker [bm25.go], conditional embedder initialization [batch.go, process.go], SearchConfig with optional Qdrant validation [config.go], new GitHubSearcher pipeline dependency [registry.go], github_similarity step [github_similarity.go], and updated configs/workflows to remove Qdrant env vars. |
| Out of Scope Changes check | ✅ Passed | All changes are directly in scope: config schema extensions for SearchConfig, GitHub integration layer, BM25 implementation, new pipeline steps, CLI flag behavior fix for --workflow, and dependency initialization refactoring. No unrelated or cosmetic changes detected. |

Kavirubc and others added 11 commits May 9, 2026 14:15
Add SearchConfig struct with Backend ("qdrant"|"github_native"|"bm25")
and BM25Fallback fields. Default backend is "qdrant" for backward compat.
Validate() now skips Qdrant/embedding key checks for non-qdrant backends.
mergeConfigs() propagates Search fields from child config.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Kavirubc <hapuarachchikaviru@gmail.com>
Searcher wraps /search/issues?search_type=hybrid using the go-github
client for oauth2 auth. Returns SearchHit slice and a rateLimited bool
so callers can fall back gracefully without treating rate-limits as errors.
searchIssuesRaw is an unexported helper used by unit tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Kavirubc <hapuarachchikaviru@gmail.com>
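
The rate-limit signal described in the commit above (403 plus an exhausted remaining-quota header) could be detected with something like the following. The function names are hypothetical and this is only a sketch of the check, not the Searcher's actual code.

```go
package main

import "net/http"

// looksRateLimited applies the rule described in this PR: HTTP 403 together
// with a remaining-quota header value of "0".
func looksRateLimited(status int, remaining string) bool {
	return status == http.StatusForbidden && remaining == "0"
}

// isRateLimited reads the status and header from a response. Header.Get
// canonicalizes the key, so the header's letter case does not matter.
func isRateLimited(resp *http.Response) bool {
	if resp == nil {
		return false
	}
	return looksRateLimited(resp.StatusCode, resp.Header.Get("X-RateLimit-Remaining"))
}
```

A caller would then return `rateLimited=true` with a nil error, which lets the similarity step fall back without treating the condition as a failure.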
Add GitHubSearcher *github.Searcher to Dependencies for injection into
github_similarity step. Add issue-triage-github and similarity-only-github
presets that replace the Qdrant steps with github_similarity.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Kavirubc <hapuarachchikaviru@gmail.com>
Pure-function BM25 Okapi implementation (~100 lines, no new dependency).
tokenize() lowercases, strips punctuation, and filters tokens < 2 chars.
bm25Score() returns [0,1] normalized scores with k1=1.5 b=0.75.
Used by github_similarity for re-ranking GitHub search candidates.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Kavirubc <hapuarachchikaviru@gmail.com>
GitHubSimilarity uses GitHub hybrid search as tier 1 and BM25 over
ListIssues as tier 2 fallback (rate-limit or empty results). BM25 corpus
is capped at 500 issues. Results are normalized [0,1] and filtered by
similarity_threshold before populating ctx.SimilarIssues.

Step is registered as "github_similarity" and used by the new
issue-triage-github and similarity-only-github presets.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Kavirubc <hapuarachchikaviru@gmail.com>
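
The post-scoring stage described in the commit above (drop the current item, apply similarity_threshold, keep the top few) might look like this. `Scored` and `topSimilar` are illustrative names, not identifiers from the PR.

```go
package main

import "sort"

// Scored pairs an issue or PR number with its normalized BM25 score
// (hypothetical type for this sketch).
type Scored struct {
	Number int
	Score  float64
}

// topSimilar removes the item being triaged (self) and anything below
// threshold, sorts the rest by score descending, and keeps at most limit.
func topSimilar(in []Scored, self int, threshold float64, limit int) []Scored {
	out := make([]Scored, 0, len(in))
	for _, s := range in {
		if s.Number == self || s.Score < threshold {
			continue
		}
		out = append(out, s)
	}
	// Stable sort keeps the incoming order for equal scores.
	sort.SliceStable(out, func(i, j int) bool { return out[i].Score > out[j].Score })
	if limit >= 0 && len(out) > limit {
		out = out[:limit]
	}
	return out
}
```

Filtering out the current item matters here because an issue that is already indexed by search will usually be its own best match.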
process.go and batch.go: skip embedder initialization when
search.backend != "qdrant" (prevents hard failure for users without
an embedding API key). Initialize GitHubSearcher when backend is
"github_native" or "bm25".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Kavirubc <hapuarachchikaviru@gmail.com>
The --workflow flag defaulted to "issue-triage" hardcoded, always
overriding whatever workflow: was set in the config file. Change the
default to "" so process.go falls through to cfg.Workflow, letting
users set workflow: issue-triage-github in their simili.yaml without
needing a CLI flag.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Kavirubc <hapuarachchikaviru@gmail.com>
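
The precedence described in the commit above can be expressed in a few lines. `resolveWorkflow` is a hypothetical helper, and the final fallback argument is an assumption, since the PR does not state what happens when neither the flag nor the config sets a workflow.

```go
package main

// resolveWorkflow picks the workflow name: the CLI flag wins when set,
// then the config file value, then the caller-supplied fallback (assumed).
func resolveWorkflow(flagValue, configValue, fallback string) string {
	if flagValue != "" {
		return flagValue
	}
	if configValue != "" {
		return configValue
	}
	return fallback
}
```

The key point is that the flag default must be empty. With a non-empty default such as "issue-triage", the first branch always fires and the config value is never consulted.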
Update all config files and the triage workflow to use the new
github_native search backend (workflow: issue-triage-github).

Changes:
- .github/simili.yaml: remove qdrant/embedding blocks, add search config
- .simili.yaml: same for local dev
- .github/workflows/triage.yml: remove QDRANT_URL and QDRANT_API_KEY secrets
- DOCS/examples/*/simili.yaml: update both examples; keep qdrant as commented legacy block

similarity_threshold lowered to 0.15 (appropriate for BM25 vs 0.70 for cosine).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Kavirubc <hapuarachchikaviru@gmail.com>
SearchIssues now accepts an itemType param ("issue"|"pr"|"") so the
GitHub hybrid search query uses is:issue or is:pr appropriately.
fetchAllIssues filters the ListIssues result to match itemType.
github_similarity sets itemType from ctx.Issue.EventType so PR events
find similar PRs and issue events find similar issues.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Kavirubc <hapuarachchikaviru@gmail.com>
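
As an illustration of the itemType handling in the commit above, a query builder might add the type qualifier like this. `buildQuery` and the exact query layout (repo qualifier, then type qualifier, then free text) are assumptions; `repo:`, `is:issue`, and `is:pr` are standard GitHub search qualifiers.

```go
package main

import "strings"

// buildQuery composes a GitHub search query. itemType may be "issue", "pr",
// or "" (no type filter), matching the values named in the commit message.
func buildQuery(repo, text, itemType string) string {
	parts := []string{"repo:" + repo}
	switch itemType {
	case "issue":
		parts = append(parts, "is:issue")
	case "pr":
		parts = append(parts, "is:pr")
	}
	if t := strings.TrimSpace(text); t != "" {
		parts = append(parts, t)
	}
	return strings.Join(parts, " ")
}
```

For a pull-request event, `buildQuery("similigh/simili-bot", "crash on login", "pr")` yields `repo:similigh/simili-bot is:pr crash on login`.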
Signed-off-by: Kaviru Hapuarachchi <kavirurh@gmail.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Satisfies errcheck linter requirement.

Signed-off-by: Kaviru Hapuarachchi <kavirurh@gmail.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Kavirubc <hapuarachchikaviru@gmail.com>
@Kavirubc Kavirubc force-pushed the feature/zero-vdb-github-native-search branch from 9f722af to 7b425b9 Compare May 9, 2026 08:46
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (1)
internal/integrations/github/searcher_test.go (1)

70-70: 💤 Low value

Check the error return from w.Write.

While this is in test code and unlikely to fail, checking the error return value is good practice and satisfies the linter.

✨ Proposed fix
-		w.Write([]byte(`{"message":"API rate limit exceeded"}`))
+		_, _ = w.Write([]byte(`{"message":"API rate limit exceeded"}`))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/integrations/github/searcher_test.go` at line 70, The test currently
calls w.Write([]byte(`{"message":"API rate limit exceeded"}`)) without checking
the returned error; change this to capture and assert the error from w.Write
(e.g. _, err := w.Write(...); require.NoError(t, err) or if not using testify
then if err != nil { t.Fatalf("w.Write failed: %v", err) }) in
internal/integrations/github/searcher_test.go to satisfy the linter and ensure
write failures are surfaced.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/simili/commands/batch.go`:
- Around line 337-343: The config Validate() function should enforce that
Search.Backend is one of the allowed values; add a whitelist check inside
Validate() for Search.Backend (valid values: "qdrant", "github_native", "bm25")
and return a clear error when it is not valid; specifically update
internal/core/config/config.go in the Validate() method to read
cfg.Search.Backend and if it is not one of those three strings return an error
like "invalid Search.Backend: %q; must be one of: qdrant, github_native, bm25"
so downstream code (e.g., the GitHub searcher initialization) cannot proceed
with a misspelled backend.

In `@internal/integrations/github/searcher.go`:
- Around line 98-101: The type inference in the searcher uses item.PullRequest
!= nil to set t to "pr" but the test helper searchIssuesRaw additionally checks
strings.TrimSpace(item.PullRequest.URL) != "" causing inconsistency; make them
consistent by removing the extra URL non-empty check from searchIssuesRaw so
both rely on item.PullRequest != nil (refer to the item.PullRequest check and
the searchIssuesRaw helper) and update tests accordingly.

In `@internal/steps/github_similarity.go`:
- Around line 192-227: The loop appends hits to the slice all and only checks
len(all) >= bm25CorpusCap after appending (and already breaks when resp.NextPage
== 0), which allows all to exceed bm25CorpusCap and leaves the final if-block
dead; change the logic in the issues iteration (the for _, iss := range issues
loop) to check whether len(all) >= bm25CorpusCap before appending each
githubpkg.SearchHit (or compute remaining capacity and only append up to that
capacity), remove the unreachable trailing if block that logs and breaks (the
duplicate len(all) >= bm25CorpusCap block), and ensure pagination
(resp.NextPage) still breaks normally; reference variables/functions: all,
bm25CorpusCap, issues loop, iss, resp.NextPage, and githubpkg.SearchHit.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a1edd0ef-b0bb-45f4-95f2-5988c42e37db

📥 Commits

Reviewing files that changed from the base of the PR and between 9f4a622 and d71844a.

Comment thread cmd/simili/commands/batch.go
Comment thread internal/integrations/github/searcher.go
Comment thread internal/steps/github_similarity.go
I, Kavirubc <hapuarachchikaviru@gmail.com>, hereby add my Signed-off-by to this commit: f11ee1b

Signed-off-by: Kavirubc <hapuarachchikaviru@gmail.com>
Copilot AI left a comment

Pull request overview

This PR adds a zero-infrastructure similarity-search path that uses GitHub’s hybrid search API for candidate retrieval and an in-process BM25 scorer for re-ranking, making Qdrant/embedding optional while preserving backward compatibility via search.backend: qdrant.

Changes:

  • Introduces a GitHub-native similarity step (github_similarity) backed by /search/issues?search_type=hybrid plus BM25 re-ranking (with ListIssues-based fallback).
  • Adds search.backend / search.bm25_fallback configuration, updates presets, and adjusts CLI wiring to initialize dependencies conditionally.
  • Updates repository and documentation configs/workflows to default to the GitHub-native backend and removes Qdrant secrets from Actions.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 8 comments.

Show a summary per file
| File | Description |
| --- | --- |
| internal/steps/register.go | Registers the new github_similarity step. |
| internal/steps/github_similarity.go | Implements GitHub hybrid search → BM25 re-rank, with ListIssues BM25 fallback. |
| internal/steps/github_similarity_test.go | Adds basic tests for skip/dry-run behavior. |
| internal/steps/bm25.go | Adds a small in-process BM25 tokenizer/scorer. |
| internal/steps/bm25_test.go | Adds unit tests for tokenization and BM25 scoring behavior. |
| internal/integrations/github/searcher.go | Adds a Searcher wrapper for GitHub hybrid search and rate-limit detection. |
| internal/integrations/github/searcher_test.go | Adds tests for JSON decoding and a rate-limit-related check. |
| internal/core/pipeline/registry.go | Extends dependencies and adds new GitHub-native workflow presets. |
| internal/core/config/config.go | Adds SearchConfig, defaulting to qdrant; validation skips Qdrant requirements for non-qdrant backends. |
| internal/core/config/config_test.go | Adds tests for search defaults, YAML parsing, and validation behavior. |
| cmd/simili/commands/process.go | Respects workflow from config when flag unset; initializes GitHub searcher; gates embedder init by backend. |
| cmd/simili/commands/batch.go | Initializes GitHub searcher; gates embedder init by backend. |
| DOCS/examples/single-repo/simili.yaml | Updates example config to use github_native backend + new preset. |
| DOCS/examples/multi-repo/simili.yaml | Updates example config to use github_native backend + new preset. |
| .simili.yaml | Updates local dev config to use github_native. |
| .github/workflows/triage.yml | Removes Qdrant secrets for Actions workflow; documents optional LLM key. |
| .github/simili.yaml | Updates Actions config to use github_native backend + new preset. |

Comment thread internal/steps/github_similarity.go Outdated
Comment thread internal/steps/github_similarity.go Outdated
Comment thread internal/steps/bm25.go Outdated
Comment on lines +74 to +78
// Build a Searcher pointing at the test server via a plain http.Client.
// We test rate-limit detection using the raw helper since NewSearcher
// requires oauth2; the HTTP-level test uses a real server to verify header parsing.
req, _ := http.NewRequest(http.MethodGet, srv.URL+"/search/issues", nil)
resp, err := http.DefaultClient.Do(req)
Comment on lines +46 to +51
func TestGitHubSimilarity_SkipOnTransferDetected(t *testing.T) {
	called := false
	step := &GitHubSimilarity{
		// Use nil searcher; the transfer skip should fire first.
		searcher: nil,
	}
Comment on lines 280 to +283
 func (c *Config) Validate() error {
-	requiredFields := []struct {
-		name   string
-		envVar string
-		value  string
-	}{
-		{name: "qdrant.url", envVar: "QDRANT_URL", value: c.Qdrant.URL},
-		{name: "qdrant.api_key", envVar: "QDRANT_API_KEY", value: c.Qdrant.APIKey},
-		{name: "qdrant.collection", envVar: "QDRANT_COLLECTION", value: c.Qdrant.Collection},
-		{name: "embedding.api_key", envVar: "EMBEDDING_API_KEY", value: c.Embedding.APIKey},
-	}
-
-	for _, field := range requiredFields {
-		if strings.TrimSpace(field.value) == "" {
-			return fmt.Errorf(
-				"config validation failed: %s is empty (check %s environment variable)",
-				field.name,
-				field.envVar,
-			)
+	// Qdrant and embedding API key are only required when using the qdrant backend.
+	if c.Search.Backend == "" || c.Search.Backend == "qdrant" {
+		required := []struct {
Comment thread cmd/simili/commands/process.go Outdated
Comment thread cmd/simili/commands/batch.go Outdated
- github_similarity: respect search.bm25_fallback config; backend=bm25
  skips GitHub hybrid search and goes straight to ListIssues+BM25
- github_similarity: fix BM25 corpus cap — check before append, remove
  dead code block that could never be reached
- searcher: unify PR detection to item.PullRequest != nil in both
  SearchIssues and searchIssuesRaw (remove inconsistent URL check)
- searcher: remove unused strings import
- bm25: fix doc comment (returns zero-filled slice, not nil)
- config: validate Search.Backend against allowed values
- config: add TestValidateRejectsUnknownBackend
- process/batch: initialize embedder when embedding API key is present
  regardless of search backend

Signed-off-by: Kavirubc <hapuarachchikaviru@gmail.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
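
The corpus-cap fix listed above (check the cap before appending rather than after) follows a pattern like the one below. `appendCapped` is a hypothetical generic helper written for illustration; the real code operates on search hits inside its pagination loop.

```go
package main

// appendCapped appends items from page to all until limit is reached.
// The cap is checked before each append, so all never exceeds limit.
// The boolean reports whether the cap has been reached, which a caller
// can use to stop paginating.
func appendCapped[T any](all, page []T, limit int) ([]T, bool) {
	for _, v := range page {
		if len(all) >= limit {
			return all, true
		}
		all = append(all, v)
	}
	return all, len(all) >= limit
}
```

Checking after the append, as the review comment points out, lets a full page push the slice past the cap before the loop notices.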
@Sachindu-Nethmin
I have tested it using my fork repo and it works fine. So this PR LGTM.

@Sachindu-Nethmin Sachindu-Nethmin merged commit 8e4d2b5 into main May 9, 2026
12 checks passed
@Sachindu-Nethmin Sachindu-Nethmin deleted the feature/zero-vdb-github-native-search branch May 9, 2026 10:26
Development

Successfully merging this pull request may close these issues.

feat: Zero-VDB native search backend (GitHub Hybrid Search + BM25 fallback)

3 participants