AI Search: add suggested topics and keyword suggestions by Marigold · Pull Request #6180 · owid/owid-grapher

Marigold · 2026-03-04T09:07:47Z

Summary

Add "Browse topic" pills that suggest related topics below the search bar using semantic search via Cloudflare Vectorize, with LLM filtering to improve relevance
Add "Try also" keyword suggestions via vocabulary-based semantic search with optional LLM filtering
Add /api/ai-search/topics and /api/ai-search/keywords Cloudflare Workers endpoints
Both endpoints support source=semantic|llm and llm_filter=true|false for tuning
Topics list fetched dynamically from Datasette and cached in worker memory (no static JSON)
Vectorize indexing scripts for topics and vocabulary data
Update empty state to nudge users toward suggested topics when no results found
Configure Workers AI and Vectorize bindings in wrangler

Test plan

Deploy to staging and verify topic suggestions appear when typing a search query
Verify keyword suggestions appear below topic suggestions
Test /api/ai-search/topics?q=climate returns relevant topics
Test /api/ai-search/keywords?q=how rich are countries returns keyword suggestions
Verify LLM filtering removes irrelevant topics (e.g. "Farm Size" for "how tall are people")
Verify clicking a topic pill navigates to the topic page
Verify clicking a keyword suggestion updates the search query
Test empty state shows "Try exploring a suggested topic above" message

🤖 Generated with Claude Code

Add "Browse topic" and "Try also" keyword suggestions to the search page. Topics are fetched via semantic search (Vectorize) from /api/ai-search/topics. Keywords are generated via vocabulary-based LLM rewriting from /api/ai-search/rewrite. - SearchSuggestedTopics: shows related topic pills below the search bar - SearchSuggestedKeywords: shows keyword suggestions for query refinement - Topics and rewrite API endpoints (Cloudflare Workers) - Vectorize indexing scripts for topics and vocabulary - Baker script for generating topics.json from database - AI and Vectorize bindings in wrangler config Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

owidbot · 2026-03-04T09:08:59Z

Quick links (staging server):

Site Dev	Site Preview	Admin	Wizard	Docs

Login: ssh owid@staging-site-ai-search-suggested-topics-k

Site-screenshots: https://github.com/owid/site-screenshots/compare/ai-search-suggested-topics-keywords
SVG tester: https://github.com/owid/owid-grapher-svgs/compare/ai-search-suggested-topics-keywords

Archive:

SVG tester:

Number of differences (graphers): 0 ✅
Number of differences (grapher views): skipped
Number of differences (mdims): skipped
Number of differences (explorers): skipped

Edited: 2026-03-04 09:15:22 UTC
Execution time: 1.31 seconds

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 376bf5ec72

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-04T09:16:03Z

functions/api/ai-search/topics/index.ts

+        // Parse query parameter
+        const query = url.searchParams.get(COMMON_SEARCH_PARAMS.QUERY) || ""
+


Reject empty queries before topic embedding search

This endpoint accepts missing/empty q and still proceeds to generate embeddings and query Vectorize, so requests like /api/ai-search/topics trigger paid AI work for effectively non-specific input. Because this route is public, that can create avoidable cost/load and noisy results; it should return a 400 for empty queries (as the rewrite endpoint already does).

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-03-04T09:16:03Z

functions/api/ai-search/topics/index.ts

+        const limit = Math.min(
+            Math.max(
+                1,
+                parseInt(
+                    url.searchParams.get("limit") || DEFAULT_LIMIT.toString()
+                )


Validate numeric limit before passing it to Vectorize

If limit is non-numeric (for example ?limit=abc), parseInt(...) yields NaN and the current Math.max/Math.min clamp preserves NaN; this then flows into topK for VECTORIZE_TOPICS.query(...), which can fail at runtime and return a 500 instead of a client error. Guarding with Number.isFinite (or defaulting to DEFAULT_LIMIT) would prevent request-triggered failures.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-03-04T09:16:03Z

functions/api/ai-search/keywords/index.ts

+        const limit = Math.min(
+            Math.max(
+                1,
+                parseInt(
+                    url.searchParams.get("limit") || DEFAULT_LIMIT.toString()
+                )


Sanitize rewrite limit to avoid NaN-driven keyword output

The same parseInt + clamp pattern allows limit=abc to become NaN, so this handler can send malformed constraints (e.g. 1-NaN) into LLM prompts and then slice with NaN, producing empty or unstable keyword responses while still returning 200. Rejecting invalid numeric limits (or falling back to the default) keeps behavior deterministic.

Useful? React with 👍 / 👎.

- Rename rewrite/ endpoint to keywords/ with clearer param names (source=semantic|llm, llm_filter=true|false) - Align topics and keywords endpoints to use same source param values - Add LLM filtering to topics endpoint (default on) to improve relevance - Replace static topics.json with cached Datasette fetch on cold start - Fetch vocabulary data from URL instead of local JSON file - Lowercase keyword suggestions in frontend Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Topics are now fetched from Datasette at runtime (cached in worker), so the R2 upload script and its utility are no longer needed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

No longer needed since topics.json was replaced with runtime Datasette fetch. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

functions/scripts/indexTopics.ts

+            JSON.stringify({
+                error: "Failed to index topics",
+                details: error instanceof Error ? error.message : String(error),
+            }),


In general, to fix this issue we should avoid including raw exception data (messages, stack traces, stringified errors) in responses sent to clients. Instead, log the detailed error information on the server and return a generic, user-safe error message and possibly a non-sensitive error code or correlation ID.

For this specific code, the best minimal fix without changing functionality is:

Keep console.error("Error indexing topics:", error) so developers still see full error details in logs.

Change the HTTP response body in the catch block to no longer include error.message or String(error).

Replace the details field with a generic message (or remove it) that does not depend on the caught error, ensuring no stack-trace–related information flows to the client.

Concretely, in functions/scripts/indexTopics.ts, lines 226–229 should be updated so that details is either removed or set to a static string. No new imports or helpers are required.

functions/scripts/indexVocabulary.ts

+            JSON.stringify({
+                error: "Failed to index vocabulary",
+                details: error instanceof Error ? error.message : String(error),
+            }),


To fix the problem, the response body for error cases should avoid including information derived directly from the caught error. Instead, send a generic, user-safe error message, and keep the detailed information only in server-side logs (which the code is already writing via console.error). The core behavior of the endpoint (indexing vocabulary and returning success information) should remain unchanged; only the shape of the error response needs adjustment.

Concretely, in functions/scripts/indexVocabulary.ts, within handleRequest’s catch block (lines 247–259), remove the details field that exposes error.message / String(error) to the client. Optionally, replace it with a constant generic hint such as "Try again later or contact support." that does not depend on the caught error. No new imports or helper methods are required; console.error already ensures the detailed error is available in logs.

When search returns no results, nudge users toward the AI-suggested topic pills shown above the results area. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Remove topic search endpoints, suggested topics/keywords UI components, indexing scripts, and vocabulary data that were moved to PR #6180. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

improve Resolve focus placeholders when looking up reference queries The resolved/unresolved query-string handling was incorrect when a focus parameter was present: reference lookups used the unresolved query string instead of the resolved one. This caused mismatches when comparing graphs that used focus placeholders. To fix this, added a helper to resolve focus placeholders (resolveQueryStrForLookup) that creates a minimal grapher state to compute the resolved query string, and updated verification to use the resolved query string. Also corrected diff line counting to separately compute added/deleted lines and use the maximum as the changed-lines count. Use unresolved query string for SVG records When resolving focus placeholders required loading grapher data, generating a minimal grapher was complex and unnecessary for lookup purposes. Instead, keep and use the original/unresolved query string for SVG record keys and export lookups. This simplifies rendering and lookup logic by removing resolveQueryStrForLookup and related resolution steps, and ensures chart keys are created from the provided query string directly. Include query string when opening SVG in report When opening an SVG from the HTML report, the query string on the record was not being included in the generated links or displayed title, causing opened charts to miss context provided by URL parameters. This change reads queryStr from the svgRecord and appends it to viewId for full links, uses the combined value for the copy-to-clipboard button, and shows the query string in the section title so the correct URL and label are preserved when viewing or copying. Include resolvedQueryStr in svg records and UI Add resolvedQueryStr to the SvgRecord type and propagate it through utils, parsing, and view generation so the compare view and HTML report can use the resolved query string (with actual entity names) where available. This change makes the displayed and copyable URLs use resolvedQueryStr if present, falling back to the original queryStr, improving clarity when placeholders (e.g. <firstSeries>) have been replaced.

Fixing: Set is not a function or its return value is not iterable

…-topics-keywords

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions bot assigned Marigold Mar 4, 2026

🤖 style: format code

3b52685

chatgpt-codex-connector bot reviewed Mar 4, 2026

View reviewed changes

Marigold and others added 4 commits March 4, 2026 11:16

🤖 style: format code

ddbd137

🗑️🤖 Remove unused R2 indexing code and AI_SEARCH_R2_BUCKET setting

1fd3126

Topics are now fetched from Datasette at runtime (cached in worker), so the R2 upload script and its utility are no longer needed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

🤖 Remove JSON include from functions tsconfig

46b78b1

No longer needed since topics.json was replaced with runtime Datasette fetch. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-advanced-security bot found potential problems Mar 4, 2026

View reviewed changes

Marigold and others added 2 commits March 4, 2026 11:26

🤖 Update empty state to suggest exploring topics

4a22313

When search returns no results, nudge users toward the AI-suggested topic pills shown above the results area. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

🤖 style: format code

f3431f4

sophiamersmann and others added 10 commits March 4, 2026 15:29

🔨 split hideLegend into hideLegend/hideSeriesLabels

d7d496d

🔨 drop hideLegend

65a18c2

🔨 (svg tester) add focus mode

6ffd052

🔨 (svg tester) assert analytics table have been populated

ad7fed5

fix: run oxfmt API with format options

f73f934

🐛 use uniq instead of a set

5fb5c86

Fixing: Set is not a function or its return value is not iterable

🐝 oxfmt vscode settings

0316000

Merge remote-tracking branch 'origin/master' into ai-search-suggested…

0a167d7

…-topics-keywords

🐛 fix CI: correct source type comparison and remove unused interface

1925e35

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

AI Search: add suggested topics and keyword suggestions#6180

AI Search: add suggested topics and keyword suggestions#6180
Marigold wants to merge 18 commits intomasterfrom
ai-search-suggested-topics-keywords

Marigold commented Mar 4, 2026 •

edited

Loading

Uh oh!

owidbot commented Mar 4, 2026 •

edited

Loading

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Mar 4, 2026

Uh oh!

chatgpt-codex-connector bot Mar 4, 2026

Uh oh!

chatgpt-codex-connector bot Mar 4, 2026

Uh oh!

Check warning

Copilot Autofix

Check warning

Copilot Autofix

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

		// Parse query parameter
		const query = url.searchParams.get(COMMON_SEARCH_PARAMS.QUERY) \|\| ""

@@ -225,7 +225,7 @@
                     return new Response(
                         JSON.stringify({
                             error: "Failed to index topics",
-                            details: error instanceof Error ? error.message : String(error),
+                            details: "An internal error occurred while processing the request.",
                         }),
                         {
                             status: 500,

Uh oh!

Conversation

Marigold commented Mar 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

owidbot commented Mar 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

Check warning

Uh oh!

Copilot Autofix

Check warning

Uh oh!

Copilot Autofix

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Marigold commented Mar 4, 2026 •

edited

Loading

owidbot commented Mar 4, 2026 •

edited

Loading