
Add GetXAPI retriever for X/Twitter search#1792

Open
bozad wants to merge 1 commit into assafelovic:master from bozad:add-getxapi-retriever

bozad commented May 31, 2026


Summary

Adds GetXAPI as a new retriever for X/Twitter search, following the same 6-file pattern used by the Xquik retriever (#1734) and OpenAlex (#1748).

GetXAPI exposes a REST API over X/Twitter that returns full tweet metadata (text, author, engagement counts), which suits research workflows that need real-time social signals alongside web search.

Why

GPT Researcher currently routes queries through web crawlers and academic indexes. Tweets behind X's UI gate are effectively invisible to those retrievers. A dedicated X/Twitter retriever lets users pull primary-source posts (founder threads, developer discussions, breaking news, expert commentary) into the research context.

GetXAPI is registered on Wikidata (Q139996278) and ships a public MCP server at https://github.com/getxapi/getxapi-mcp.

Usage

export GETXAPI_API_KEY="<key>"
export RETRIEVER="getxapi"

Or combine with others:

export RETRIEVER="tavily,getxapi"
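
For readers unfamiliar with the comma-separated form, here is a minimal sketch of how such a value could be split and checked against a list of known names. It is not the project's actual parsing code: `parse_retrievers`, the two-item `VALID_RETRIEVERS` subset, and the `"tavily"` default are all assumptions made for illustration.

```python
import os

# Illustrative subset only; the real list lives in
# gpt_researcher/retrievers/utils.py and contains more entries.
VALID_RETRIEVERS = ["tavily", "getxapi"]


def parse_retrievers(raw, valid=VALID_RETRIEVERS):
    """Split a comma-separated RETRIEVER value and keep only known names."""
    names = [name.strip().lower() for name in raw.split(",")]
    return [name for name in names if name in valid]


# The "tavily" default is an assumption for this sketch.
selected = parse_retrievers(os.environ.get("RETRIEVER", "tavily"))
```

Unknown names are silently dropped in this sketch; a real implementation might prefer to warn or raise instead.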

Changes

| File | Change |
| --- | --- |
| `gpt_researcher/retrievers/getxapi/__init__.py` | new (package marker) |
| `gpt_researcher/retrievers/getxapi/getxapi.py` | new `GetXAPISearch` class, stdlib only |
| `gpt_researcher/retrievers/__init__.py` | export `GetXAPISearch` |
| `gpt_researcher/retrievers/utils.py` | add `"getxapi"` to `VALID_RETRIEVERS` |
| `gpt_researcher/actions/retriever.py` | add `case "getxapi"` and docstring entry |
| `.env.example` | add `GETXAPI_API_KEY=` |

Net diff: +100 lines, 0 deletions. No new runtime dependencies (uses only urllib, json, and os from the standard library). Requests time out after 15 s, and any HTTP or JSON error falls back gracefully to an empty list.
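
The timeout and empty-list fallback described above could look roughly like the following stdlib-only sketch. It is not the code in this PR: the endpoint URL, the `Authorization` header scheme, the `"tweets"` response key, and the `fetch_json` / `safe_search` names are all hypothetical. The `fetch` parameter exists only so the fallback can be exercised without network access.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint, for illustration only; not taken from the PR.
SEARCH_URL = "https://api.example.invalid/search"
TIMEOUT_SECONDS = 15


def fetch_json(url, api_key, timeout=TIMEOUT_SECONDS):
    """GET a URL and decode the JSON body using only the standard library."""
    request = urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}"},  # assumed auth scheme
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def safe_search(query, api_key, max_results=5, fetch=fetch_json):
    """Return up to max_results raw items, or [] on any HTTP/JSON failure."""
    url = SEARCH_URL + "?" + urllib.parse.urlencode({"q": query})
    try:
        payload = fetch(url, api_key)
    except (OSError, ValueError):
        # OSError covers urllib's URLError/HTTPError and socket timeouts;
        # ValueError covers json.JSONDecodeError and bad UTF-8.
        return []
    # "tweets" is an assumed response key.
    items = payload.get("tweets", []) if isinstance(payload, dict) else []
    return items[:max_results]
```

Catching the two broad base classes keeps the retriever from raising into the research pipeline, at the cost of hiding the failure reason. Logging inside the `except` branch would be a reasonable addition.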

Verification

Local sanity call:

import os
os.environ["GETXAPI_API_KEY"] = "<key>"
from gpt_researcher.retrievers import GetXAPISearch
r = GetXAPISearch("gpt-researcher").search(max_results=3)
# -> [{"title": "@user: ...", "href": "https://x.com/user/status/...", "body": "...\n\n[likes:N retweets:N replies:N views:N]"}, ...]

Follows the established retriever contract: returns [{title, href, body}, ...] matching what context_manager.get_similar_content_by_query expects.
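
As a rough illustration of that contract, the sketch below maps one raw tweet dict to the `{title, href, body}` shape shown in the sample output. The input field names (`username`, `id`, `text`, and the four counters), the 80-character title cut-off, and the `to_result` helper are assumptions, not the actual GetXAPI response schema or the PR's code.

```python
def to_result(tweet):
    """Map one raw tweet dict to the {title, href, body} retriever shape."""
    # All input field names here are hypothetical.
    user = tweet.get("username", "unknown")
    text = tweet.get("text", "")
    stats = " ".join(
        f"{name}:{tweet.get(name, 0)}"
        for name in ("likes", "retweets", "replies", "views")
    )
    return {
        "title": f"@{user}: {text[:80]}",  # 80-char cut-off is an assumption
        "href": f"https://x.com/{user}/status/{tweet.get('id', '')}",
        "body": f"{text}\n\n[{stats}]",
    }
```

Appending the engagement counts to `body` keeps the result a plain three-key dict, so downstream code that only reads `title`, `href`, and `body` needs no changes.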

