Add arXiv Omi integration app by wly12312 · Pull Request #7438 · BasedHardware/omi

wly12312 · 2026-05-21T13:07:51Z

Summary

add a standalone no-auth arXiv integration app under plugins/omi-arxiv-app
expose Omi chat tools for paper search, paper metadata lookup, and recent author papers
include Railway/Heroku deployment files and local usage docs

Validation

python -m py_compile plugins/omi-arxiv-app/main.py
git diff --check
FastAPI TestClient manifest checks for all 3 tools and type: object schemas
helper checks for string/invalid limits, arXiv paper ID cleanup, category validation, and manifest schemas
live arXiv smoke check for search_papers

Candidate integration app for #3120.

greptile-apps · 2026-05-21T13:12:53Z

Greptile Summary

This PR adds a self-contained arXiv integration plugin under plugins/omi-arxiv-app that exposes three Omi chat tools — paper search, paper detail lookup, and author search — backed by the public arXiv Atom API with no auth or environment variables required.

main.py: FastAPI app with a shared httpx.AsyncClient managed by a lifespan context, input sanitization helpers (_safe_paper_id, _safe_category, _safe_limit), Atom feed XML parsing, and a /.well-known/omi-tools.json manifest endpoint.
Deployment files (Procfile, railway.toml, runtime.txt): ready for Railway or Heroku, pinning Python 3.11.9 and starting uvicorn on $PORT.
requirements.txt: fully pinned dependency set (fastapi, uvicorn, pydantic, httpx) with no conflicts.
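
The Atom feed parsing mentioned for main.py can be illustrated with the standard library alone. This is a minimal sketch, not the PR's code: the function name `parse_entries`, the returned dict shape, and the sample feed (made-up placeholder data) are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Atom elements live in this namespace, so lookups must be namespace-qualified.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Made-up placeholder feed, shaped like an Atom response with one entry.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/0000.00000v1</id>
    <title>A Made-Up
      Paper Title</title>
    <author><name>Ada Example</name></author>
    <author><name>Bob Placeholder</name></author>
  </entry>
</feed>"""


def parse_entries(feed_xml):
    """Hypothetical helper: turn an Atom feed string into a list of dicts."""
    root = ET.fromstring(feed_xml)
    papers = []
    for entry in root.findall("atom:entry", ATOM_NS):
        raw_title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        authors = [
            name.text.strip()
            for name in entry.findall("atom:author/atom:name", ATOM_NS)
            if name.text
        ]
        papers.append({
            "id": entry.findtext("atom:id", default="", namespaces=ATOM_NS).strip(),
            # Titles may wrap across lines in the feed; collapse the whitespace.
            "title": " ".join(raw_title.split()),
            "authors": authors,
        })
    return papers


papers = parse_entries(SAMPLE_FEED)
```

Collecting the author list once per entry, as here, also sidesteps the double traversal raised later in the review.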

Confidence Score: 4/5

The plugin is a new standalone directory with no changes to the main backend; the only risk is in the arXiv integration logic itself, which is well-isolated.

The code is clean and well-structured. The two issues found — double XML traversal in _format_entry and silent rejection of old-style versioned arXiv IDs like hep-th/9905001v1 in _safe_paper_id — are both minor and non-blocking for the common case. No auth, secrets, or database access is involved.

Attention is best focused on plugins/omi-arxiv-app/main.py, specifically the _safe_paper_id version-stripping logic and the double call to _entry_authors in _format_entry.

Important Files Changed

Filename	Overview
plugins/omi-arxiv-app/main.py	Core FastAPI app with three Omi chat tools backed by the arXiv Atom API; minor issues with double XML traversal in _format_entry and old-style versioned IDs silently rejected by _safe_paper_id
plugins/omi-arxiv-app/requirements.txt	Pins fastapi, uvicorn, pydantic, and httpx to specific versions; compatible set with no obvious conflicts
plugins/omi-arxiv-app/Procfile	Standard Heroku/Railway Procfile; starts uvicorn on $PORT with a sensible default of 8080
plugins/omi-arxiv-app/railway.toml	Railway deployment config with NIXPACKS builder, /health check, and ON_FAILURE restart policy
plugins/omi-arxiv-app/runtime.txt	Pins Python 3.11.9 for Heroku/Railway runtime
plugins/omi-arxiv-app/README.md	Clear local dev and deployment docs with working curl examples for all three tools
plugins/omi-arxiv-app/.gitignore	Standard Python gitignore for .venv, pycache, and .pyc files

Sequence Diagram

sequenceDiagram
    participant Omi as Omi Client
    participant App as omi-arxiv-app (FastAPI)
    participant arXiv as arXiv Atom API

    Omi->>App: GET /.well-known/omi-tools.json
    App-->>Omi: tool manifest (search_papers, get_paper_details, search_author)

    Omi->>App: POST /tools/search_papers
    App->>App: _build_search_query() sanitize inputs
    App->>arXiv: "GET /api/query?search_query=...&max_results=N"
    arXiv-->>App: Atom XML feed
    App->>App: _parse_entries() + _format_entry()
    App-->>Omi: "ChatToolResponse {result}"

    Omi->>App: POST /tools/get_paper_details
    App->>App: "_safe_paper_id() validate & clean"
    App->>arXiv: "GET /api/query?id_list=XXXX&max_results=1"
    arXiv-->>App: Atom XML feed
    App-->>Omi: "ChatToolResponse {result}"

    Omi->>App: POST /tools/search_author
    App->>arXiv: "GET /api/query?search_query=au:Name&sortBy=submittedDate"
    arXiv-->>App: Atom XML feed
    App-->>Omi: "ChatToolResponse {result}"
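
The author-search request in the diagram can be sketched as a URL built with the standard library. This is only an illustration under stated assumptions: the value of `ARXIV_API_URL`, the helper name `author_query_url`, the quote-stripping step, and the default limit are not taken from the PR. The query parameters (`search_query`, `sortBy`, `sortOrder`, `max_results`) are the ones the arXiv API documents.

```python
from urllib.parse import urlencode

# Assumed endpoint; the PR defines ARXIV_API_URL but its value is not shown here.
ARXIV_API_URL = "https://export.arxiv.org/api/query"


def author_query_url(author, limit=5):
    """Hypothetical helper: build a newest-first author search URL."""
    # Drop embedded double quotes so the name cannot break out of the phrase.
    cleaned = author.replace('"', "").strip()
    params = {
        "search_query": f'au:"{cleaned}"',
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": limit,
    }
    # urlencode percent-escapes the colon, quotes, and spaces in the query.
    return f"{ARXIV_API_URL}?{urlencode(params)}"


url = author_query_url("Jane Doe", limit=3)
```

Passing a params dict to an HTTP client, as the PR's `client.get(ARXIV_API_URL, params=params)` call does, achieves the same escaping without building the string by hand.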

Reviews (1): Last reviewed commit: "Add arXiv Omi integration app"

greptile-apps · 2026-05-21T13:12:57Z

+    authors = ", ".join(_entry_authors(entry)[:5]) or "unknown authors"
+    if len(_entry_authors(entry)) > 5:
+        authors += ", et al."


_entry_authors is called twice on lines 158–160, traversing the XML tree twice. Storing the result in a variable avoids the redundant walk.

Suggested change

-    authors = ", ".join(_entry_authors(entry)[:5]) or "unknown authors"
-    if len(_entry_authors(entry)) > 5:
-        authors += ", et al."
+    all_authors = _entry_authors(entry)
+    authors = ", ".join(all_authors[:5]) or "unknown authors"
+    if len(all_authors) > 5:
+        authors += ", et al."

Addressed in 9f1991b: _entry_authors(entry) is now stored once in all_authors and reused for formatting.

greptile-apps · 2026-05-21T13:12:58Z

+    candidate = candidate.removeprefix("https://arxiv.org/abs/")
+    candidate = candidate.removeprefix("http://arxiv.org/abs/")
+    candidate = candidate.removeprefix("arXiv:")
+    candidate = candidate.split("v", 1)[0] if re.match(r"^\d{4}\.\d{4,5}v\d+$", candidate) else candidate


Old-style versioned arXiv IDs such as hep-th/9905001v1 or cs/9901001v2 pass through the version-strip branch unchanged (it only fires for new-format YYMM.NNNNvN IDs), so fullmatch on \d{7} fails and _safe_paper_id returns None. Users who paste a versioned legacy ID will get a confusing "invalid paper ID" error. Extend the split to cover old-format IDs as well.

Suggested change

-    candidate = candidate.split("v", 1)[0] if re.match(r"^\d{4}\.\d{4,5}v\d+$", candidate) else candidate
+    candidate = candidate.split("v", 1)[0] if re.match(r"^\d{4}\.\d{4,5}v\d+$", candidate) or re.match(r"^[a-z\-]+(\.[A-Z]{2})?/\d{7}v\d+$", candidate) else candidate

Addressed in 9f1991b: _safe_paper_id now strips versions for both new-format and legacy arXiv IDs such as hep-th/9905001v1.
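
One detail worth checking in any version-stripping approach: `split("v", 1)` cuts at the first `v` in the string, so a legacy archive name that itself contains a `v` (for example `solv-int/9901001v2`) would be truncated to `sol`. A minimal sketch that anchors on the trailing `vN` suffix instead is shown below. The function name `strip_version` and the exact patterns are assumptions for illustration, not the code that landed in 9f1991b.

```python
import re
from typing import Optional

# Trailing version suffix such as "v1" or "v12".
_VERSION_SUFFIX = re.compile(r"v\d+$")
# New-style IDs: YYMM.NNNN or YYMM.NNNNN.
_NEW_STYLE = re.compile(r"\d{4}\.\d{4,5}")
# Legacy IDs: archive, optional two-letter subject class, then seven digits.
_OLD_STYLE = re.compile(r"[a-z\-]+(\.[A-Z]{2})?/\d{7}")


def strip_version(candidate: str) -> Optional[str]:
    """Hypothetical helper: return the unversioned ID, or None if invalid."""
    # Remove only a version suffix at the very end, never an inner "v".
    base = _VERSION_SUFFIX.sub("", candidate.strip())
    if _NEW_STYLE.fullmatch(base) or _OLD_STYLE.fullmatch(base):
        return base
    return None
```

For instance, `strip_version("hep-th/9905001v1")` yields the bare legacy ID, while an input that matches neither format yields `None` so the caller can report an invalid paper ID.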

greptile-apps · 2026-05-21T13:12:59Z

+async def _request_arxiv(params: dict[str, Any]) -> str:
+    client = await _get_arxiv_client()
+    response = await client.get(ARXIV_API_URL, params=params)
+    response.raise_for_status()
+    return response.text


No rate-limit protection for arXiv API calls

arXiv's usage policy asks automated clients to stay under 3 requests per second. With no rate-limiting or retry-with-backoff logic here, concurrent users could collectively trigger 429/503 responses that surface as opaque httpx.HTTPError messages. Consider adding a simple delay between requests or at minimum returning a user-friendly message when arXiv returns a 429.

Addressed in 9f1991b: arXiv requests are serialized with a small delay to stay below 3 req/s, and 429/503 responses now return user-friendly messages.
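
The serialization described above might look roughly like the following sketch, which uses only asyncio. The class name `MinIntervalGate`, the function `friendly_error`, the 0.34 second interval, and the message wording are all hypothetical; the actual change in 9f1991b may be structured differently.

```python
import asyncio
import time
from typing import Optional


class MinIntervalGate:
    """Hypothetical helper: let callers through one at a time, spaced apart."""

    def __init__(self, min_interval: float) -> None:
        self._min_interval = min_interval
        self._lock = asyncio.Lock()
        self._last_start = float("-inf")  # no request has been made yet

    async def wait_turn(self) -> None:
        # The lock serializes callers; the sleep enforces the spacing.
        async with self._lock:
            delay = self._last_start + self._min_interval - time.monotonic()
            if delay > 0:
                await asyncio.sleep(delay)
            self._last_start = time.monotonic()


def friendly_error(status_code: int) -> Optional[str]:
    """Hypothetical mapping from upstream status codes to readable messages."""
    if status_code == 429:
        return "arXiv is rate-limiting requests right now. Please retry shortly."
    if status_code == 503:
        return "arXiv is temporarily unavailable. Please retry shortly."
    return None


async def demo(interval: float = 0.34, calls: int = 3) -> list:
    # Create the gate inside the running loop; 0.34 s keeps it under 3 req/s.
    gate = MinIntervalGate(interval)
    starts = []

    async def one_call() -> None:
        await gate.wait_turn()
        starts.append(time.monotonic())  # a real request would go here

    await asyncio.gather(*(one_call() for _ in range(calls)))
    return starts
```

A gate like this only limits a single process. If the app runs with several workers, each worker would need its own share of the budget or a shared limiter.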

Add arXiv Omi integration app

ced601b

wly12312 mentioned this pull request May 21, 2026

create new app bounties #3120

Open

greptile-apps Bot reviewed May 21, 2026


Address arXiv app review feedback

9f1991b

wly12312 wants to merge 2 commits into BasedHardware:main from wly12312:omi-arxiv-app
