---
name: hf-daily-papers
description: Fetch and search AI research papers from Hugging Face — today's curated daily feed, full paper metadata, and semantic search across the HF corpus. Use when users ask about trending AI papers, specific arXiv papers, recent ML research, or want to search by topic.
allowed-tools: Bash(python3 -c *), Bash(python3 - *), Bash(python3 *)
---

# Hugging Face Papers API

## Authentication

```python
import urllib.parse
import urllib.request
import json
import os

base = os.environ["OPENROUTER_BASE_URL"].replace("/api/llm-proxy", "/api/hf-proxy")
token = os.environ["OPENROUTER_API_KEY"]

def hf_get(path: str, params: dict | None = None) -> dict | list:
    url = f"{base.rstrip('/')}/{path.lstrip('/')}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    req = urllib.request.Request(url, headers={
        "Authorization": f"Bearer {token}",
        "User-Agent": "axion-hf-client/1.0",
    })
    with urllib.request.urlopen(req, timeout=30) as r:
        return json.loads(r.read().decode())
```

## Supported Tools

| Tool | Endpoint | What it returns |
| --- | --- | --- |
| `get_daily_papers` | `GET /api/daily_papers` | Community-curated papers with upvotes for a given date |
| `get_paper_details` | `GET /api/papers/{paper_id}` | Full metadata for any HF-indexed paper by arXiv ID |
| `search_hf_papers` | `GET /api/papers/search` | Semantic search across all HF-indexed papers |

## Call Graph

```
1. get_daily_papers    date / sort → paper list (github/project data included when present)
   search_hf_papers    query       → paper list across full HF corpus

2. get_paper_details   arxiv_id    → single paper deep-dive; use for direct arXiv ID lookups
```
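
A minimal sketch of the feed → deep-dive flow, assuming the `hf_get` helper from the Authentication section is already defined; the `limit` and `sort` values are illustrative, not required defaults:

```python
# Step 1: pull today's curated feed and pick the most upvoted paper
feed = hf_get("/api/daily_papers", {"limit": 10, "sort": "trending"})
if feed:
    top = max(feed, key=lambda item: item["paper"]["upvotes"])["paper"]
    # Step 2: deep-dive the winner by its bare arXiv ID
    details = hf_get(f"/api/papers/{top['id']}")
    print(details["title"])
    print((details.get("ai_summary") or "")[:200])
```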

## Triage Field Set

```
paper.id            → bare arXiv ID (e.g. 2307.09288)
paper.title         → paper title
paper.upvotes       → community upvote count
paper.publishedAt   → arXiv publication date
paper.ai_summary    → AI-generated summary (prefer over raw abstract)
paper.ai_keywords   → topic tags, e.g. ["reasoning", "agents", "vision"]
paper.githubRepo    → linked GitHub repo URL (nullable)
paper.githubStars   → GitHub stars at submission time (nullable)
paper.projectPage   → demo/project URL — productization signal (nullable)
submittedBy.name    → curator name — credibility signal (daily feed only)
```
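
A small helper that condenses a paper record down to this field set, null-checking the optional fields; the function name and output shape are illustrative, not part of the API:

```python
def triage(paper: dict) -> dict:
    """Condense a paper record into the triage fields, tolerating missing optionals."""
    return {
        "id": paper["id"],
        "title": paper["title"],
        "upvotes": paper.get("upvotes", 0),
        "published": paper.get("publishedAt", "")[:10],
        "summary": paper.get("ai_summary") or paper.get("summary", ""),
        "keywords": paper.get("ai_keywords", []),
        "repo": paper.get("githubRepo"),    # None if no linked repo
        "stars": paper.get("githubStars"),  # snapshot taken at submission time
        "demo": paper.get("projectPage"),   # productization signal when set
    }

# Daily-feed items nest the record under "paper": triage(item["paper"])
```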

## Examples

### Today's Trending Papers

```python
import urllib.parse, urllib.request, json, os

base = os.environ["OPENROUTER_BASE_URL"].replace("/api/llm-proxy", "/api/hf-proxy")
token = os.environ["OPENROUTER_API_KEY"]

def hf_get(path, params=None):
    url = f"{base.rstrip('/')}/{path.lstrip('/')}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}", "User-Agent": "axion-hf-client/1.0"})
    with urllib.request.urlopen(req, timeout=30) as r:
        return json.loads(r.read().decode())

papers = hf_get("/api/daily_papers", {"limit": 20, "sort": "trending"})
for item in papers:
    p = item["paper"]
    repo = f" | repo={p['githubRepo']}" if p.get("githubRepo") else ""
    stars = f" ⭐{p['githubStars']}" if p.get("githubStars") else ""
    print(f"[{p['upvotes']}↑] {p['title']}{repo}{stars}")
    if p.get("ai_summary"):
        print(f"    {p['ai_summary'][:120]}...")
```

### Paper Details by arXiv ID

```python
import urllib.parse, urllib.request, json, os

base = os.environ["OPENROUTER_BASE_URL"].replace("/api/llm-proxy", "/api/hf-proxy")
token = os.environ["OPENROUTER_API_KEY"]

def hf_get(path, params=None):
    url = f"{base.rstrip('/')}/{path.lstrip('/')}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}", "User-Agent": "axion-hf-client/1.0"})
    with urllib.request.urlopen(req, timeout=30) as r:
        return json.loads(r.read().decode())

arxiv_id = "2307.09288"  # bare arXiv ID — no prefix, no URL
p = hf_get(f"/api/papers/{arxiv_id}")

print(f"{p['title']} ({p['publishedAt'][:10]})")
print(f"Upvotes: {p['upvotes']}")
print(f"Keywords: {', '.join(p.get('ai_keywords', []))}")
print(f"Summary: {p.get('ai_summary', 'N/A')}")
print(f"GitHub: {p.get('githubRepo', 'none')} ({p.get('githubStars', 0)} stars)")
print(f"Project: {p.get('projectPage', 'none')}")
authors = [a['name'] for a in p.get('authors', [])]
print(f"Authors: {', '.join(authors[:5])}")
```

### Search by Topic

```python
import urllib.parse, urllib.request, json, os

base = os.environ["OPENROUTER_BASE_URL"].replace("/api/llm-proxy", "/api/hf-proxy")
token = os.environ["OPENROUTER_API_KEY"]

def hf_get(path, params=None):
    url = f"{base.rstrip('/')}/{path.lstrip('/')}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}", "User-Agent": "axion-hf-client/1.0"})
    with urllib.request.urlopen(req, timeout=30) as r:
        return json.loads(r.read().decode())

results = hf_get("/api/papers/search", {"q": "agentic reasoning tool use", "limit": 10})
for item in results:
    p = item["paper"]
    demo = " [has demo]" if p.get("projectPage") else ""
    code = " [has code]" if p.get("githubRepo") else ""
    print(f"[{p['upvotes']}↑] {p['title']}{demo}{code}")
    print(f"    {p.get('ai_summary', '')[:100]}...")
```

### Papers for a Specific Date

```python
import urllib.parse, urllib.request, json, os

base = os.environ["OPENROUTER_BASE_URL"].replace("/api/llm-proxy", "/api/hf-proxy")
token = os.environ["OPENROUTER_API_KEY"]

def hf_get(path, params=None):
    url = f"{base.rstrip('/')}/{path.lstrip('/')}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}", "User-Agent": "axion-hf-client/1.0"})
    with urllib.request.urlopen(req, timeout=30) as r:
        return json.loads(r.read().decode())

papers = hf_get("/api/daily_papers", {"date": "2026-04-21", "limit": 50, "sort": "trending"})
if not papers:
    print("No papers — likely a weekend date")
else:
    for item in papers:
        p = item["paper"]
        print(f"[{p['upvotes']}↑] {p['id']} — {p['title']}")
```

## Gotchas

- **Weekdays only** — weekend dates return an empty list, not an error. Check the day of the week before retrying.
- **`sort=trending` preferred** — ranks by upvote velocity, not raw count. Better signal than `publishedAt`.
- **`upvotes` is nested** — access it as `item["paper"]["upvotes"]`, not `item["upvotes"]`.
- **Bare arXiv ID only** for `get_paper_details` — no `arxiv.org`, `abs/`, or `https://` prefix.
- **GitHub/project fields are conditional** — if they are absent, the paper genuinely has no linked repo or demo; no endpoint will surface them in that case.
- **`ai_summary` over `summary`** — the AI summary is more useful than the raw abstract for downstream consumption.
- **`githubStars` is a snapshot** — cached at HF submission time, so it may be stale.
- **`projectPage` is a strong signal** — only papers with active demos fill this field; its rarity makes it a useful productization signal.
- **HF corpus only for search** — not all arXiv papers are indexed. Coverage is strong for ML/AI; for full academic coverage use Semantic Scholar.
- **Call `get_daily_papers` once per conversation** — it returns up to 100 papers per call, so there is no need to repeat it.
- **HTTP 429 has no `Retry-After`** — use exponential backoff of `min(2**attempt, 60)` seconds, as in the sketch below.
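
A minimal retry wrapper for the 429 case, reusing the `hf_get` helper from the Authentication section; the attempt cap and function name are assumptions, not documented limits:

```python
import time
import urllib.error

def hf_get_with_retry(path, params=None, max_attempts=5):
    """Call hf_get, backing off exponentially on HTTP 429 (no Retry-After header is sent)."""
    for attempt in range(max_attempts):
        try:
            return hf_get(path, params)
        except urllib.error.HTTPError as e:
            if e.code != 429 or attempt == max_attempts - 1:
                raise
            time.sleep(min(2 ** attempt, 60))  # 1s, 2s, 4s, ... capped at 60s

papers = hf_get_with_retry("/api/daily_papers", {"limit": 20, "sort": "trending"})
```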

## Usage Rules

- Use `get_daily_papers` for recency/trending; use `search_hf_papers` when the user has a topic.
- Always null-check `githubRepo`, `githubStars`, and `projectPage` before using them.
- Use `ai_summary` and `ai_keywords` for triage — faster and cleaner than raw abstracts.
- For direct arXiv ID lookups or deep-dives, use `get_paper_details` — it covers the full HF corpus, not just the daily feed.
- Rate limit: 1,000 requests per 5 minutes (authenticated free tier); ample for typical agentic workloads.
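
An illustrative routing helper that encodes these rules; the heuristics and function name are assumptions made for the example, not part of the skill:

```python
def pick_tool(user_query: str, arxiv_id: str | None = None) -> str:
    """Route a request to the right endpoint per the usage rules above."""
    if arxiv_id:
        return "get_paper_details"   # direct ID lookup / deep-dive
    recency = ("today", "trending", "latest", "recent", "this week")
    if any(word in user_query.lower() for word in recency):
        return "get_daily_papers"    # recency / trending intent
    return "search_hf_papers"        # topic-driven search
```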