
Commit 59afd8d

Merge pull request #15 from EternisAI/feat/vc-data-skills
feat(skills): add Semantic Scholar and HF Daily Papers VC data skills
2 parents: d6d172d + 13cbfd6

5 files changed

Lines changed: 536 additions & 13 deletions


skills/cftc/SKILL.md

Lines changed: 3 additions & 3 deletions

````diff
@@ -22,9 +22,9 @@ def _cftc_get(dataset_key: str, params: dict) -> list:
     base = os.environ["OPENROUTER_BASE_URL"].replace("/api/llm-proxy", "/api/cftc-proxy")
     token = os.environ["OPENROUTER_API_KEY"]
     query = urllib.parse.urlencode(params)
-    url = f"{base.rstrip('/')}/{DATASETS[dataset_key]}.json?{query}"
-    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
-    with urllib.request.urlopen(req, timeout=20) as resp:
+    url = f"{base.rstrip('/')}/resource/{DATASETS[dataset_key]}.json?{query}"
+    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}", "User-Agent": "curl/7.88.1"})
+    with urllib.request.urlopen(req, timeout=60) as resp:
         return json.loads(resp.read().decode())
 ```
 
````
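
For reference, here is a minimal sketch of the full helper as it reads after this hunk. The `DATASETS` mapping and the import lines live above the hunk in skills/cftc/SKILL.md and are not part of this diff; the placeholder mapping below is illustrative only:

```python
import json
import os
import urllib.parse
import urllib.request

# Placeholder only: the real DATASETS mapping is defined earlier in the
# skill file and is not shown in this hunk.
DATASETS = {"example_key": "example-dataset-id"}

def _cftc_get(dataset_key: str, params: dict) -> list:
    # Swap the LLM proxy path for the CFTC proxy path.
    base = os.environ["OPENROUTER_BASE_URL"].replace("/api/llm-proxy", "/api/cftc-proxy")
    token = os.environ["OPENROUTER_API_KEY"]
    query = urllib.parse.urlencode(params)
    # This commit adds the /resource/ path segment, a User-Agent header,
    # and raises the timeout from 20 s to 60 s.
    url = f"{base.rstrip('/')}/resource/{DATASETS[dataset_key]}.json?{query}"
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}", "User-Agent": "curl/7.88.1"})
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode())
```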

skills/courtlistener/SKILL.md

Lines changed: 8 additions & 8 deletions

````diff
@@ -23,8 +23,8 @@ def cl_get(path: str, params: dict | None = None) -> dict:
     url = f"{base.rstrip('/')}/{path.lstrip('/')}"
     if params:
         url += "?" + urllib.parse.urlencode(params)
-    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
-    with urllib.request.urlopen(req, timeout=20) as r:
+    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}", "User-Agent": "curl/7.88.1"})
+    with urllib.request.urlopen(req, timeout=60) as r:
         return json.loads(r.read().decode())
 ```
 
@@ -72,8 +72,8 @@ def cl_get(path, params=None):
     url = f"{base.rstrip('/')}/{path.lstrip('/')}"
     if params:
         url += "?" + urllib.parse.urlencode(params)
-    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
-    with urllib.request.urlopen(req, timeout=20) as r:
+    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}", "User-Agent": "curl/7.88.1"})
+    with urllib.request.urlopen(req, timeout=60) as r:
         return json.loads(r.read().decode())
 
 company = "Tesla Inc"
@@ -95,8 +95,8 @@ def cl_get(path, params=None):
     url = f"{base.rstrip('/')}/{path.lstrip('/')}"
     if params:
         url += "?" + urllib.parse.urlencode(params)
-    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
-    with urllib.request.urlopen(req, timeout=20) as r:
+    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}", "User-Agent": "curl/7.88.1"})
+    with urllib.request.urlopen(req, timeout=60) as r:
         return json.loads(r.read().decode())
 
 results = cl_get("/search/", {"type": "rd", "q": "Tesla securities fraud", "page_size": 5})
@@ -119,8 +119,8 @@ def cl_get(path, params=None):
     url = f"{base.rstrip('/')}/{path.lstrip('/')}"
     if params:
         url += "?" + urllib.parse.urlencode(params)
-    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
-    with urllib.request.urlopen(req, timeout=20) as r:
+    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}", "User-Agent": "curl/7.88.1"})
+    with urllib.request.urlopen(req, timeout=60) as r:
         return json.loads(r.read().decode())
 
 docket_id = 67687667  # from search results
````
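
All four hunks make the same change to the repeated `cl_get` helper: add a User-Agent header and raise the timeout from 20 s to 60 s. Below is a minimal end-to-end sketch of the updated helper together with the usage lines from the hunks above. The `base`/`token` setup sits above these hunks in the skill file, so the proxy path here is an assumption made by analogy with the other skills:

```python
import json
import os
import urllib.parse
import urllib.request

# Assumed shape of the setup that sits above these hunks; the exact
# proxy path is not shown in this diff.
base = os.environ["OPENROUTER_BASE_URL"].replace("/api/llm-proxy", "/api/courtlistener-proxy")
token = os.environ["OPENROUTER_API_KEY"]

def cl_get(path, params=None):
    url = f"{base.rstrip('/')}/{path.lstrip('/')}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    # The User-Agent header and 60 s timeout are the two changes this commit makes.
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}", "User-Agent": "curl/7.88.1"})
    with urllib.request.urlopen(req, timeout=60) as r:
        return json.loads(r.read().decode())

# Usage from the hunks above: RECAP search, then a docket id from the results.
results = cl_get("/search/", {"type": "rd", "q": "Tesla securities fraud", "page_size": 5})
docket_id = 67687667  # from search results
```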

skills/hf-daily-papers/SKILL.md

Lines changed: 191 additions & 0 deletions

New file — full contents:

---
name: hf-daily-papers
description: Fetch and search AI research papers from Hugging Face — today's curated daily feed, full paper metadata, and semantic search across the HF corpus. Use when users ask about trending AI papers, specific arXiv papers, recent ML research, or want to search by topic.
allowed-tools: Bash(python3 -c *), Bash(python3 - *), Bash(python3 *)
---

# Hugging Face Papers API

## Authentication

```python
import urllib.parse
import urllib.request
import json
import os

base = os.environ["OPENROUTER_BASE_URL"].replace("/api/llm-proxy", "/api/hf-proxy")
token = os.environ["OPENROUTER_API_KEY"]

def hf_get(path: str, params: dict | None = None) -> dict | list:
    url = f"{base.rstrip('/')}/{path.lstrip('/')}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    req = urllib.request.Request(url, headers={
        "Authorization": f"Bearer {token}",
        "User-Agent": "axion-hf-client/1.0",
    })
    with urllib.request.urlopen(req, timeout=30) as r:
        return json.loads(r.read().decode())
```

## Supported Tools

| Tool | Endpoint | What it returns |
| --- | --- | --- |
| `get_daily_papers` | `GET /api/daily_papers` | Community-curated papers with upvotes for a given date |
| `get_paper_details` | `GET /api/papers/{paper_id}` | Full metadata for any HF-indexed paper by arXiv ID |
| `search_hf_papers` | `GET /api/papers/search` | Semantic search across all HF-indexed papers |

## Call Graph

```
1. get_daily_papers   date / sort → paper list (github/project data included when present)
   search_hf_papers   query → paper list across full HF corpus

2. get_paper_details  arxiv_id → single paper deep-dive; use for direct arXiv ID lookups
```
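
The two steps chain directly. A minimal sketch, reusing the `hf_get` helper from the Authentication section, that pulls today's top trending paper and then fetches its full record:

```python
# Step 1: today's curated list, highest-trending first.
papers = hf_get("/api/daily_papers", {"limit": 1, "sort": "trending"})
if papers:
    top = papers[0]["paper"]  # fields are nested under item["paper"]
    # Step 2: deep-dive by bare arXiv ID (no prefix, no URL).
    details = hf_get(f"/api/papers/{top['id']}")
    print(details["title"])
    print(details.get("ai_summary", "no AI summary"))
```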

## Triage Field Set

```
paper.id          → bare arXiv ID (e.g. 2307.09288)
paper.title       → paper title
paper.upvotes     → community upvote count
paper.publishedAt → arXiv publication date
paper.ai_summary  → AI-generated summary (prefer over raw abstract)
paper.ai_keywords → topic tags e.g. ["reasoning", "agents", "vision"]
paper.githubRepo  → linked GitHub repo URL (nullable)
paper.githubStars → GitHub stars at submission time (nullable)
paper.projectPage → demo/project URL — productization signal (nullable)
submittedBy.name  → curator name — credibility signal (daily feed only)
```
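
These fields combine naturally into a quick scoring pass over a result list. A minimal sketch; the weights are illustrative, not part of the skill:

```python
def triage_score(item: dict) -> int:
    # Rough triage: community interest plus code/demo availability.
    p = item["paper"]
    score = p.get("upvotes", 0)
    if p.get("githubRepo"):
        score += 10   # linked code
    if p.get("projectPage"):
        score += 20   # rare, strong productization signal
    return score

# e.g. sorted(papers, key=triage_score, reverse=True)[:5]
```
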
## Examples

### Today's Trending Papers

```python
import urllib.parse, urllib.request, json, os

base = os.environ["OPENROUTER_BASE_URL"].replace("/api/llm-proxy", "/api/hf-proxy")
token = os.environ["OPENROUTER_API_KEY"]

def hf_get(path, params=None):
    url = f"{base.rstrip('/')}/{path.lstrip('/')}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}", "User-Agent": "axion-hf-client/1.0"})
    with urllib.request.urlopen(req, timeout=30) as r:
        return json.loads(r.read().decode())

papers = hf_get("/api/daily_papers", {"limit": 20, "sort": "trending"})
for item in papers:
    p = item["paper"]
    repo = f" | repo={p['githubRepo']}" if p.get("githubRepo") else ""
    stars = f" ({p['githubStars']} stars)" if p.get("githubStars") else ""
    print(f"[{p['upvotes']}↑] {p['title']}{repo}{stars}")
    if p.get("ai_summary"):
        print(f"  {p['ai_summary'][:120]}...")
```

### Paper Details by arXiv ID

```python
import urllib.parse, urllib.request, json, os

base = os.environ["OPENROUTER_BASE_URL"].replace("/api/llm-proxy", "/api/hf-proxy")
token = os.environ["OPENROUTER_API_KEY"]

def hf_get(path, params=None):
    url = f"{base.rstrip('/')}/{path.lstrip('/')}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}", "User-Agent": "axion-hf-client/1.0"})
    with urllib.request.urlopen(req, timeout=30) as r:
        return json.loads(r.read().decode())

arxiv_id = "2307.09288"  # bare arXiv ID — no prefix, no URL
p = hf_get(f"/api/papers/{arxiv_id}")

print(f"{p['title']} ({p['publishedAt'][:10]})")
print(f"Upvotes: {p['upvotes']}")
print(f"Keywords: {', '.join(p.get('ai_keywords', []))}")
print(f"Summary: {p.get('ai_summary', 'N/A')}")
print(f"GitHub: {p.get('githubRepo', 'none')} ({p.get('githubStars', 0)} stars)")
print(f"Project: {p.get('projectPage', 'none')}")
authors = [a['name'] for a in p.get('authors', [])]
print(f"Authors: {', '.join(authors[:5])}")
```

### Search by Topic

```python
import urllib.parse, urllib.request, json, os

base = os.environ["OPENROUTER_BASE_URL"].replace("/api/llm-proxy", "/api/hf-proxy")
token = os.environ["OPENROUTER_API_KEY"]

def hf_get(path, params=None):
    url = f"{base.rstrip('/')}/{path.lstrip('/')}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}", "User-Agent": "axion-hf-client/1.0"})
    with urllib.request.urlopen(req, timeout=30) as r:
        return json.loads(r.read().decode())

results = hf_get("/api/papers/search", {"q": "agentic reasoning tool use", "limit": 10})
for item in results:
    p = item["paper"]
    demo = " [has demo]" if p.get("projectPage") else ""
    code = " [has code]" if p.get("githubRepo") else ""
    print(f"[{p['upvotes']}↑] {p['title']}{demo}{code}")
    print(f"  {p.get('ai_summary', '')[:100]}...")
```

### Papers for a Specific Date

```python
import urllib.parse, urllib.request, json, os

base = os.environ["OPENROUTER_BASE_URL"].replace("/api/llm-proxy", "/api/hf-proxy")
token = os.environ["OPENROUTER_API_KEY"]

def hf_get(path, params=None):
    url = f"{base.rstrip('/')}/{path.lstrip('/')}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}", "User-Agent": "axion-hf-client/1.0"})
    with urllib.request.urlopen(req, timeout=30) as r:
        return json.loads(r.read().decode())

papers = hf_get("/api/daily_papers", {"date": "2026-04-21", "limit": 50, "sort": "trending"})
if not papers:
    print("No papers — likely a weekend date")
else:
    for item in papers:
        p = item["paper"]
        print(f"[{p['upvotes']}↑] {p['id']} {p['title']}")
```

## Gotchas

- **Weekdays only** — weekend dates return an empty list, not an error. Check the day of the week before retrying.
- **`sort=trending` preferred** — ranks by upvote velocity, not raw count. Better signal than `publishedAt`.
- **`upvotes` is nested** — access as `item["paper"]["upvotes"]`, not `item["upvotes"]`.
- **Bare arXiv ID only** for `get_paper_details` — no `arxiv.org`, `abs/`, or `https://` prefix.
- **GitHub/project fields are conditional** — absent means genuinely not set; no endpoint will surface them if the paper has no linked repo.
- **`ai_summary` over `summary`** — the AI summary is more useful than the raw abstract for downstream consumption.
- **`githubStars` is a snapshot** — cached at HF submission time; may be stale.
- **`projectPage` is a strong signal** — only papers with active demos fill this field. Rare = productization signal.
- **HF corpus only for search** — not all arXiv papers are indexed. Coverage is strong for ML/AI; for full academic coverage use Semantic Scholar.
- **Call `get_daily_papers` once per conversation** — it returns up to 100 papers per call; no need to repeat.
- **HTTP 429 has no `Retry-After`** — use exponential backoff, `min(2^attempt, 60)` seconds; see the sketch after this list.
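
A minimal wrapper implementing the backoff rule above, reusing the `hf_get` helper from the Authentication section; the wrapper name is illustrative:

```python
import time
import urllib.error

def hf_get_with_backoff(path, params=None, max_attempts=5):
    # The proxy sends no Retry-After on 429, so back off exponentially,
    # capped at 60 seconds.
    for attempt in range(max_attempts):
        try:
            return hf_get(path, params)
        except urllib.error.HTTPError as e:
            if e.code != 429 or attempt == max_attempts - 1:
                raise
            time.sleep(min(2 ** attempt, 60))
```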

## Usage Rules

- Use `get_daily_papers` for recency/trending; use `search_hf_papers` when the user has a topic.
- Always null-check `githubRepo`, `githubStars`, and `projectPage` before using them.
- Use `ai_summary` and `ai_keywords` for triage — faster and cleaner than raw abstracts.
- For direct arXiv ID lookups or deep-dives, use `get_paper_details` — it covers the full HF corpus, not just the daily feed.
- Rate limit: 1,000 requests / 5 min (authenticated free tier) — ample headroom for agentic use.
