Commit c2330b9

Merge pull request #195 from reoLantern/main
Add configurable arXiv cross-list support
2 parents dc39485 + e647bd3 commit c2330b9

3 files changed: +11 -1 lines changed

README.md

Lines changed: 3 additions & 0 deletions
@@ -95,11 +95,13 @@ llm:
 source:
   arxiv:
     category: ["cs.AI","cs.CV","cs.LG","cs.CL"]
+    include_cross_list: false # Set to true to include arXiv cross-list papers in these categories.
 
 executor:
   debug: ${oc.env:DEBUG,null}
   source: ['arxiv']
 ```
+Set `source.arxiv.include_cross_list: true` if you want cross-listed papers included.
 >[!NOTE]
 > `${oc.env:XXX,yyy}` means the value of the environment variable `XXX`. If the variable is not set, the default value `yyy` will be used.

@@ -113,6 +115,7 @@ zotero:
 source:
   arxiv:
     category: null # The categories of target arxiv papers. Find the abbr of your research area from [here](https://arxiv.org/category_taxonomy). Example: ["cs.AI","cs.CV","cs.LG","cs.CL"]
+    include_cross_list: false # Whether to include arXiv cross-list papers in subscribed categories. Example: true
   biorxiv:
     category: null # The categories of target biorxiv papers. Find categories from [here](https://www.biorxiv.org/). Example: ["biochemistry","animal behavior and cognition"]
   medrxiv:

config/base.yaml

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@ zotero:
 source:
   arxiv:
     category: null # The categories of target arxiv papers. Find the abbr of your research area from [here](https://arxiv.org/category_taxonomy). Example: ["cs.AI","cs.CV","cs.LG","cs.CL"]
+    include_cross_list: false # Whether to include arXiv cross-list papers in subscribed categories. Example: true
   biorxiv:
     category: null # The categories of target biorxiv papers. Find categories from [here](https://www.biorxiv.org/). Example: ["biochemistry","animal behavior and cognition"]
   medrxiv:
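
Because the new key defaults to `false`, configurations written before this change should keep the old behavior. A minimal sketch of that fallback, using plain dictionaries as a stand-in for the project's config object; the helper name `cross_list_enabled` and both sample configs are hypothetical and not part of the commit:

```python
def cross_list_enabled(config: dict) -> bool:
    """Return the include_cross_list setting, defaulting to False when absent.

    `config` is a plain nested dict standing in for the real config object
    (an assumption made for illustration only).
    """
    arxiv_cfg = config.get("source", {}).get("arxiv", {})
    return bool(arxiv_cfg.get("include_cross_list", False))


# A config written before the option existed: the key is simply missing.
old_style = {"source": {"arxiv": {"category": ["cs.AI"]}}}

# A config that opts in to cross-listed papers.
opted_in = {"source": {"arxiv": {"category": ["cs.AI"], "include_cross_list": True}}}
```

Under this sketch, only a config that sets the key explicitly changes which papers are retrieved.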

src/zotero_arxiv_daily/retriever/arxiv_retriever.py

Lines changed: 7 additions & 1 deletion
@@ -21,12 +21,18 @@ def __init__(self, config):
     def _retrieve_raw_papers(self) -> list[ArxivResult]:
         client = arxiv.Client(num_retries=10,delay_seconds=10)
         query = '+'.join(self.config.source.arxiv.category)
+        include_cross_list = self.config.source.arxiv.get("include_cross_list", False)
         # Get the latest paper from arxiv rss feed
         feed = feedparser.parse(f"https://rss.arxiv.org/atom/{query}")
         if 'Feed error for query' in feed.feed.title:
             raise Exception(f"Invalid ARXIV_QUERY: {query}.")
         raw_papers = []
-        all_paper_ids = [i.id.removeprefix("oai:arXiv.org:") for i in feed.entries if i.get("arxiv_announce_type","new") == 'new']
+        allowed_announce_types = {"new", "cross"} if include_cross_list else {"new"}
+        all_paper_ids = [
+            i.id.removeprefix("oai:arXiv.org:")
+            for i in feed.entries
+            if i.get("arxiv_announce_type", "new") in allowed_announce_types
+        ]
         if self.config.executor.debug:
             all_paper_ids = all_paper_ids[:10]
 
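
The filtering change in this hunk can be tried in isolation. The sketch below mirrors the set-membership check from the diff, but it assumes feed entries are plain dictionaries rather than `feedparser` entries; the function name `select_paper_ids` and the sample IDs are made up for illustration:

```python
def select_paper_ids(entries: list[dict], include_cross_list: bool = False) -> list[str]:
    """Keep IDs of entries whose announce type is allowed.

    Entries without an announce type are treated as "new", matching the
    default used in the diff above.
    """
    allowed = {"new", "cross"} if include_cross_list else {"new"}
    return [
        entry["id"].removeprefix("oai:arXiv.org:")
        for entry in entries
        if entry.get("arxiv_announce_type", "new") in allowed
    ]


# Hypothetical entries; the IDs are placeholders, not real papers.
sample_entries = [
    {"id": "oai:arXiv.org:0000.00001", "arxiv_announce_type": "new"},
    {"id": "oai:arXiv.org:0000.00002", "arxiv_announce_type": "cross"},
    {"id": "oai:arXiv.org:0000.00003", "arxiv_announce_type": "replace"},
    {"id": "oai:arXiv.org:0000.00004"},  # no announce type: treated as "new"
]

default_ids = select_paper_ids(sample_entries)
with_cross = select_paper_ids(sample_entries, include_cross_list=True)
```

In this sketch the "replace" entry is dropped in both modes, and the "cross" entry appears only when `include_cross_list` is true. `str.removeprefix` requires Python 3.9 or later.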
