Commit 5687dcd

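feat: refactor the scoring system into eight dimensions
- innovation 0-2, impact 1.5, engineering_score 1.5
- total = min(10, sum of sub-items); the sub-item maxima add up to 11
- parse multiple LLM scoring-rationale formats (/10 conversion, `得分X.Y`, `dim(max分)`, etc.)
- open-source contradiction detection (a high score with no link is corrected automatically)
- enforce format constraints on the machine summary and scoring rationale
- add the validate-scores.js validation script
- update all docs and tests to match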
1 parent 01cfd1e commit 5687dcd

22 files changed

Lines changed: 776 additions & 211 deletions

AGENTS.md

Lines changed: 36 additions & 7 deletions
@@ -2,7 +2,7 @@
 
 ## 项目概述
 
-自动化"语音/音乐/音频论文速递"流水线:arXiv + HuggingFace 抓取 → LLM 筛选 → 多模态深度分析 → 发布到 Hugo 博客 / 微信公众号 / 小红书。
+自动化"语音/音乐/音频论文速递"流水线:arXiv + HuggingFace 抓取 → LLM 筛选 → 多模态深度分析 → 发布到 Hugo 博客 / 微信公众号 / 小红书 / 飞书。
 
 **技术栈**:Node.js(核心流水线)+ Python(发布脚本)。要求 Node ≥ 18。
 
@@ -12,11 +12,23 @@
 npm install # 安装依赖(仅 cheerio + pdf-parse)
 npm test # 运行单元测试(node --test tests/*.test.js)
 npm run fetch # 全流程:抓取 + 筛选 + 深度分析
-npm run deep # 仅深度分析(跳过已分析论文)
+npm run deep # 仅深度分析续跑(跳过已有 analysis)
 npm run reanalyze # 强制全量重分析
+npm run batch # 批量分析未分析论文
+npm run backfill # 补录历史 paper ID(不分析)
 npm run publish # 发布到 Hugo 博客(python3 scripts/publish-to-blog.py)
-npm run wechat # 发布微信公众号草稿
+npm run wechat # 生成微信公众号草稿
 npm run xiaohongshu # 生成小红书文案
+npm run xhs-login # 小红书登录(获取 Cookie)
+npm run xhs-publish # 小红书自动发布单篇
+npm run xhs-publish-all # 小红书自动发布全部
+
+# 直接调用(不在 package.json 中)
+node scripts/quick-test.js # 快速测试(抓+筛选,不分析)
+node scripts/analyze-single-paper.js <arxiv-id> # 单独分析一篇论文
+node scripts/batch-analyze.js # 批量分析未分析论文
+python3 scripts/publish-to-feishu.py # 生成飞书文档
+python3 scripts/publish-to-feishu.py --date 2026-04-21
 ```
 
 未配置 linter、typecheck 或 formatter。`npm test` 是唯一的自动化检查。
@@ -26,9 +38,16 @@ npm run xiaohongshu # 生成小红书文案
 复制 `env.example` 为 `.env`(已 gitignore)。必需变量:
 
 - `PAPER_ANALYZER_API_KEY` / `PAPER_ANALYZER_MODEL` / `PAPER_ANALYZER_ENDPOINT` — LLM 筛选 + 分析
-- `WECHAT_APP_ID` / `WECHAT_APP_SECRET` — 微信发布(可选)
+- `WECHAT_APP_ID` / `WECHAT_APP_SECRET` / `WECHAT_THUMB_MEDIA_ID` — 微信发布(可选)
 - `PAPER_DIGEST_BLOG_REPO` — Hugo 博客仓库路径(发布用 + 抓取时去重用)
 
+其他常用可选变量:
+
+- `FEISHU_APP_ID` / `FEISHU_APP_SECRET` — 飞书发布
+- `PAPER_DIGEST_AUTHOR` — 作者名(用于发布)
+- `PAPER_DIGEST_IMAGE_HOST` / `PAPER_DIGEST_IMAGE_BASE_URL` — 图床配置(可选 `local`/`qiniu`)
+- `XIAOHONGSHU_COOKIES` — 小红书 Cookie(JSON 格式 base64 编码)
+
 Python 脚本通过 `python-dotenv` 加载 `.env`。Node 脚本通过 `utils.js` 中的 `loadEnvFile()` 加载。
 
 环境变量覆盖项:`PD_ANALYSIS_CONCURRENCY`、`PD_ANALYSIS_MAX_RETRIES`、`PD_FILTER_BATCH_SIZE`、`PD_ARXIV_MAX_RESULTS`
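The hunk above describes `XIAOHONGSHU_COOKIES` as JSON that has been base64-encoded. A minimal sketch of how a Python script might decode it follows; the function name `load_xhs_cookies` and the cookie shape in the demo are assumptions for illustration, not code from this repository.

```python
import base64
import json
import os


def load_xhs_cookies(env=None):
    """Decode XIAOHONGSHU_COOKIES, assumed to be base64(JSON text)."""
    env = os.environ if env is None else env
    raw = env.get("XIAOHONGSHU_COOKIES", "").strip()
    if not raw:
        return None  # variable unset: the caller decides what to do
    decoded = base64.b64decode(raw).decode("utf-8")
    return json.loads(decoded)


# Round-trip demo with a made-up cookie list (placeholder name and value).
sample = [{"name": "session", "value": "abc123"}]
encoded = base64.b64encode(json.dumps(sample).encode("utf-8")).decode("ascii")
print(load_xhs_cookies({"XIAOHONGSHU_COOKIES": encoded}))
```

Passing the environment as a plain mapping keeps the helper testable without touching the real process environment.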
@@ -38,16 +57,25 @@ Python 脚本通过 `python-dotenv` 加载 `.env`。Node 脚本通过 `utils.js`
 ### 入口脚本
 
 - `scripts/full-fetch.js` — 主编排器(去重含博客已发布 → 抓取 → 筛选 → 分析 → 保存)
-- `scripts/deep-analyzer.js` — LLM 深度分析,3 轮流水线(分析 → 开源扫描 → 补缺重写)
 - `scripts/fetch-papers.js` — arXiv 抓取(网页抓取为主,API为辅)+ LLM 筛选
 - `scripts/fetch-huggingface-papers.js` — HuggingFace Papers 抓取
-- `scripts/analysis-engine.js` — 批量分析协调器
+- `scripts/deep-analysis-only.js` — 仅深度分析续跑(跳过已分析论文)
+- `scripts/reanalyze.js` — 强制全量重分析(支持 `--concurrency N`)
+- `scripts/batch-analyze.js` — 批量分析未分析论文
+- `scripts/analysis-engine.js` — 批量分析协调器(被上述脚本共用)
+- `scripts/quick-test.js` — 快速测试(抓+筛选,不分析)
+- `scripts/analyze-single-paper.js` — 单独分析一篇指定 arXiv ID 的论文
 
 ### 发布脚本(Python)
 
 - `scripts/publish-to-blog.py` — 生成 Hugo Markdown 文章并推送到博客仓库
+- `scripts/publish-wechat-full.py` — 生成微信公众号草稿
+- `scripts/publish-xiaohongshu.py` — 生成小红书文案
+- `scripts/xiaohongshu-publisher.py` — 小红书自动发布(需先 `xhs-login`)
+- `scripts/publish-to-feishu.py` — 生成飞书文档
+- `scripts/backfill_papers.py` — 补录论文 ID 到 papers.json(不分析)
 - `scripts/publish_common.py` — 发布通用工具
-- `scripts/utils.py` — Python 端工具函数(去 Markdown 标记、解析分析结果)
+- `scripts/utils.py` — Python 端工具函数
 
 ### 配置
 
@@ -84,3 +112,4 @@ prompts/ # LLM prompt 模板
 - `data/`、`logs/` 已 gitignore——不要提交运行时产物。
 - 博客发布脚本依赖独立的 Hugo 仓库(`PAPER_DIGEST_BLOG_REPO`),不在本仓库内。
 - 测试使用 Node.js 内置测试运行器(`node:test`),非 Jest 或 Mocha。
+- CI 会运行 `npm test` 以及对关键脚本做 `node -c` 语法检查(`scripts/utils.js`、`config.js`、`analysis-engine.js`、测试文件)。

SKILL.en.md

Lines changed: 3 additions & 3 deletions
@@ -124,9 +124,9 @@ Output constraints:
 - Prompt source: `prompts/deep-analysis.md`, read at runtime via `loadPrompt()` and replaces `{hasFullText}`, `{title}`, `{authors}`, `{categories}`, `{arxivId}`, `{textForAnalysis}` placeholders
 - Fixed level-1 headings: `## Score`, `## Machine Summary`, `## Tags`, `## Authors & Institutions`, `## Roast`, `## Core Summary`, `## Method Overview & Architecture`, `## Core Innovations`, `## Experimental Results`, `## Detailed Description`, `## Scoring Rationale`, `## Limitations & Issues`, `## Open Source Details`
 - Under `## Score`, output the total score first (X.X/10)
-- **Code post-processing**: `parseAnalysis`/`parse_analysis` extracts seven sub-items (Innovation/3, Technical Rigor/1.5, Experimental Sufficiency/1.5, Clarity/1, Impact/2, Open Source/1.5, Reproducibility/0.5) from `## Scoring Rationale` to recalculate the total score, rounding to 0.1, overriding the LLM's raw total score
-- `## Machine Summary` includes `rank_bucket` (with top-conference mapping), `quality_score` (comprehensive academic quality 0-7), `value_score` (impact 0-2), `reproducibility_bonus` (comprehensive reproducibility 0-2), `confidence`, `primary_task_tag`, `primary_method_tag`, and other fixed keys
-- Scoring uses a seven-dimensional reviewer system: Innovation (0-3) + Technical Rigor (0-1.5) + Experimental Sufficiency (0-1.5) + Clarity (0-1) + Impact (0-2) + Open Source (0-1.5) + Reproducibility (0-0.5)
+- **Code post-processing**: `parseAnalysis`/`parse_analysis` extracts eight sub-items (Innovation/2, Technical Rigor/1.5, Experimental Sufficiency/1.5, Clarity/1, Impact/1.5, Open Source/1.5, Reproducibility/0.5, Engineering/Practical Value/1.5) from `## Scoring Rationale` to recalculate the total score, capped at 10, rounding to 0.1, overriding the LLM's raw total score
+- `## Machine Summary` includes `rank_bucket` (with top-conference mapping), `innovation` (innovation 0-2), `technical_rigor` (technical rigor 0-1.5), `experimental_sufficiency` (experimental sufficiency 0-1.5), `clarity` (clarity 0-1), `impact` (impact 0-1.5), `open_source` (open source 0-1.5), `reproducibility` (reproducibility 0-0.5), `engineering_score` (engineering/practical value 0-1.5), `confidence`, `primary_task_tag`, `primary_method_tag`, and other fixed keys
+- Scoring uses an eight-dimensional reviewer system: Innovation (0-2) + Technical Rigor (0-1.5) + Experimental Sufficiency (0-1.5) + Clarity (0-1) + Impact (0-1.5) + Open Source (0-1.5) + Reproducibility (0-0.5) + Engineering/Practical Value (0-1.5), max 11, total capped at 10
 - **Code post-processing**: `parseAnalysis`/`parse_analysis` always extracts sub-items from `## Scoring Rationale` to recalculate the total score, overriding the LLM's raw output to prevent LLM calculation errors
 - Tag output must simultaneously include the final tag string, `Primary Task Tag`, `Primary Method Tag`, and `Supplementary Tags`
 - Missing information must be written as "Not stated / Not provided / Not mentioned"; guessing author institutions, experimental numbers, open source status, or external information is prohibited
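The recalculation rule described in this hunk (sum the eight sub-items, cap the total at 10, round to 0.1) can be sketched as follows. This is an illustration only, not the repository's `parse_analysis`: the function name `recompute_total`, the per-item clamping, and treating a missing sub-item as 0 are assumptions.

```python
# Per-dimension maxima as listed in the hunk above; they sum to 11.
MAX_SCORES = {
    "innovation": 2.0,
    "technical_rigor": 1.5,
    "experimental_sufficiency": 1.5,
    "clarity": 1.0,
    "impact": 1.5,
    "open_source": 1.5,
    "reproducibility": 0.5,
    "engineering_score": 1.5,
}


def recompute_total(sub_scores):
    """Sum the eight sub-items, cap the total at 10, round to one decimal."""
    total = 0.0
    for name, max_score in MAX_SCORES.items():
        value = float(sub_scores.get(name, 0.0))  # assumption: missing counts as 0
        # Assumption: clamp each sub-item into [0, max]; the source only
        # states the cap on the total.
        total += min(max(value, 0.0), max_score)
    return round(min(10.0, total), 1)


example = {
    "innovation": 1.5,
    "technical_rigor": 1.2,
    "experimental_sufficiency": 1.0,
    "clarity": 0.8,
    "impact": 1.3,
    "open_source": 1.0,
    "reproducibility": 0.3,
    "engineering_score": 1.2,
}
print(recompute_total(example))     # 8.3
print(recompute_total(MAX_SCORES))  # 10.0, because the raw sum of 11 is capped
```

Because the maxima add up to 11, the cap only takes effect for papers that score near the top of every dimension.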

SKILL.md

Lines changed: 3 additions & 3 deletions
@@ -124,9 +124,9 @@ API 调用特性:
 - prompt 来源:`prompts/deep-analysis.md`,运行时通过 `loadPrompt()` 读取并替换 `{hasFullText}`、`{title}`、`{authors}`、`{categories}`、`{arxivId}`、`{textForAnalysis}` 占位符
 - 固定一级标题:`## 评分`、`## 机器摘要`、`## 标签`、`## 作者与机构`、`## 毒舌点评`、`## 核心摘要`、`## 方法概述和架构`、`## 核心创新点`、`## 实验结果`、`## 细节详述`、`## 评分理由`、`## 局限与问题`、`## 开源详情`
 - `## 评分` 下先输出总分(X.X/10)
-- **代码后处理**:`parseAnalysis`/`parse_analysis` 会从 `## 评分理由` 中提取七个分项(创新性/3、技术严谨性/1.5、实验充分性/1.5、清晰度/1、影响力/2、开源/1.5、可复现性/0.5)重新计算总分,四舍五入到 0.1,覆盖 LLM 原始总分
-- `## 机器摘要` 包含 `rank_bucket`(带顶会映射)、`quality_score`(综合学术质量 0-7)、`value_score`(影响力 0-2)、`reproducibility_bonus`(可复现性综合 0-2)、`confidence`、`primary_task_tag`、`primary_method_tag` 等固定键
-- 评分采用七维审稿人体系:创新性(0-3)+ 技术严谨性(0-1.5)+ 实验充分性(0-1.5)+ 清晰度(0-1)+ 影响力(0-2)+ 开源(0-1.5)+ 可复现性(0-0.5)
+- **代码后处理**:`parseAnalysis`/`parse_analysis` 会从 `## 评分理由` 中提取八个分项(创新性/2、技术严谨性/1.5、实验充分性/1.5、清晰度/1、影响力/1.5、开源/1.5、可复现性/0.5、工程/实践价值/1.5)重新计算总分,各分项之和上限为 10,四舍五入到 0.1,覆盖 LLM 原始总分
+- `## 机器摘要` 包含 `rank_bucket`(带顶会映射)、`innovation`(创新性 0-2)、`technical_rigor`(技术严谨性 0-1.5)、`experimental_sufficiency`(实验充分性 0-1.5)、`clarity`(清晰度 0-1)、`impact`(影响力 0-1.5)、`open_source`(开源 0-1.5)、`reproducibility`(可复现性 0-0.5)、`engineering_score`(工程/实践价值 0-1.5)、`confidence`、`primary_task_tag`、`primary_method_tag` 等固定键
+- 评分采用八维审稿人体系:创新性(0-2)+ 技术严谨性(0-1.5)+ 实验充分性(0-1.5)+ 清晰度(0-1)+ 影响力(0-1.5)+ 开源(0-1.5)+ 可复现性(0-0.5)+ 工程/实践价值(0-1.5),满分 11 分,总分上限 10
 - **代码后处理**:`parseAnalysis`/`parse_analysis` 始终从 `## 评分理由` 提取分项重新计算总分,覆盖 LLM 原始输出,避免 LLM 算错总分
 - 标签输出必须同时包含最终标签串、`主任务标签`、`主方法标签`、`补充标签`
 - 缺失信息必须写"未说明/未提供/未提及",禁止猜测作者机构、实验数字、开源状态或外部信息
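The commit message mentions an open-source contradiction check: a high open-source score with no link is corrected automatically. The diff shown here does not include that logic, so the following is only a sketch of one way such a check could work. The threshold, the fallback cap, and the function name `fix_open_source_score` are all hypothetical.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")
HIGH_SCORE_THRESHOLD = 1.0  # hypothetical: what counts as a "high" open-source score
NO_LINK_CAP = 0.5           # hypothetical: value to fall back to when no link is cited


def fix_open_source_score(open_source_score, open_source_details):
    """Lower a high open-source score when the details text cites no URL.

    Returns (score, corrected) so callers can log when a correction happened.
    """
    has_link = bool(URL_PATTERN.search(open_source_details or ""))
    if open_source_score >= HIGH_SCORE_THRESHOLD and not has_link:
        return min(open_source_score, NO_LINK_CAP), True
    return open_source_score, False


print(fix_open_source_score(1.5, "未提供"))
print(fix_open_source_score(1.5, "代码: https://github.com/example/repo"))
```

Returning the correction flag alongside the score is a design choice of this sketch; it makes it easy to count how often the LLM's open-source score contradicted its own details section.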

docs/data-format.md

Lines changed: 17 additions & 7 deletions
@@ -112,9 +112,14 @@
 "opensource": "...",
 "machineSummary": {
   "rankBucket": "前25%",
-  "qualityScore": "5.2",
-  "valueScore": "1.5",
-  "reproducibilityBonus": "0.8",
+  "innovation": "1.5",
+  "technicalRigor": "1.2",
+  "experimentalSufficiency": "1.0",
+  "clarity": "0.8",
+  "impact": "1.3",
+  "openSource": "1.0",
+  "reproducibility": "0.3",
+  "engineeringScore": "1.2",
   "confidence": "",
   "primaryTaskTag": "#语音合成",
   "primaryMethodTag": "#扩散模型",
@@ -124,9 +129,14 @@
124129
"hasDataset": ""
125130
},
126131
"rankBucket": "前25%",
127-
"qualityScore": "5.2",
128-
"valueScore": "1.5",
129-
"reproducibilityBonus": "0.8",
132+
"innovationScore": "1.5",
133+
"technicalRigorScore": "1.2",
134+
"experimentalSufficiencyScore": "1.0",
135+
"clarityScore": "0.8",
136+
"impactScore": "1.3",
137+
"openSourceScore": "1.0",
138+
"reproducibilityScore": "0.3",
139+
"engineeringScore": "1.2",
130140
"confidence": "",
131141
"primaryTaskTag": "#语音合成",
132142
"primaryMethodTag": "#扩散模型",
@@ -146,7 +156,7 @@
 
 - `parsed` 是 `analysis` 文本的解析缓存,由 `scripts/utils.js` 的 `parseAnalysis()` 或 `scripts/utils.py` 的 `parse_analysis()` 生成
 - **`parsed.score` 不是直接取 `## 评分` 下的 LLM 原始总分**,而是从 `## 评分理由` 中提取七个分项(创新性/3、技术严谨性/1.5、实验充分性/1.5、清晰度/1、影响力/2、开源/1.5、可复现性/0.5)重新计算,四舍五入到 0.1,覆盖 LLM 原始输出
-- `parsed` 中的 `machineSummary` 是 `## 机器摘要` 的解析结果;`rankBucket`、`qualityScore`、`valueScore` 等字段同时平铺到 `parsed` 顶层以便访问
+- `parsed` 中的 `machineSummary` 是 `## 机器摘要` 的解析结果;`rankBucket`、`innovationScore`、`technicalRigorScore` 等 8 个子项字段同时平铺到 `parsed` 顶层以便访问
 - 解析逻辑变更后,`parsed` 缓存会被清除并在下次发布时重新生成
 
 ### 5.4 `data/current/analyzed.json`

docs/en/data-format.md

Lines changed: 18 additions & 8 deletions
@@ -112,9 +112,14 @@ Core analysis results. Structure:
 "opensource": "...",
 "machineSummary": {
   "rankBucket": "前25%",
-  "qualityScore": "5.2",
-  "valueScore": "1.5",
-  "reproducibilityBonus": "0.8",
+  "innovation": "1.5",
+  "technicalRigor": "1.2",
+  "experimentalSufficiency": "1.0",
+  "clarity": "0.8",
+  "impact": "1.3",
+  "openSource": "1.0",
+  "reproducibility": "0.3",
+  "engineeringScore": "1.2",
   "confidence": "",
   "primaryTaskTag": "#语音合成",
   "primaryMethodTag": "#扩散模型",
@@ -124,9 +129,14 @@ Core analysis results. Structure:
   "hasDataset": ""
 },
 "rankBucket": "前25%",
-"qualityScore": "5.2",
-"valueScore": "1.5",
-"reproducibilityBonus": "0.8",
+"innovationScore": "1.5",
+"technicalRigorScore": "1.2",
+"experimentalSufficiencyScore": "1.0",
+"clarityScore": "0.8",
+"impactScore": "1.3",
+"openSourceScore": "1.0",
+"reproducibilityScore": "0.3",
+"engineeringScore": "1.2",
 "confidence": "",
 "primaryTaskTag": "#语音合成",
 "primaryMethodTag": "#扩散模型",
@@ -145,8 +155,8 @@ Core analysis results. Structure:
 **Notes on the `parsed` field**:
 
 - `parsed` is a parsed cache of the `analysis` text, generated by `parseAnalysis()` in `scripts/utils.js` or `parse_analysis()` in `scripts/utils.py`
-- **`parsed.score` is not the raw total score from the LLM under `## 评分`**. Instead, it is recalculated by extracting seven sub-items from `## 评分理由` (Innovation/3, Technical Rigor/1.5, Experimental Sufficiency/1.5, Clarity/1, Impact/2, Open Source/1.5, Reproducibility/0.5), rounding to 0.1, and overriding the LLM's raw output
-- `machineSummary` inside `parsed` is the parsed result of `## 机器摘要`; fields such as `rankBucket`, `qualityScore`, `valueScore`, etc. are also flattened to the top level of `parsed` for easier access
+- **`parsed.score` is not the raw total score from the LLM under `## 评分`**. Instead, it is recalculated by extracting eight sub-items from `## 评分理由` (Innovation/2, Technical Rigor/1.5, Experimental Sufficiency/1.5, Clarity/1, Impact/1.5, Open Source/1.5, Reproducibility/0.5, Engineering/Practical Value/1.5), capping at 10, rounding to 0.1, and overriding the LLM's raw output
+- `machineSummary` inside `parsed` is the parsed result of `## 机器摘要`; fields such as `rankBucket`, `innovationScore`, `technicalRigorScore`, etc. are also flattened to the top level of `parsed` for easier access
 - When parsing logic changes, the `parsed` cache is cleared and regenerated on the next publish
 
 ### 5.4 `data/current/analyzed.json`
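The notes on the `parsed` field in this hunk say that `machineSummary` fields are also flattened to the top level, and the JSON example shows the sub-item keys gaining a `Score` suffix there (`innovation` becomes `innovationScore`, while `engineeringScore` keeps its name). A minimal sketch of that mapping, assuming the key lists below and a hypothetical helper name `flatten_machine_summary`:

```python
# Key names are taken from the JSON example in this diff.
SCORE_KEYS = [
    "innovation",
    "technicalRigor",
    "experimentalSufficiency",
    "clarity",
    "impact",
    "openSource",
    "reproducibility",
    "engineeringScore",
]
PLAIN_KEYS = ["rankBucket", "confidence", "primaryTaskTag", "primaryMethodTag"]


def flatten_machine_summary(parsed):
    """Return a copy of `parsed` with machineSummary fields mirrored at the top level."""
    summary = parsed.get("machineSummary", {})
    flat = dict(parsed)  # shallow copy, so the input dict is not mutated
    for key in PLAIN_KEYS:
        if key in summary:
            flat[key] = summary[key]
    for key in SCORE_KEYS:
        if key in summary:
            # Append "Score" unless the key already ends with it.
            top_key = key if key.endswith("Score") else key + "Score"
            flat[top_key] = summary[key]
    return flat


parsed = {
    "machineSummary": {
        "rankBucket": "前25%",
        "innovation": "1.5",
        "engineeringScore": "1.2",
    }
}
flat = flatten_machine_summary(parsed)
print(flat["innovationScore"], flat["engineeringScore"], flat["rankBucket"])
```

Consumers can then read `flat["innovationScore"]` directly without reaching into `machineSummary`.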
