Commit 6357a5c

[chore] daily pipeline
1 parent abcf647 commit 6357a5c

197 files changed

Lines changed: 8982 additions & 23 deletions

archive/20260604/recommend/arxiv_papers_20260604.standard.json

Lines changed: 457 additions & 0 deletions
Large diffs are not rendered by default.

archive/carryover.json

Lines changed: 3 additions & 3 deletions
@@ -1,10 +1,10 @@
 {
-  "generated_at": "2026-06-03T21:43:47.636709+00:00",
-  "updated_date": "20260603",
+  "generated_at": "2026-06-04T21:47:30.140124+00:00",
+  "updated_date": "20260604",
   "carryover_days": 9,
   "tag_states": {
     "continual": {
-      "updated_date": "20260603",
+      "updated_date": "20260604",
       "carryover_days": 9,
       "items": []
     }
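
The hunk above moves the top-level `updated_date` and the `continual` tag state's `updated_date` forward together. A minimal Python sketch of a consistency check for that pattern follows. The function name `stale_tags` and the rule that every tag state should share the top-level date are assumptions inferred only from this diff, not taken from the pipeline's code.

```python
import json


def stale_tags(carryover: dict) -> list[str]:
    """Return tag names whose updated_date differs from the top-level one.

    Assumption: tag states are expected to carry the same date as the file.
    """
    top = carryover["updated_date"]
    return [
        tag
        for tag, state in carryover.get("tag_states", {}).items()
        if state.get("updated_date") != top
    ]


# Post-commit values copied from the hunk; only fields visible there are included.
sample = json.loads("""
{
  "generated_at": "2026-06-04T21:47:30.140124+00:00",
  "updated_date": "20260604",
  "carryover_days": 9,
  "tag_states": {
    "continual": {"updated_date": "20260604", "carryover_days": 9, "items": []}
  }
}
""")

print(stale_tags(sample))  # prints []
```

A tag state left on the previous day's date would show up by name in the returned list.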
Lines changed: 25 additions & 0 deletions
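@@ -0,0 +1,25 @@
+---
+title: "Learning to Retrieve: Dual-Level Long-Term Memory for Text-to-SQL Agents"
+title_zh: 学习检索:面向文本到SQL智能体的双层长期记忆
+authors: "Yibo Wang, Nikki Lijing Kuang, Philip S. Yu, Zhewei Yao, Yuxiong He"
+date: 2026-05-30
+pdf: "https://arxiv.org/pdf/2606.00547v1"
+tags: ["query:continual"]
+score: 7.0
+evidence: Proposes a long-term memory retrieval mechanism that lets agents reuse past experience across tasks
+tldr: Interactive text-to-SQL agents rely on long-term memory to reuse experience, but existing retrieval methods cannot adapt to memory needs that vary across stages. This paper proposes MERIT, a framework with dual-level (episode-level and turn-level) memory whose retrieval policies are optimized with reinforcement learning, plus a Process Reward Model that supplies dense signals. Experiments show that MERIT raises success rate and reduces interaction turns on BIRD-Interact and transfers across benchmarks, demonstrating the value of multi-level retrieval for experience reuse.
+source: arxiv
+selection_source: fresh_fetch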
+figures_json: "[{\"url\": \"assets/figures/arxiv/2606.00547v1/fig-001.webp\", \"caption\": \"\", \"page\": 0, \"index\": 1, \"width\": 1653, \"height\": 487, \"label\": \"Figure\"}, {\"url\": \"assets/figures/arxiv/2606.00547v1/fig-002.webp\", \"caption\": \"\", \"page\": 0, \"index\": 2, \"width\": 1663, \"height\": 959, \"label\": \"Figure\"}, {\"url\": \"assets/figures/arxiv/2606.00547v1/fig-003.webp\", \"caption\": \"\", \"page\": 0, \"index\": 3, \"width\": 1659, \"height\": 350, \"label\": \"Figure\"}]"
+tables_json: "[{\"url\": \"assets/tables/arxiv/2606.00547v1/table-001.webp\", \"caption\": \"\", \"page\": 0, \"index\": 1, \"width\": 788, \"height\": 700, \"label\": \"Table\"}, {\"url\": \"assets/tables/arxiv/2606.00547v1/table-002.webp\", \"caption\": \"\", \"page\": 0, \"index\": 2, \"width\": 1637, \"height\": 647, \"label\": \"Table\"}, {\"url\": \"assets/tables/arxiv/2606.00547v1/table-003.webp\", \"caption\": \"\", \"page\": 0, \"index\": 3, \"width\": 1653, \"height\": 347, \"label\": \"Table\"}, {\"url\": \"assets/tables/arxiv/2606.00547v1/table-004.webp\", \"caption\": \"\", \"page\": 0, \"index\": 4, \"width\": 795, \"height\": 200, \"label\": \"Table\"}, {\"url\": \"assets/tables/arxiv/2606.00547v1/table-005.webp\", \"caption\": \"\", \"page\": 0, \"index\": 5, \"width\": 642, \"height\": 254, \"label\": \"Table\"}, {\"url\": \"assets/tables/arxiv/2606.00547v1/table-006.webp\", \"caption\": \"\", \"page\": 0, \"index\": 6, \"width\": 1650, \"height\": 237, \"label\": \"Table\"}]"
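+motivation: Existing memory retrieval methods cannot adapt to the differing memory needs of different decision stages in interactive text-to-SQL; static methods are fixed, and dynamic methods learn from sparse signals.
+method: Proposes the MERIT framework, which maintains dual-level memory (episode-level and turn-level), trains the retrieval policy at both levels with reinforcement learning, and uses a Process Reward Model to provide dense rewards for turn-level retrieval.
+result: On BIRD-Interact, MERIT achieves a higher success rate than no-memory, static-retrieval, and dynamic-retrieval baselines while reducing average interaction turns; on Spider2-Snow it shows positive cross-benchmark transfer.
+conclusion: Multi-level dynamic memory retrieval effectively improves experience reuse in interactive text-to-SQL agents and generalizes to a new benchmark without benchmark-specific tuning.
+---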
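+
+## Summary
+Interactive text-to-SQL agents complete database tasks through multi-turn interaction, including schema exploration, query execution, feedback interpretation, and decision revision. Long-term memory helps agents reuse past experience, but existing retrieval methods still have limitations. Static methods depend on fixed similarity heuristics and do not optimize downstream utility, whereas dynamic methods usually learn from sparse final outcomes and retrieve memories at a single decision level. This falls short when the usefulness of a memory changes across interaction stages, because memories that help initial planning may differ from those needed for local, state-dependent execution. We propose MERIT, a dynamic multi-horizon memory retrieval framework. MERIT maintains episode-level memory to provide global strategic guidance and turn-level memory to support local decisions. Both levels use retrieval policies optimized with reinforcement learning. To train turn-level retrieval under limited intermediate supervision, MERIT uses a lightweight Process Reward Model that provides dense proxy rewards for local memory selection. Experiments on BIRD-Interact show that MERIT beats no-memory, static-retrieval, and dynamic-retrieval baselines on success rate while reducing the average number of interaction turns. Transfer results on Spider2-Snow further show positive cross-benchmark transfer without benchmark-specific tuning. These results indicate that multi-horizon retrieval improves experience reuse in interactive text-to-SQL agents.
+
+## Abstract
+Interactive text-to-SQL agents solve database tasks through multi-turn interactions involving schema exploration, query execution, feedback interpretation, and decision revision. Long-term memory helps agents reuse past experiences, but existing retrieval methods remain limited. Static methods rely on fixed similarity heuristics that do not optimize downstream utility, while dynamic methods often learn from sparse final outcomes and retrieve memories at a single decision horizon. This is insufficient when memory usefulness changes across interaction stages, since memories useful for initial planning may differ from those needed for local, state-conditioned execution. We propose MERIT, a dynamic multi-horizon memory retrieval framework. MERIT maintains episode-level memory for global strategic guidance and turn-level memory for local decision support. Both levels use learned retrieval policies optimized with reinforcement learning. To train turn-level retrieval despite limited intermediate supervision, MERIT uses a lightweight Process Reward Model to provide dense proxy rewards for local memory selection. Experiments on BIRD-Interact show that MERIT outperforms no-memory, static-retrieval, and dynamic-retrieval baselines in success rate while reducing average interaction turns. Transfer results on Spider2-Snow further show positive cross-benchmark transfer without benchmark-specific tuning. These results suggest that multi-horizon retrieval improves experience reuse in interactive text-to-SQL agents.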
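
The two retrieval horizons described in the abstract can be illustrated with a small Python sketch. It is not MERIT's implementation: the `Memory` class, the `retrieve` helper, the stored experiences, and the word-overlap score are all hypothetical stand-ins for the learned, RL-optimized retrieval policies, and the Process Reward Model is omitted entirely. The sketch only shows where each level would be queried: episode-level memory once per task, turn-level memory at every turn against the current state.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    key: str      # text describing when the memory applies
    content: str  # the stored experience


def overlap(a: str, b: str) -> float:
    """Toy relevance score (Jaccard word overlap); a learned policy would replace it."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def retrieve(query: str, store: list[Memory], k: int = 1) -> list[Memory]:
    """Return the k memories whose keys score highest against the query."""
    return sorted(store, key=lambda m: overlap(query, m.key), reverse=True)[:k]


# Hypothetical stored experiences, for illustration only.
episode_store = [
    Memory("monthly revenue report by region", "explore schema before writing joins"),
    Memory("find duplicate customer records", "group by email and count"),
]
turn_store = [
    Memory("error column not found", "re-inspect the table columns"),
    Memory("query returned empty result", "relax the filter and retry"),
]

task = "revenue by region for last month"
plan_hint = retrieve(task, episode_store)  # episode level: queried once per task

states = ["error column not found in orders", "query returned empty result"]
turn_hints = [retrieve(s, turn_store) for s in states]  # turn level: queried every turn

print(plan_hint[0].content)
print([hints[0].content for hints in turn_hints])
```

Keeping the two stores separate is the point of the sketch: the task description selects strategic guidance, while each intermediate state selects a local fix, so the same query never has to serve both purposes.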
