[bug] In hybrid mode, the context is excessively long and fails to be truncated correctly.

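Context truncation flow in hybrid mode
1. Truncation flow for the high-level query (Global Query)
In the _build_global_query_context function (operate.py):
Truncation point 1: relationship truncation
Location: operate.py:2458-2462
Uses max_token_for_global_context (default: 4000 tokens)
Truncates on the description field of each relationship
Truncation point 2: entity truncation
Location: the _find_most_related_entities_from_relationships function (operate.py)
Uses max_token_for_local_context (default: 4000 tokens)
Truncates on the description field of each entity
Truncation point 3: text chunk truncation
Location: the _find_related_text_unit_from_relationships function (operate.py)
Uses max_token_for_text_unit (default: 4000 tokens)
Truncates on the content field of each text chunk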
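2. Truncation flow for the low-level query (Local Query)
In the _build_local_query_context function (operate.py):
Truncation point 1: entity truncation
Location: _find_most_related_text_unit_from_entities -> _find_most_related_entities_from_relationships (operate.py:2555-2559)
Uses max_token_for_local_context (default: 4000 tokens)
Truncates on the description field of each entity
Truncation point 2: relationship truncation
Location: the _find_most_related_edges_from_entities function (operate.py)
Uses max_token_for_global_context (default: 4000 tokens)
Truncates on the description field of each relationship
Truncation point 3: text chunk truncation
Location: the _find_most_related_text_unit_from_entities function (operate.py:1965)
Uses max_token_for_text_unit (default: 4000 tokens)
Truncates on the content field of each text chunk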
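For reference, here is a minimal sketch of what list truncation by a token budget on one field could look like. It is not the project's actual code: `truncate_list_by_budget` and `count_tokens` are hypothetical names, and the whitespace-based token counter is only a stand-in for whatever tokenizer operate.py really uses.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    # Stand-in tokenizer (assumption): one token per whitespace-separated word.
    return len(text.split())


def truncate_list_by_budget(
    items: List[dict], key: Callable[[dict], str], max_token_size: int
) -> List[dict]:
    """Keep leading items while the cumulative token count stays within the budget."""
    if max_token_size <= 0:
        return []
    total = 0
    for index, item in enumerate(items):
        total += count_tokens(key(item))
        if total > max_token_size:
            # This item would push the list over budget, so drop it and the rest.
            return items[:index]
    return items


relations = [
    {"description": "alpha beta gamma"},     # 3 tokens
    {"description": "delta epsilon"},        # 2 tokens
    {"description": "zeta eta theta iota"},  # 4 tokens
]
kept = truncate_list_by_budget(
    relations, key=lambda r: r["description"], max_token_size=5
)
print(len(kept))
```

The key property is that the result never exceeds the budget, because items are only ever dropped, never added or split.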
3. Second round of truncation in Combine Contexts
In the combine_contexts function (operate.py:2815-2835):
Truncation point 1: merged entities
combined_entities = chunking_by_token_size(combined_entities, max_token_size=2000)
Truncation point 2: merged relationships
combined_relationships = chunking_by_token_size(combined_relationships, max_token_size=2000)
Truncation point 3: merged sources
combined_sources = chunking_by_token_size(combined_sources, max_token_size=2000)
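4. Problem analysis
Why the context reaches 76623 tokens:
Misuse of the truncation function:
chunking_by_token_size is a text chunking function, not a truncation function
It splits the content into multiple chunks instead of truncating it to a given length
As a result, the content may be duplicated or grow rather than shrink
Root cause:
combine_contexts incorrectly uses chunking_by_token_size to truncate
It should use truncate_list_by_token_size or a similar truncation function
chunking_by_token_size returns a list, not a string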
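To illustrate the difference, the sketch below uses a hypothetical chunker, `chunk_by_tokens`, with an assumed overlap parameter and a whitespace tokenizer. It is not the real chunking_by_token_size, whose signature and return shape may differ. It only shows why a chunker cannot act as a truncator: it returns a list, every token survives, and overlap can make the total larger than the input.

```python
from typing import List


def count_tokens(text: str) -> int:
    # Stand-in tokenizer (assumption): one token per whitespace-separated word.
    return len(text.split())


def chunk_by_tokens(text: str, max_token_size: int, overlap: int = 0) -> List[str]:
    """Hypothetical chunker: sliding windows of at most max_token_size tokens."""
    if not 0 <= overlap < max_token_size:
        raise ValueError("overlap must be >= 0 and smaller than max_token_size")
    tokens = text.split()
    step = max_token_size - overlap
    return [
        " ".join(tokens[start:start + max_token_size])
        for start in range(0, len(tokens), step)
    ]


context = " ".join(f"w{i}" for i in range(10))  # 10 tokens in total
chunks = chunk_by_tokens(context, max_token_size=4, overlap=1)

# Each chunk respects the limit, but the sum over all chunks does not.
total_after_chunking = sum(count_tokens(chunk) for chunk in chunks)
print(type(chunks).__name__, len(chunks), total_after_chunking)
```

If the caller then joins or serializes the whole list into the prompt, the 2000-token limit applies only per chunk, so the combined context is never actually reduced.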
Suggested fix:
Replace chunking_by_token_size with a proper truncation function
Or implement a dedicated function for truncating strings
Make sure token counting is accurate
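One possible shape for a dedicated string truncation function is sketched below. It assumes the combined context is a newline-separated string with one record per line, and `truncate_text_by_lines` and `count_tokens` are hypothetical names with a whitespace tokenizer standing in for the real one. It cuts at line boundaries so that no record is split in the middle.

```python
def count_tokens(text: str) -> int:
    # Stand-in tokenizer (assumption): one token per whitespace-separated word.
    return len(text.split())


def truncate_text_by_lines(text: str, max_token_size: int) -> str:
    """Keep whole lines from the start until adding the next one would exceed the budget."""
    kept_lines = []
    used = 0
    for line in text.splitlines():
        cost = count_tokens(line)
        if used + cost > max_token_size:
            break
        kept_lines.append(line)
        used += cost
    return "\n".join(kept_lines)


combined_entities = "\n".join(
    [
        "id entity description",   # 3 tokens (header row)
        "1 A first entity",        # 4 tokens
        "2 B second entity here",  # 5 tokens
        "3 C third",               # 3 tokens
    ]
)
result = truncate_text_by_lines(combined_entities, max_token_size=8)
print(result)
```

Unlike the chunking call, this returns a string of the same type as its input and its token count can only go down. With a real tokenizer, the count should be taken with the same encoder the target model uses, otherwise the budget check is only approximate.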

<img width="1444" height="251" alt="Image" src="https://github.com/user-attachments/assets/3d9fac44-8268-406a-9217-dddb6346f4ef" />

[bug] In hybrid mode, the context is excessively long and fails to be truncated correctly. #108
