feat: Auto-adjust chunk recall weights based on user feedback by MkDev11 · Pull Request #12689 · infiniflow/ragflow

MkDev11 · 2026-01-19T03:11:35Z

What problem does this PR solve?

Implements automatic adjustment of knowledge base chunk recall weights based on user feedback (upvotes/downvotes). When users upvote or downvote a response, the system locates the corresponding knowledge snippets and adjusts their recall weight to improve future retrieval quality.

Closes #12670

How it works:

1. User upvotes/downvotes a response via POST /thumbup
2. System extracts chunk IDs from the conversation reference
3. For each referenced chunk:
   - Reads current pagerank_fea value from document store
   - Increments (+1) for upvote or decrements (-1) for downvote
   - Clamps weight to [0, 100] range
   - Updates chunk in ES/Infinity/OceanBase
4. Future retrievals score these chunks higher/lower based on accumulated feedback
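
The per-chunk update described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: a plain dict stands in for the document store, and `adjust_weight` and `apply_vote` are hypothetical names.

```python
PAGERANK_MIN = 0    # lower bound from the description
PAGERANK_MAX = 100  # upper bound from the description


def adjust_weight(current, upvote, step=1):
    """Return the weight after one vote, clamped to [0, 100]."""
    delta = step if upvote else -step
    return max(PAGERANK_MIN, min(PAGERANK_MAX, current + delta))


def apply_vote(store, chunk_ids, upvote):
    """Read-modify-write every referenced chunk in a dict-like store.

    `store` maps chunk_id -> weight and is only a stand-in for ES/Infinity/OceanBase.
    Chunks without a stored weight are treated as 0 (an assumption of this sketch).
    """
    updated = {}
    for chunk_id in chunk_ids:
        current = store.get(chunk_id, 0)
        store[chunk_id] = adjust_weight(current, upvote)
        updated[chunk_id] = store[chunk_id]
    return updated
```

Because each chunk is read and then written back, two concurrent votes on the same chunk could overwrite each other in a real store; the sketch ignores that.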

Files changed:

api/db/services/chunk_feedback_service.py - New service for updating chunk pagerank weights
api/apps/conversation_app.py - Integrated feedback service into thumbup endpoint
test/testcases/test_web_api/test_chunk_feedback/ - Unit tests

Type of change

New Feature (non-breaking change which adds functionality)

When users upvote or downvote responses, the system now automatically adjusts the pagerank_fea field of referenced chunks to improve future retrieval quality.

- Add ChunkFeedbackService for weight management
- Integrate with thumbup endpoint
- Add unit tests
- Fix: Handle chunks_format() field transformations (id/dataset_id)

Closes infiniflow#12670

KevinHuSh · 2026-01-19T05:02:04Z

Much appreciated!
I'm not sure about this: boosting the pagerank score of certain chunks because a conversation session received a thumbs-up might cause side effects in one way or another.
The pagerank score is very sensitive; once these chunks are boosted, they will probably end up on top regardless, unless we have a lot of users and the feedback data is dense enough.

- Disable feature by default (CHUNK_FEEDBACK_ENABLED env var)
- Reduce weight increments from 1 to 0.1 to prevent outsized impact
- Add test for disabled feature flag
- Update docstrings to document the safeguards

This addresses concerns about pagerank sensitivity and sparse feedback data.

MkDev11 · 2026-01-19T08:05:39Z

@KevinHuSh Thanks for your feedback! You raise valid concerns about pagerank sensitivity.

I've added safeguards to address this:

Feature is disabled by default - requires CHUNK_FEEDBACK_ENABLED=true to activate, so no impact unless explicitly opted in
Reduced weight increments from 1 to 0.1 - a single vote now has 10x less impact, requiring many votes before meaningfully affecting rankings

This way, deployments with sparse feedback won't see unintended side effects, while those with dense user activity can opt in and benefit from the learning signal. Happy to adjust the increment further or add additional safeguards (like requiring N votes before applying, or time-based decay) if you think it's needed.
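
A rough sketch of how the two safeguards could combine, assuming the flag is read from the environment at call time. Only `CHUNK_FEEDBACK_ENABLED` and the 0.1 step come from this thread; the accepted truthy spellings and the helper names `feedback_enabled` and `vote_delta` are assumptions for illustration.

```python
import os

FEEDBACK_STEP = 0.1  # per-vote increment mentioned in the comment above


def feedback_enabled(env=None):
    """Return True only when CHUNK_FEEDBACK_ENABLED is explicitly switched on."""
    env = os.environ if env is None else env
    value = str(env.get("CHUNK_FEEDBACK_ENABLED", "false")).strip().lower()
    return value in {"1", "true", "yes"}  # accepted spellings are an assumption


def vote_delta(upvote, env=None):
    """Weight change for one vote; 0.0 when the feature is disabled."""
    if not feedback_enabled(env):
        return 0.0
    return FEEDBACK_STEP if upvote else -FEEDBACK_STEP
```

With the flag unset, `vote_delta` returns `0.0`, so callers can skip the document-store write entirely.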

MkDev11 · 2026-02-24T06:53:28Z

@ZhenhangTung could you please let your members review the PR?

codecov · 2026-02-27T13:54:06Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.52%. Comparing base (b1d28b5) to head (303222c).
⚠️ Report is 8 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #12689      +/-   ##
==========================================
- Coverage   96.72%   96.52%   -0.21%     
==========================================
  Files          10       10              
  Lines         703      690      -13     
  Branches      112      108       -4     
==========================================
- Hits          680      666      -14     
- Misses          5        8       +3     
+ Partials       18       16       -2

yingfeng · 2026-02-27T14:35:23Z

Thanks, please fix the CI error:


==================================== ERRORS ====================================
_ ERROR collecting test/testcases/test_web_api/test_chunk_feedback/test_chunk_feedback_service.py _
ImportError while importing test module '/home/infiniflow/runners_work/inf29-ac7fd0784ffe/ragflow/ragflow/test/testcases/test_web_api/test_chunk_feedback/test_chunk_feedback_service.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/home/alice/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
test/testcases/test_web_api/test_chunk_feedback/test_chunk_feedback_service.py:21: in <module>
    from api.db.services.chunk_feedback_service import (
api/db/services/__init__.py:19: in <module>
    from .user_service import UserService as UserService
api/db/services/user_service.py:24: in <module>
    from api.db.db_models import DB, UserTenant
api/db/db_models.py:53: in <module>
    from api.utils.configs import deserialize_b64, serialize_b64
api/utils/configs.py:21: in <module>
    from common.config_utils import get_base_config
E   ModuleNotFoundError: No module named 'common.config_utils'; 'common' is not a package
=========================== short test summary info ============================
ERROR test/testcases/test_web_api/test_chunk_feedback/test_chunk_feedback_service.py
!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!
======================= 176 deselected, 1 error in 1.61s =======================

Avoids CI collection error caused by test/testcases/test_web_api/common.py shadowing the project-level common/ package. Uses the same importlib + stub pattern established by other unit tests in this directory (e.g. test_chunk_routes_unit.py).
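
For readers unfamiliar with the pattern, loading a module by file path with stubbed dependencies might look roughly like this. It is a generic sketch built on the standard `importlib.util` API, not a copy of test_chunk_routes_unit.py; `load_module_from_path` and its parameters are hypothetical.

```python
import importlib.util
import sys
import types
from typing import Dict, Optional


def load_module_from_path(name: str, path: str,
                          stubs: Optional[Dict[str, types.ModuleType]] = None):
    """Load a module from an explicit file path.

    `stubs` maps module names to stand-in modules that are registered in
    sys.modules first, so imports inside the loaded file resolve to the stubs
    instead of whatever happens to be first on sys.path.
    """
    for stub_name, stub in (stubs or {}).items():
        sys.modules.setdefault(stub_name, stub)
    spec = importlib.util.spec_from_file_location(name, str(path))
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module  # register before exec, as the import system does
    spec.loader.exec_module(module)
    return module
```

Since the file is located by path rather than by package name, a sibling `common.py` in the test directory no longer decides how `common` resolves for the module under test, provided the needed names are stubbed.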

MkDev11 · 2026-03-23T23:04:59Z

@KevinHuSh @yingfeng sorry to bother you, can you let me know what I need to fix or update?

yingfeng · 2026-03-24T04:15:26Z

This design feels a bit simplistic; we might want to explore a feedback-driven weighting approach instead.

- Default CHUNK_FEEDBACK_WEIGHTING=relevance: split fixed per-event budget using similarity / vector_similarity / term_similarity from reference chunks
- Keep CHUNK_FEEDBACK_WEIGHTING=uniform for legacy per-chunk increments
- apply_feedback_to_chunks uses even split when scores are unavailable
- Update and extend unit tests
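
One way to picture the relevance mode: a fixed budget per feedback event is divided in proportion to each chunk's score, with an even split as the fallback. The sketch below is an assumption-laden illustration; `split_budget` is a hypothetical name, scores are assumed non-negative, and the exact rule for when the fallback triggers is a guess.

```python
def split_budget(scores, budget=1.0):
    """Split `budget` across chunks in proportion to their relevance scores.

    `scores` maps chunk_id -> score (or None when no score is available).
    If any score is missing or the scores sum to zero, fall back to an even
    split; this fallback condition is an assumption of the sketch.
    """
    if not scores:
        return {}
    values = list(scores.values())
    if any(v is None for v in values) or sum(values) <= 0:
        share = budget / len(scores)
        return {chunk_id: share for chunk_id in scores}
    total = sum(values)
    return {chunk_id: budget * v / total for chunk_id, v in scores.items()}
```

For example, scores of 0.75 and 0.25 with a budget of 1.0 yield shares of 0.75 and 0.25, so a highly relevant chunk absorbs most of the vote.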

- Apply chunk feedback only when vote direction changes; require bool thumbup
- Remove unused extract_chunk_ids / apply_feedback_to_chunks; single-pass row build
- Lazy logging; document read-modify-write race on pagerank updates
- Tests: feedback rows + thumbup idempotency/boolean matrix
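
The idempotency rule in the first bullet can be sketched as a small pure function. `feedback_sign` is a hypothetical helper; in particular, whether flipping an existing vote should count as one step or two is not stated here, and the sketch assumes a single step.

```python
def feedback_sign(previous, new):
    """Return +1, -1, or 0 for a thumbup request.

    `previous` is the stored vote (True, False, or None when never voted);
    `new` is the requested vote and must be a real bool.
    Repeating the same vote returns 0 so chunk weights are not adjusted twice.
    """
    if not isinstance(new, bool):
        raise TypeError("thumbup must be a bool")
    if previous is new:
        return 0  # same direction as before: nothing to apply
    return 1 if new else -1
```

Rejecting non-bool input up front keeps values such as the string "false" from being treated as truthy.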

- Require dialog access via UserTenantService + DialogService.query before thumb updates and chunk feedback (align with GET /conversation/get)
- Store pagerank changes as integers; split one unit across chunks in relevance mode (largest remainder); skip zero deltas
- Delegate get_chunk_kb_mapping to _feedback_rows_from_reference
- Update chunk_feedback and conversation route unit tests
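
For reference, a largest-remainder split of integer units might look like the following. This is a generic sketch of the method rather than the PR's implementation; `largest_remainder_split` is a hypothetical name and scores are assumed non-negative.

```python
def largest_remainder_split(scores, units=1):
    """Distribute integer `units` across chunks by score (largest remainder).

    Each chunk first receives the floor of its proportional quota; leftover
    units go to the chunks with the largest fractional remainders. Chunks
    that end up with a zero delta are dropped from the result.
    """
    total = sum(scores.values())
    if total <= 0:
        return {}
    quotas = {cid: units * s / total for cid, s in scores.items()}
    alloc = {cid: int(q) for cid, q in quotas.items()}  # floor for q >= 0
    leftover = units - sum(alloc.values())
    by_remainder = sorted(scores, key=lambda cid: quotas[cid] - alloc[cid], reverse=True)
    for cid in by_remainder[:leftover]:
        alloc[cid] += 1
    return {cid: n for cid, n in alloc.items() if n != 0}
```

With a single unit, only the chunk with the largest share receives a change, which keeps every stored delta an integer.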

rank_feature fields must not be set to 0; align chunk feedback with kb_app by sending remove for elasticsearch/opensearch. Extend single-doc update in es_conn and opensearch_conn to honor remove alongside doc payloads. Tests: stub DOC_ENGINE=infinity by default; add ES/OS remove assertions.
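
A sketch of the payload choice this implies: when the clamped weight reaches zero on Elasticsearch or OpenSearch, the field is removed instead of being written as 0. The payload shape (`{"remove": field}` versus `{field: value}`) is an assumption modelled on the wording above, and `build_weight_update` is a hypothetical helper.

```python
PAGERANK_FLD = "pagerank_fea"  # field name from the PR description
RANK_FEATURE_ENGINES = {"elasticsearch", "opensearch"}


def build_weight_update(new_weight, engine):
    """Build the update payload for one chunk.

    rank_feature fields cannot hold 0, so on ES/OS a non-positive weight is
    expressed as removing the field. Other engines get the value as-is.
    """
    if new_weight <= 0 and engine in RANK_FEATURE_ENGINES:
        return {"remove": PAGERANK_FLD}
    return {PAGERANK_FLD: new_weight}
```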

- Enforce conv.user_id vs current user on GET/RM/thumbup when user_id is set
- Add adjust_chunk_pagerank_fea for ES/OS (painless) and OB (single UPDATE); ChunkFeedbackService prefers atomic path; Infinity keeps RMW
- Remove unused get_chunk_kb_mapping; tests derive mapping from feedback rows
- conversation unit tests: mineru_parser stub, quart stub, pathlib upload helper; cover non-owner get/rm/thumbup
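
To illustrate what an atomic path on ES/OS could look like, the sketch below builds a scripted-update request body whose Painless script reads, adjusts, clamps, and writes the field in one server-side step. It is an assumption of how such a script might be written, not the content of adjust_chunk_pagerank_fea, and the script has not been run against a live cluster.

```python
PAGERANK_FLD = "pagerank_fea"  # field name from the PR description

# Painless source (illustrative): add the delta, cap at `hi`, and remove the
# field instead of storing a non-positive value.
_ADJUST_SOURCE = """
def cur = ctx._source[params.field];
int updated = (cur == null ? 0 : ((Number) cur).intValue()) + (int) params.delta;
if (updated > (int) params.hi) { updated = (int) params.hi; }
if (updated <= 0) {
  ctx._source.remove(params.field);
} else {
  ctx._source[params.field] = updated;
}
""".strip()


def build_adjust_body(delta, hi=100):
    """Return a request body for a scripted single-document update."""
    return {
        "script": {
            "lang": "painless",
            "source": _ADJUST_SOURCE,
            "params": {"field": PAGERANK_FLD, "delta": int(delta), "hi": int(hi)},
        }
    }
```

Because the arithmetic runs inside the update itself, two concurrent votes do not overwrite each other the way a client-side read-modify-write can.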

- Require owner + tenant/dialog before conversation update (is_new false)
- Set req user_id only after checks; align create path
- Quart stub: Response callable, headers.add_header, current_app/g/session
- Extend set_conversation tests; dedupe chunk_feedback mapping tests

Use asyncio.to_thread with a sync helper for test file writes.

- Update conversation with only name, user_id, and optional message list (no reference/dialog_id passthrough)
- Extend quart stub: has_request_context, has_websocket_context, websocket, jsonify
- _StubQuartHeaders as dict + add_header; seed Content-Type from mimetype/content_type on Response

- Reset get_by_id and tenant/dialog mocks after conv_foreign rm case so successful delete sees owner conversation.
- Support sync generators in _read_sse_text (TTS) alongside async SSE.

MkDev11 · 2026-04-01T07:59:20Z

@yingfeng can you review the changes again?

- Load real quart and only replace Response with the test stub so quart_auth, api_utils, and connection_utils get full quart API.
- Stub deepdoc.parser.paddleocr_parser for rag.llm OCR lazy import chain.

Reuse the tenant_id from the dialog ownership loop; chunk feedback only needs that tenant, not a second dialog fetch.

MkDev11 · 2026-04-02T05:17:10Z

@yingfeng @Magicbook1108 can you help me finish the PR?

yingfeng · 2026-04-02T06:14:32Z

Waiting for a PR on the Infinity issue to be resolved. This PR hasn't yet applied the necessary changes to Infinity, which will serve as a built-in retrieval engine going forward.

To update a single row in Infinity, the internal _row_id is required—a field that is currently absent from Infinity's response. Once that PR is submitted, you can proceed with applying the thumb updating logic to Infinity as well.

MkDev11 · 2026-04-02T09:31:03Z

which PR do you mean?

Skip update_chunk_weight on Infinity backend for now, since safe single-row update requires internal row id support not yet available. Update tests to cover the explicit Infinity skip behavior.

yingfeng · 2026-04-02T10:17:56Z

> which PR do you mean?

#13901

Please note that _row_id can be obtained from the retrieval results returned by Infinity. You can use chunk_id to retrieve the corresponding _row_id, and then use this _row_id to update the specific row.

However, if multiple users attempt to update the same chunk concurrently, only one operation will succeed. This is because a successful update modifies the _row_id associated with that chunk_id, while other users are still using the now-stale old _row_id. The Python code should handle this scenario gracefully.
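
Handling the stale `_row_id` case gracefully could be sketched as a bounded retry that re-resolves the row id before each attempt. Everything here is hypothetical: how Infinity reports a stale row is not stated in this thread, so `StaleRowError`, `lookup_row_id`, and `update_row` are placeholders supplied by the caller.

```python
class StaleRowError(Exception):
    """Placeholder for whatever signal indicates the row id is no longer current."""


def update_with_retry(chunk_id, lookup_row_id, update_row, max_attempts=3):
    """Try to update one chunk, re-fetching its row id after each conflict.

    lookup_row_id(chunk_id) -> current row id for the chunk
    update_row(row_id)      -> performs the write, raising StaleRowError if
                               another writer changed the row first
    Returns True on success, False if every attempt hit a stale row.
    """
    for _ in range(max_attempts):
        row_id = lookup_row_id(chunk_id)  # always resolve a fresh row id
        try:
            update_row(row_id)
            return True
        except StaleRowError:
            continue  # another update won the race; look the row up again
    return False
```

Returning `False` rather than raising lets the thumbup request succeed even when the weight update loses the race repeatedly.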

dosubot bot added the size:XL (This PR changes 500-999 lines, ignoring generated files.) and 💞 feature (Feature request, pull request that fulfills a new feature.) labels Jan 19, 2026

MkDev11 added 2 commits January 18, 2026 22:12

Merge branch 'main' into feature/chunk-feedback-weight-adjustment

aa5a533

Merge branch 'main' into feature/chunk-feedback-weight-adjustment

aeafe2d

MkDev11 added 3 commits January 19, 2026 03:07

Merge branch 'main' into feature/chunk-feedback-weight-adjustment

971f28b

Merge branch 'main' into feature/chunk-feedback-weight-adjustment

d579ed3

Merge branch 'main' into feature/chunk-feedback-weight-adjustment

0c2afbc

MkDev11 closed this Jan 20, 2026

MkDev11 reopened this Jan 20, 2026

MkDev11 closed this Jan 21, 2026

MkDev11 reopened this Jan 21, 2026

MkDev11 added 8 commits January 28, 2026 22:16

Merge branch 'main' into feature/chunk-feedback-weight-adjustment

af60a16

Merge branch 'main' into feature/chunk-feedback-weight-adjustment

e310e8c

Merge branch 'main' into feature/chunk-feedback-weight-adjustment

4ce7b62

Merge branch 'main' into feature/chunk-feedback-weight-adjustment

69fc0d4

Merge branch 'main' into feature/chunk-feedback-weight-adjustment

bb0fb0f

Merge branch 'main' into feature/chunk-feedback-weight-adjustment

a38af4c

Merge branch 'main' into feature/chunk-feedback-weight-adjustment

f9cb024

Merge branch 'main' into feature/chunk-feedback-weight-adjustment

56c3e5a

yingfeng added the ci (Continue Integration) label Feb 27, 2026

yingfeng marked this pull request as draft February 27, 2026 12:42

yingfeng marked this pull request as ready for review February 27, 2026 12:42

dosubot bot added the 🧪 test (Pull requests that update test cases.) label Feb 27, 2026

MkDev11 and others added 2 commits February 27, 2026 08:29

Merge branch 'main' into feature/chunk-feedback-weight-adjustment

a2aa43d

MkDev11 and others added 6 commits April 1, 2026 02:28

Merge branch 'main' into feature/chunk-feedback-weight-adjustment

7836c65

dosubot bot added the size:XXL (This PR changes 1000+ lines, ignoring generated files.) label and removed the size:XL (This PR changes 500-999 lines, ignoring generated files.) label Apr 1, 2026

MkDev11 and others added 5 commits April 1, 2026 06:11

Merge branch 'main' into feature/chunk-feedback-weight-adjustment

27f9d8b

test: fix ASYNC240 — avoid pathlib in async DummyUploadedFile.save

14a310d

test: fix conversation rm guard and TTS stream reader in unit tests

512d005

MkDev11 added 2 commits April 1, 2026 13:06

test: fix conversation route unit imports under quart_auth and rag.llm

2ea3611

perf: avoid redundant DialogService.get_by_id in thumbup

dbb8b06

fix: defer Infinity chunk feedback updates until row-id support

303222c

MkDev11 wants to merge 31 commits into infiniflow:main from MkDev11:feature/chunk-feedback-weight-adjustment
