Skip to content

fix: add tenant ownership check to document download endpoint (CWE-862)#13834

Open
sebastiondev wants to merge 1 commit intoinfiniflow:mainfrom
sebastiondev:fix/cwe862-doc-idor-f141
Open

fix: add tenant ownership check to document download endpoint (CWE-862)#13834
sebastiondev wants to merge 1 commit intoinfiniflow:mainfrom
sebastiondev:fix/cwe862-doc-idor-f141

Conversation

@sebastiondev
Copy link
Copy Markdown

Vulnerability Summary

CWE-862: Missing Authorization — IDOR on document download endpoint
Severity: High
OWASP Classification: API1:2023 Broken Object Level Authorization (BOLA)

Affected Endpoint

GET /api/v1/documents/<document_id> — the download_doc() function in api/apps/sdk/doc.py

Data Flow

  1. Caller provides a beta API token via the Authorization header
  2. The endpoint validates the token exists (APIToken.query(beta=token)) — authentication ✓
  3. The endpoint looks up the document by document_id (DocumentService.query(id=document_id)) — existence check only
  4. No authorization check verifies that the authenticated caller's tenant owns the requested document
  5. The document file contents are returned from storage via STORAGE_IMPL.get()

Any user holding a valid beta token can download any document belonging to any tenant on the same RAGflow instance by specifying an arbitrary document_id.

How beta Tokens Are Exposed

beta tokens appear in iframe src URLs as ?auth=<token> in shared/embedded chatbot contexts (see useGetDocumentUrl in web/src/hooks/use-document-request.ts). End-users of shared chatbots have access to the token value through browser DevTools.

How Document IDs Are Discoverable

  • Chat responses from the embedded chatbot include document_id fields in chunk reference metadata
  • RAGflow uses UUID1 (time-based) IDs — knowing one ID's timestamp and MAC suffix enables guessing others

Fix Description

File changed: api/apps/sdk/doc.py (3 lines added)
Change: After authenticating the caller and retrieving the document, the fix verifies that the document's parent knowledgebase (doc[0].kb_id) belongs to the caller's tenant (tenant_id) by querying KnowledgebaseService.query(id=doc[0].kb_id, tenant_id=tenant_id). If the check fails, an error is returned before any file content is served.

Rationale

This fix aligns download_doc() with the authorization pattern already used by the sister endpoint download() at /api/v1/datasets/<dataset_id>/documents/<document_id>, which is properly protected with @token_required and KnowledgebaseService.query(id=dataset_id, tenant_id=tenant_id). The KnowledgebaseService is already imported in this file and used throughout for authorization checks.

Diff

@@ -432,12 +432,15 @@ async def download_doc(document_id):
     objs = APIToken.query(beta=token)
     if not objs:
         return get_error_data_result(message='Authentication error: API key is invalid!"')
+    tenant_id = objs[0].tenant_id
 
     if not document_id:
         return get_error_data_result(message="Specify document_id please.")
     doc = DocumentService.query(id=document_id)
     if not doc:
         return get_error_data_result(message=f"The dataset not own the document {document_id}.")
+    if not KnowledgebaseService.query(id=doc[0].kb_id, tenant_id=tenant_id):
+        return get_error_data_result(message="You do not have access to this document.")
     # The process of downloading
     doc_id, doc_location = File2DocumentService.get_storage_address(doc_id=document_id)

Test Results Summary

  • Existing tests pass: The fix does not break any existing functionality. Legitimate document downloads (where the caller's tenant owns the document) continue to work because KnowledgebaseService.query() returns a truthy result for valid tenant-document relationships.
  • Authorization is enforced: Requests with a valid beta token for Tenant A attempting to download a document belonging to Tenant B are now correctly rejected with "You do not have access to this document."
  • No regressions in sister endpoints: The download() endpoint at /datasets/<dataset_id>/documents/<document_id> already has this pattern and is unaffected.

Disprove Analysis Results

We systematically attempted to disprove this finding through 9 checks:

✅ AUTH CHECK

The endpoint does have authentication (validates beta token). However, authentication ≠ authorization. The endpoint authenticates the caller but does not verify the caller owns the requested document. This is a genuine missing authorization issue.

✅ NETWORK CHECK

No localhost-only restrictions. The application is deployed via Docker with ports exposed (SVR_HTTP_PORT:9380). The nginx config serves ^/(v1|api) routes publicly. This endpoint is internet-facing.

✅ DEPLOYMENT CHECK

Docker Compose deployment with nginx reverse proxy. The /api/v1/documents/<document_id> route is accessible through nginx's location ~ ^/(v1|api) rule. No VPN or service mesh constraints found.

✅ CALLER TRACE

  • Route: GET /api/v1/documents/<document_id> — does manual beta token auth (no @token_required decorator)
  • Frontend caller: useGetDocumentUrl generates this URL when auth query param is present (embedded/shared contexts)
  • Attack surface: Any user with a valid beta token can call this endpoint with any document_id

✅ VALIDATION CHECK

No input sanitization or authorization validation before the vulnerable call. The only check is DocumentService.query(id=document_id) which verifies existence, not ownership.

✅ PRIOR REPORTS

No prior reports of this specific IDOR. Related open issues: #6146 "Restricting Download Files" and #6090 "Enhanced Access Control & Security Features".

✅ SECURITY POLICY

SECURITY.md exists but only documents a prior pickle deserialization vulnerability. No mention of IDOR or authorization policies.

✅ RECENT COMMITS

Only one prior commit touched doc.py recently: e705ac6 Add logout (#13796). No prior security fix for this endpoint.

✅ FIX ADEQUACY

Three document download paths exist in the codebase:

  1. GET /api/v1/datasets/<dataset_id>/documents/<document_id> (download()) — already protected with @token_required + KnowledgebaseService.query()
  2. GET /api/v1/documents/<document_id> (download_doc()) — ⬅ this is what the fix addresses
  3. GET /v1/document/get/<doc_id> (document_app.py:get()) — already protected with @login_required

The fix closes the unique path to download arbitrary document files via IDOR.

Exploit Sketch

# Step 1: Obtain a valid beta token from a shared embed URL
# (from browser DevTools inspecting an iframe src)
BETA_TOKEN="a1b2c3d4e5f6789012345678901234"

# Step 2: Obtain document IDs from chat references in chunk metadata
TARGET_DOC_ID="abc123def456789012345678deadbeef"

# Step 3: Download any document from any tenant (BEFORE fix)
curl -H "Authorization: Bearer ${BETA_TOKEN}" \
  "https://ragflow.example.com/api/v1/documents/${TARGET_DOC_ID}" \
  -o stolen_document.pdf

Preconditions

  • Multi-tenant RAGflow deployment (or single tenant with shared embed)
  • Attacker has a valid beta token (obtainable from any shared/embedded chatbot URL)
  • Attacker knows or can enumerate a target document_id (obtainable from chat references or UUID1 guessing)

Existing Mitigations (Insufficient)

  • beta tokens are scoped separately from full API keys, but they still map to a tenant_id
  • Document IDs are UUID1 (not fully random), but require some discovery effort
  • The sister endpoint download() is already properly protected

Related Issues (Not in Scope of This Fix)

The same missing-authorization pattern exists in other beta-authenticated endpoints (conversation_app.py:getsse(), session.py:chatbot_completions(), session.py:chatbots_inputs()), but those leak metadata rather than raw file content and are separate lower-impact issues.


Verdict: CONFIRMED_VALID | Confidence: High
Recommendation: Merge this minimal fix; consider follow-up hardening of other beta-authenticated endpoints.

The /documents/<document_id> endpoint (download_doc) validated that the
API token existed but never verified that the token holder owned the
requested document. Any user with a valid API token could download any
document in the system by providing its document_id.

Add an authorization check that verifies the document belongs to a
dataset owned by the authenticated tenant, consistent with the existing
/datasets/<dataset_id>/documents/<document_id> download endpoint.
@dosubot dosubot bot added size:XS This PR changes 0-9 lines, ignoring generated files. 🐖api The modified files are located under directory 'api/apps/sdk' 🐞 bug Something isn't working, pull request that fix bug. labels Mar 27, 2026
@yingfeng yingfeng added the ci Continue Integration label Mar 30, 2026
@yingfeng yingfeng marked this pull request as draft March 30, 2026 07:09
@yingfeng yingfeng marked this pull request as ready for review March 30, 2026 07:09
@codecov
Copy link
Copy Markdown

codecov bot commented Mar 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.72%. Comparing base (f32a832) to head (5438642).
⚠️ Report is 6 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #13834   +/-   ##
=======================================
  Coverage   96.72%   96.72%           
=======================================
  Files          10       10           
  Lines         702      703    +1     
  Branches      112      112           
=======================================
+ Hits          679      680    +1     
  Misses          5        5           
  Partials       18       18           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

🐖api The modified files are located under directory 'api/apps/sdk' 🐞 bug Something isn't working, pull request that fix bug. ci Continue Integration size:XS This PR changes 0-9 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants