Commit c18f85a

docs: clarify tool-access boundary in prompt injection section
1 parent 5ecb174

1 file changed (+144, -0 lines)

.github/THREAT_MODEL.md

Lines changed: 144 additions & 0 deletions

# DocsGPT Public Threat Model

**Classification:** Public

**Last updated:** 2026-04-15

**Applies to:** Open-source and self-hosted DocsGPT deployments

## 1) Overview

DocsGPT ingests content (files/URLs/connectors), indexes it, and answers queries via LLM-backed APIs and optional tools.

Core components:
- Backend API (`application/`)
- Workers/ingestion (`application/worker.py` and related modules)
- Datastores (MongoDB/Redis/vector stores)
- Frontend (`frontend/`)
- Optional extensions/integrations (`extensions/`)

## 2) Scope and assumptions

In scope:
- Application-level threats in this repository.
- Local and internet-exposed self-hosted deployments.

Assumptions:
- Internet-facing instances enable auth and use strong secrets.
- Datastores/internal services are not publicly exposed.

Out of scope:
- Cloud hardware/provider compromise.
- Security guarantees of external LLM vendors.
- Full security audits of third-party systems targeted by tools (external DBs/MCP servers/code-exec APIs).

## 3) Security objectives

- Protect document/conversation confidentiality.
- Preserve integrity of prompts, agents, tools, and indexed data.
- Maintain API/worker availability.
- Enforce tenant isolation in authenticated deployments.

## 4) Assets

- Documents, attachments, chunks/embeddings, summaries.
- Conversations, agents, workflows, prompt templates.
- Secrets (JWT secret, `INTERNAL_KEY`, provider/API/OAuth credentials).
- Operational capacity (worker throughput, queue depth, model quota/cost).

## 5) Trust boundaries and untrusted input

Trust boundaries:
- Internet ↔ Frontend
- Frontend ↔ Backend API
- Backend ↔ Workers/internal APIs
- Backend/workers ↔ Datastores
- Backend ↔ External LLM/connectors/remote URLs

Untrusted input includes API payloads, file uploads, remote URLs, OAuth/webhook data, retrieved content, and LLM/tool arguments.

## 6) Main attack surfaces

1. Auth/authz paths and sharing tokens.
2. File upload + parsing pipeline.
3. Remote URL fetching and connectors (SSRF risk).
4. Agent/tool execution from LLM output.
5. Template/workflow rendering.
6. Frontend rendering + token storage.
7. Internal service endpoints (`INTERNAL_KEY`).
8. High-impact integrations (SQL tool, generic API tool, remote MCP tools).

## 7) Key threats and expected mitigations

### A. Auth/authz misconfiguration
- Threat: weak/no auth or leaked tokens lead to broad data access.
- Mitigations: require auth for public deployments, issue short-lived tokens, support rotation/revocation, apply least-privilege sharing.
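
The short-lived-token and revocation mitigations can be illustrated with a minimal standard-library sketch. It is not DocsGPT's actual token format: the `issue_token`/`verify_token` helpers, the payload fields, and the in-memory `REVOKED_IDS` set are hypothetical, and a real deployment would more likely use a maintained JWT library signed with the deployment's JWT secret.

```python
from __future__ import annotations

import base64
import hashlib
import hmac
import json
import secrets
import time

# Hypothetical token scheme for illustration only; not DocsGPT's actual format.
# In-memory revocation list; a real deployment would persist this.
REVOKED_IDS: set[str] = set()


def _b64encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret: bytes, body: str) -> str:
    return _b64encode(hmac.new(secret, body.encode("utf-8"), hashlib.sha256).digest())


def issue_token(secret: bytes, subject: str, ttl_seconds: int = 900,
                now: float | None = None) -> str:
    """Issue a signed token that expires ttl_seconds after `now`."""
    issued = time.time() if now is None else now
    payload = {"sub": subject, "exp": int(issued + ttl_seconds),
               "jti": secrets.token_hex(8)}
    body = _b64encode(json.dumps(payload, separators=(",", ":")).encode("utf-8"))
    return f"{body}.{_sign(secret, body)}"


def verify_token(secret: bytes, token: str, now: float | None = None) -> dict | None:
    """Return the payload if signed, unexpired, and not revoked; otherwise None."""
    try:
        body, sig = token.split(".")
    except ValueError:
        return None  # malformed token: fail closed
    # Constant-time comparison; comparing bytes avoids TypeError on non-ASCII input.
    if not hmac.compare_digest(sig.encode("utf-8"), _sign(secret, body).encode("utf-8")):
        return None
    payload = json.loads(_b64decode(body))
    current = time.time() if now is None else now
    if payload["exp"] <= current or payload["jti"] in REVOKED_IDS:
        return None
    return payload
```

Expiry bounds how long a leaked token stays useful, the `jti` check handles tokens that must be revoked early, and rotating `secret` invalidates every outstanding token at once.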

### B. Untrusted file ingestion
- Threat: malicious files/archives trigger path traversal, parser exploits, or resource exhaustion.
- Mitigations: strict path checks, archive safeguards, file limits, patched parser dependencies.
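
A minimal sketch of strict path checks, archive safeguards, and file limits for ZIP uploads, using only the standard library. The function name and the limits (`max_files`, `max_total_bytes`) are illustrative assumptions, not DocsGPT's ingestion code or defaults. It validates every entry's destination before writing anything, and counts decompressed bytes because declared sizes in an archive can lie.

```python
from __future__ import annotations

import os
import zipfile


class UnsafeArchiveError(Exception):
    """Raised when an archive violates path or size limits."""


def safe_extract_zip(archive_path: str, dest_dir: str,
                     max_files: int = 1000,
                     max_total_bytes: int = 100 * 1024 * 1024) -> list[str]:
    """Extract a ZIP only if every entry stays inside dest_dir and within limits.

    The limits are hypothetical defaults chosen for illustration.
    """
    dest_root = os.path.realpath(dest_dir)
    with zipfile.ZipFile(archive_path) as zf:
        members = zf.infolist()
        if len(members) > max_files:
            raise UnsafeArchiveError(f"too many entries: {len(members)}")

        # Pass 1: validate every path before writing anything.
        planned = []
        for info in members:
            target = os.path.realpath(os.path.join(dest_root, info.filename))
            # Rejects absolute paths and "../" sequences that escape dest_root.
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise UnsafeArchiveError(f"path escapes destination: {info.filename!r}")
            planned.append((info, target))

        # Pass 2: extract, counting decompressed bytes to stop zip bombs.
        total = 0
        extracted = []
        for info, target in planned:
            if info.is_dir():
                os.makedirs(target, exist_ok=True)
                continue
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with zf.open(info) as src, open(target, "wb") as dst:
                while True:
                    chunk = src.read(64 * 1024)
                    if not chunk:
                        break
                    total += len(chunk)
                    if total > max_total_bytes:
                        raise UnsafeArchiveError("archive exceeds size limit")
                    dst.write(chunk)
            extracted.append(target)
    return extracted
```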

### C. SSRF/outbound abuse
- Threat: URL loaders/tools access private/internal/metadata endpoints.
- Mitigations: validate URLs + redirects, block private/link-local ranges, apply egress controls/allowlists.
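
The URL-validation and private/link-local blocking mitigations can be sketched with `urllib.parse`, `socket.getaddrinfo`, and the `ipaddress` module. This is an illustrative assumption, not DocsGPT's loader code. It checks every address a hostname resolves to; a caller would still need to re-run it on each redirect hop (for example by disabling automatic redirects) and connect to the vetted address, because a second DNS lookup at connect time can return a different, internal IP.

```python
from __future__ import annotations

import ipaddress
import socket
from urllib.parse import urlsplit

# Hypothetical policy: only plain web schemes may be fetched.
ALLOWED_SCHEMES = {"http", "https"}


def is_public_address(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    # Unwrap IPv4-mapped IPv6 (::ffff:10.0.0.1) so it cannot bypass IPv4 checks.
    if isinstance(addr, ipaddress.IPv6Address) and addr.ipv4_mapped is not None:
        addr = addr.ipv4_mapped
    # is_global excludes private, loopback, and link-local (e.g. 169.254.169.254).
    return addr.is_global and not addr.is_multicast


def check_outbound_url(url: str) -> list[str]:
    """Return the resolved public IPs for url, or raise ValueError if any is non-public."""
    parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES or not parts.hostname:
        raise ValueError(f"disallowed URL: {url!r}")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    infos = socket.getaddrinfo(parts.hostname, port, proto=socket.IPPROTO_TCP)
    ips = sorted({info[4][0] for info in infos})
    for ip in ips:
        if not is_public_address(ip):
            raise ValueError(f"{parts.hostname} resolves to non-public address {ip}")
    return ips
```

A destination allowlist is stricter still; a check like this complements network-level egress controls rather than replacing them.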

### D. Prompt injection + tool abuse
- Threat: retrieved text manipulates model behavior and causes unsafe tool calls.
- Threat: the model cannot be relied on to "choose correctly" under adversarial input, so any tool it can reach may be invoked.
- Mitigations: treat retrieved content and model output as untrusted, enforce tool policies outside the model, only expose tools explicitly assigned by the user/admin to that agent, separate system instructions from retrieved content, audit tool calls.
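
Exposing only explicitly assigned tools, and enforcing that outside the model, can be sketched as a dispatch gate that checks every model-proposed call against the agent's assignment and audits the decision. The `AgentPolicy` type, the tool names, and the `tool_audit` logger are hypothetical; the point is that the check runs in ordinary code, so injected text cannot argue its way past it.

```python
import logging
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, FrozenSet

# Hypothetical audit logger name.
audit_log = logging.getLogger("tool_audit")


@dataclass(frozen=True)
class AgentPolicy:
    agent_id: str
    # Tools the user/admin explicitly assigned to this agent (hypothetical model).
    assigned_tools: FrozenSet[str] = field(default_factory=frozenset)


class ToolDenied(Exception):
    pass


def dispatch_tool_call(policy: AgentPolicy,
                       registry: Dict[str, Callable[..., Any]],
                       tool_name: str,
                       arguments: Dict[str, Any]) -> Any:
    """Run a model-proposed tool call only if the policy allows it; audit every attempt."""
    allowed = tool_name in policy.assigned_tools and tool_name in registry
    audit_log.info("agent=%s tool=%s allowed=%s args=%r",
                   policy.agent_id, tool_name, allowed, arguments)
    if not allowed:
        # The model's request is untrusted input: refuse instead of trusting its choice.
        raise ToolDenied(f"tool {tool_name!r} is not assigned to agent {policy.agent_id!r}")
    return registry[tool_name](**arguments)
```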

### E. Dangerous tool capability chaining (SQL/API/MCP)
- Threat: write-capable SQL credentials allow destructive queries.
- Threat: the generic API tool can trigger side effects (infra/payment/webhook/code-exec endpoints).
- Threat: remote MCP tools may expose privileged operations.
- Mitigations: read-only-by-default credentials, destination allowlists, explicit approval for write/exec actions, per-tool policy enforcement + logging.
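
Read-only-by-default credentials can be sketched with SQLite's read-only open mode standing in for a read-only database role. SQLite and the helper names are assumptions chosen to keep the example self-contained; with a server database the equivalent would be a user granted only read privileges. Enforcement sits in the database rather than in query filtering or the model's judgment, so an injected `DROP TABLE` fails however it is phrased.

```python
import sqlite3
from pathlib import Path


def open_readonly(db_path: str) -> sqlite3.Connection:
    """Open an existing SQLite file in read-only mode (hypothetical helper)."""
    # mode=ro is enforced by SQLite itself, independent of the SQL text,
    # so writes fail even if an injected query slips past other checks.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def run_tool_query(db_path: str, sql: str) -> list:
    """Execute a model-supplied query over a read-only connection."""
    conn = open_readonly(db_path)
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()
```

Write or exec actions that are genuinely needed would then go through a separate, explicitly approved credential instead of widening this one.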

### F. Frontend/XSS + token theft
- Threat: XSS can steal locally stored tokens and call APIs as the victim.
- Mitigations: reduce unsafe rendering paths, enforce a strong CSP, use scoped short-lived credentials.
97+
98+
### G. Internal endpoint exposure
99+
- Threat: weak/unset `INTERNAL_KEY` enables internal API abuse.
100+
- Mitigations: fail closed, require strong random keys, keep internal APIs private.
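
Failing closed on a weak or unset `INTERNAL_KEY` can be sketched as follows. Reading the key from an environment variable of that name, the 32-character minimum, and the helper names are assumptions for illustration. Every internal request is rejected when no usable key is configured, and presented keys are compared in constant time.

```python
import hmac
import os
import secrets
from typing import Mapping, Optional

MIN_KEY_LENGTH = 32  # assumed minimum, not a documented DocsGPT setting


def load_internal_key(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the configured key, or None if it is missing or too short to trust."""
    source = os.environ if env is None else env
    key = source.get("INTERNAL_KEY", "")  # assumed to come from the environment
    return key if len(key) >= MIN_KEY_LENGTH else None


def is_internal_request_allowed(presented: Optional[str], configured: Optional[str]) -> bool:
    if configured is None or presented is None:
        return False  # fail closed: no usable key means no internal access
    return hmac.compare_digest(presented.encode("utf-8"), configured.encode("utf-8"))


def generate_internal_key() -> str:
    """Generate a strong random key (32 random bytes, URL-safe encoded)."""
    return secrets.token_urlsafe(32)
```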

### H. DoS and cost abuse
- Threat: request floods, large ingestion jobs, and expensive prompts/crawls exhaust capacity or model budget.
- Mitigations: rate limits, quotas, timeouts, queue backpressure, usage budgets.
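
Rate limits are commonly implemented as a token bucket per client or API key. The sketch below is a generic, single-process illustration with hypothetical class names and parameters; a multi-worker deployment would need shared state (for example in Redis) rather than an in-memory dict. The `cost` argument lets expensive endpoints such as uploads or crawls consume more of a client's budget than a cheap request.

```python
import time
from typing import Callable, Dict, Hashable


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class RateLimiter:
    """One bucket per client key (e.g. user ID or API key); in-memory only."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity, self.rate, self.clock = capacity, rate, clock
        self.buckets: Dict[Hashable, TokenBucket] = {}

    def allow(self, client: Hashable, cost: float = 1.0) -> bool:
        bucket = self.buckets.get(client)
        if bucket is None:
            bucket = self.buckets[client] = TokenBucket(self.capacity, self.rate, self.clock)
        return bucket.allow(cost)
```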

## 8) Example attacker stories

- Internet-exposed deployment runs with weak/no auth and suffers unauthorized data access/abuse.
- Intranet deployment intentionally using weak/no auth is vulnerable to insider misuse and lateral-movement abuse.
- Crafted archive attempts path traversal during extraction.
- Malicious URL/redirect chain targets internal services.
- Poisoned document causes data exfiltration through tool calls.
- Over-privileged SQL/API/MCP tool performs destructive side effects.

## 9) Severity calibration

- **Critical:** unauthenticated public data access; prompt-injection-driven exfiltration; SSRF to sensitive internal endpoints.
- **High:** cross-tenant leakage, persistent token compromise, over-privileged destructive tools.
- **Medium:** DoS/cost amplification and non-critical information disclosure.
- **Low:** minor hardening gaps with limited impact.

## 10) Baseline controls for public deployments

1. Enforce authentication and secure defaults.
2. Set/rotate strong secrets (JWT secret, `INTERNAL_KEY`, encryption keys).
3. Restrict CORS and put the API behind a hardened proxy.
4. Add rate limiting/quotas for answer/upload/crawl/token endpoints.
5. Enforce SSRF protections on URLs and redirects, plus egress restrictions.
6. Apply upload/archive/parsing hardening.
7. Require least-privilege tool credentials and auditable tool execution.
8. Monitor auth failures, tool anomalies, ingestion spikes, and cost anomalies.
9. Keep dependencies/images patched and scanned.
10. Validate multi-tenant isolation with explicit tests.

## 11) Maintenance

Review this model after major auth, ingestion, connector, tool, or workflow changes.

## References

- [OWASP Top 10 for LLM Applications](https://owasp.org/www-project-top-10-for-large-language-model-applications/)
- [OWASP ASVS](https://owasp.org/www-project-application-security-verification-standard/)
- [STRIDE overview](https://learn.microsoft.com/azure/security/develop/threat-modeling-tool-threats)
- [DocsGPT SECURITY.md](../SECURITY.md)
