Classification: Public
Last updated: 2026-04-15
Applies to: Open-source and self-hosted DocsGPT deployments
DocsGPT ingests content (files/URLs/connectors), indexes it, and answers queries via LLM-backed APIs and optional tools.
Core components:
- Backend API (`application/`)
- Workers/ingestion (`application/worker.py` and related modules)
- Datastores (MongoDB/Redis/vector stores)
- Frontend (`frontend/`)
- Optional extensions/integrations (`extensions/`)
In scope:
- Application-level threats in this repository.
- Local and internet-exposed self-hosted deployments.
Assumptions:
- Internet-facing instances enable auth and use strong secrets.
- Datastores/internal services are not publicly exposed.
Out of scope:
- Cloud hardware/provider compromise.
- Security guarantees of external LLM vendors.
- Full security audits of third-party systems targeted by tools (external DBs/MCP servers/code-exec APIs).
Security goals:
- Protect document/conversation confidentiality.
- Preserve integrity of prompts, agents, tools, and indexed data.
- Maintain API/worker availability.
- Enforce tenant isolation in authenticated deployments.
Assets:
- Documents, attachments, chunks/embeddings, summaries.
- Conversations, agents, workflows, prompt templates.
- Secrets (JWT secret, `INTERNAL_KEY`, provider/API/OAuth credentials).
- Operational capacity (worker throughput, queue depth, model quota/cost).
Trust boundaries:
- Internet ↔ Frontend
- Frontend ↔ Backend API
- Backend ↔ Workers/internal APIs
- Backend/workers ↔ Datastores
- Backend ↔ External LLM/connectors/remote URLs
Untrusted input includes API payloads, file uploads, remote URLs, OAuth/webhook data, retrieved content, and LLM/tool arguments.
Attack surfaces:
- Auth/authz paths and sharing tokens.
- File upload + parsing pipeline.
- Remote URL fetching and connectors (SSRF risk).
- Agent/tool execution from LLM output.
- Template/workflow rendering.
- Frontend rendering + token storage.
- Internal service endpoints (`INTERNAL_KEY`).
- High-impact integrations (SQL tool, generic API tool, remote MCP tools).
Threats and mitigations:
- Threat: weak/no auth or leaked tokens lead to broad data access.
- Mitigations: require auth for public deployments, short-lived tokens, rotation/revocation, least-privilege sharing.
- Threat: malicious files/archives trigger traversal, parser exploits, or resource exhaustion.
- Mitigations: strict path checks, archive safeguards, file limits, patched parser dependencies.
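The strict path check mentioned above can be sketched as follows. This is a minimal illustration using only the Python standard library, not DocsGPT's actual extraction code; the function names `is_safe_member` and `safe_extract_zip` and the size limits are hypothetical.

```python
import os
import zipfile


def is_safe_member(dest_dir: str, member_name: str) -> bool:
    """Return True if member_name resolves to a path inside dest_dir."""
    dest_root = os.path.realpath(dest_dir)
    # An absolute member name replaces dest_root in os.path.join, and ".."
    # segments are collapsed by realpath, so both cases resolve outside the root.
    target = os.path.realpath(os.path.join(dest_root, member_name))
    return os.path.commonpath([dest_root, target]) == dest_root


def safe_extract_zip(archive_path: str, dest_dir: str,
                     max_members: int = 1000,
                     max_total_bytes: int = 100 * 1024 * 1024) -> None:
    """Extract a zip after checking member count, declared size, and paths.

    The limits are illustrative assumptions, not project defaults.
    """
    with zipfile.ZipFile(archive_path) as zf:
        members = zf.infolist()
        if len(members) > max_members:
            raise ValueError("archive has too many members")
        # file_size is the size declared in the archive header; a hostile
        # archive can misreport it, so this is only a first-line check.
        if sum(m.file_size for m in members) > max_total_bytes:
            raise ValueError("archive expands beyond the size limit")
        for member in members:
            if not is_safe_member(dest_dir, member.filename):
                raise ValueError(f"unsafe path in archive: {member.filename}")
        zf.extractall(dest_dir)
```

All members are validated before anything is written, so a single bad entry rejects the whole archive instead of leaving a partial extraction behind.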
- Threat: URL loaders/tools access private/internal/metadata endpoints.
- Mitigations: validate URLs + redirects, block private/link-local ranges, apply egress controls/allowlists.
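One way to block private and link-local ranges is to resolve the hostname and check every returned address before fetching. The sketch below uses only the standard library; `is_blocked_ip`, `validate_fetch_url`, and the injectable `resolver` parameter are hypothetical names for illustration and are not taken from the DocsGPT codebase.

```python
import ipaddress
import socket
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"http", "https"}


def is_blocked_ip(address: str) -> bool:
    """Return True for addresses that should never be fetched server-side."""
    # Drop an IPv6 zone id such as "%eth0" before parsing.
    ip = ipaddress.ip_address(address.split("%")[0])
    # Unwrap IPv4-mapped IPv6 addresses like ::ffff:127.0.0.1.
    mapped = getattr(ip, "ipv4_mapped", None)
    if mapped is not None:
        ip = mapped
    return (ip.is_private or ip.is_loopback or ip.is_link_local
            or ip.is_multicast or ip.is_reserved or ip.is_unspecified)


def validate_fetch_url(url: str, resolver=socket.getaddrinfo) -> str:
    """Raise ValueError unless url uses an allowed scheme and public addresses."""
    parsed = urlparse(url)
    if parsed.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"scheme not allowed: {parsed.scheme!r}")
    if not parsed.hostname:
        raise ValueError("URL has no hostname")
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    for info in resolver(parsed.hostname, port, proto=socket.IPPROTO_TCP):
        address = info[4][0]
        if is_blocked_ip(address):
            raise ValueError(f"destination resolves to blocked address {address}")
    return url
```

To cover redirects, the same check would need to run on every `Location` target rather than only the first URL. Because DNS answers can change between validation and the actual request, a check like this complements network-level egress controls and does not replace them.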
- Threat: retrieved text manipulates model behavior and causes unsafe tool calls.
- Principle: never rely on the model to "choose correctly" under adversarial input.
- Mitigations: treat retrieved/model output as untrusted, enforce tool policies, only expose tools explicitly assigned by the user/admin to that agent, separate system instructions from retrieved content, audit tool calls.
- Threat: write-capable SQL credentials allow destructive queries.
- Threat: API tool can trigger side effects (infra/payment/webhook/code-exec endpoints).
- Threat: remote MCP tools may expose privileged operations.
- Mitigations: read-only-by-default credentials, destination allowlists, explicit approval for write/exec actions, per-tool policy enforcement + logging.
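Per-tool policy enforcement can live in a small gate that runs outside the model, before any tool call is executed. The following is a sketch under stated assumptions: `ToolPolicy`, `authorize_tool_call`, and the `tool_audit` logger name are hypothetical and do not describe an existing DocsGPT API.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("tool_audit")  # hypothetical audit logger name


@dataclass(frozen=True)
class ToolPolicy:
    """Hypothetical per-agent policy, configured by the user/admin."""
    allowed_tools: frozenset = frozenset()  # tools explicitly assigned to the agent
    write_tools: frozenset = frozenset()    # tools with write/exec side effects
    allowed_hosts: frozenset = frozenset()  # destination allowlist for API tools


def authorize_tool_call(policy: ToolPolicy, tool_name: str, *,
                        destination_host=None, approved: bool = False) -> bool:
    """Decide whether a model-requested tool call may run, and log the decision."""
    if tool_name not in policy.allowed_tools:
        allowed, reason = False, "tool not assigned to this agent"
    elif destination_host is not None and destination_host not in policy.allowed_hosts:
        allowed, reason = False, "destination not allowlisted"
    elif tool_name in policy.write_tools and not approved:
        allowed, reason = False, "write/exec action needs explicit approval"
    else:
        allowed, reason = True, "allowed"
    # Every decision is logged, including denials, so tool use stays auditable.
    logger.info("tool_call tool=%s host=%s allowed=%s reason=%s",
                tool_name, destination_host, allowed, reason)
    return allowed
```

The gate denies by default: a tool that is missing from `allowed_tools` is rejected regardless of what the model or the retrieved content asks for.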
- Threat: XSS can steal local tokens and call APIs.
- Mitigations: reduce unsafe rendering paths, strong CSP, scoped short-lived credentials.
- Threat: weak/unset `INTERNAL_KEY` enables internal API abuse.
- Mitigations: fail closed, require strong random keys, keep internal APIs private.
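"Fail closed" here means that a missing or weak key rejects every internal request instead of disabling the check. A minimal sketch, assuming the key is passed in as a string; the function name and the minimum length are hypothetical choices, not project settings.

```python
import hmac

MIN_KEY_LENGTH = 32  # hypothetical minimum, chosen for illustration


def is_internal_request_authorized(configured_key, presented_key) -> bool:
    """Return True only if a strong key is configured and the caller matches it."""
    # Fail closed: an unset or short configured key rejects everything.
    if not configured_key or len(configured_key) < MIN_KEY_LENGTH:
        return False
    if not presented_key:
        return False
    # Constant-time comparison avoids leaking key contents through timing.
    return hmac.compare_digest(configured_key.encode("utf-8"),
                               presented_key.encode("utf-8"))
```

A suitable random key can be generated with `secrets.token_urlsafe(32)` from the standard library.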
- Threat: request floods, large ingestion jobs, expensive prompts/crawls.
- Mitigations: rate limits, quotas, timeouts, queue backpressure, usage budgets.
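As an illustration of the rate-limit mitigation, a token bucket allows short bursts while capping the sustained request rate. This is a generic in-process sketch, not the mechanism DocsGPT ships; the `TokenBucket` class and its parameters are hypothetical, and a multi-worker deployment would need shared state (for example in Redis) instead of per-process counters.

```python
import time


class TokenBucket:
    """Allow up to `capacity` requests in a burst, refilled at `rate` per second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so the first burst is allowed
        self.clock = clock      # injectable clock, mainly for testing
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Consume `cost` tokens if available; return False when over the limit."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Expensive operations such as crawls or large uploads can pass a higher `cost`, so one bucket covers both request floods and cost amplification.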
Example abuse scenarios:
- Internet-exposed deployment runs with weak/no auth and is exposed to unauthorized data access and abuse.
- Intranet deployment intentionally using weak/no auth is vulnerable to insider misuse and lateral-movement abuse.
- Crafted archive attempts path traversal during extraction.
- Malicious URL/redirect chain targets internal services.
- Poisoned document causes data exfiltration through tool calls.
- Over-privileged SQL/API/MCP tool performs destructive side effects.
Severity guidance:
- Critical: unauthenticated public data access; prompt-injection-driven exfiltration; SSRF to sensitive internal endpoints.
- High: cross-tenant leakage, persistent token compromise, over-privileged destructive tools.
- Medium: DoS/cost amplification and non-critical information disclosure.
- Low: minor hardening gaps with limited impact.
Hardening checklist:
- Enforce authentication and secure defaults.
- Set/rotate strong secrets (JWT, `INTERNAL_KEY`, encryption keys).
- Restrict CORS and front the API with a hardened proxy.
- Add rate limiting/quotas for answer/upload/crawl/token endpoints.
- Enforce URL+redirect SSRF protections and egress restrictions.
- Apply upload/archive/parsing hardening.
- Require least-privilege tool credentials and auditable tool execution.
- Monitor auth failures, tool anomalies, ingestion spikes, and cost anomalies.
- Keep dependencies/images patched and scanned.
- Validate multi-tenant isolation with explicit tests.
Review this model after major auth, ingestion, connector, tool, or workflow changes.