Skip to content

Commit f5ff166

Browse files
committed
docs: document security block config options and injection mechanism
Expand documentation for disable_policy, disable_suffix, and strict_policy across config.example.toml, SecurityConfig doc comments, and the RFC. Clarifies the two-layer security model, how disabling each layer works independently, and warns about disabling both. Updates RFC to reflect concatenation into last message (not separate user message).
1 parent f47f5b8 commit f5ff166

3 files changed

Lines changed: 85 additions & 24 deletions

File tree

config.example.toml

Lines changed: 26 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -202,13 +202,38 @@ bind = "127.0.0.1"
202202
# api_token = "${TELEGRAM_BOT_TOKEN}"
203203

204204
[security]
205+
# LocalGPT injects a security block at the end of every LLM context window.
206+
# The block has two independent layers:
207+
#
208+
# 1. User policy (LocalGPT.md) — your custom instructions, cryptographically
209+
# signed and verified. Injected as "## Workspace Security Policy" header
210+
# followed by your content. Requires `localgpt md sign` after editing.
211+
#
212+
# 2. Hardcoded suffix — a compiled-in security reminder that tells the model
213+
# to treat tool outputs and retrieved content as data, not instructions.
214+
# This is the last thing the model sees before generating a response.
215+
#
216+
# Both are concatenated into the last user message on every API call.
217+
# They are NOT saved to session logs or included in compaction.
218+
#
219+
# You can disable each layer independently:
220+
205221
# Abort on tamper or suspicious content in LocalGPT.md (default: false)
222+
# When true, TamperDetected and SuspiciousContent are fatal errors that
223+
# prevent the agent from starting. When false, the agent warns and falls
224+
# back to hardcoded suffix only.
206225
# strict_policy = false
207226

208227
# Skip loading the LocalGPT.md workspace security policy (default: false)
228+
# The hardcoded suffix still applies unless also disabled.
209229
# disable_policy = false
210230

211-
# Skip the hardcoded security suffix injected at end of context (default: false)
231+
# Skip the hardcoded security suffix (default: false)
232+
# The user policy still applies unless also disabled.
233+
# WARNING: Disabling both disable_policy and disable_suffix removes all
234+
# end-of-context security reinforcement. The system prompt safety section
235+
# still exists but may lose effectiveness in long sessions due to the
236+
# "lost in the middle" attention decay.
212237
# disable_suffix = false
213238

214239
[logging]

crates/core/src/config/mod.rs

Lines changed: 17 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -191,19 +191,33 @@ pub struct PerplexityConfig {
191191

192192
#[derive(Debug, Clone, Default, Serialize, Deserialize)]
193193
pub struct SecurityConfig {
194-
/// Abort agent startup on tamper or suspicious content (default: false)
194+
/// Abort agent startup on tamper or suspicious content (default: false).
195195
///
196196
/// When true, `TamperDetected` and `SuspiciousContent` are fatal errors
197197
/// that prevent the agent from starting. When false (default), the agent
198198
/// warns and falls back to hardcoded-only security.
199199
#[serde(default)]
200200
pub strict_policy: bool,
201201

202-
/// Skip loading and injecting the LocalGPT.md security policy (default: false)
202+
/// Skip loading and injecting the `LocalGPT.md` workspace security policy
203+
/// (default: false).
204+
///
205+
/// When true, the user's signed `LocalGPT.md` content is not loaded or
206+
/// injected into the context window. The hardcoded security suffix still
207+
/// applies unless [`disable_suffix`] is also set.
203208
#[serde(default)]
204209
pub disable_policy: bool,
205210

206-
/// Skip injecting the hardcoded security suffix (default: false)
211+
/// Skip injecting the hardcoded security suffix (default: false).
212+
///
213+
/// The suffix is a compiled-in reminder that tells the model to treat
214+
/// tool outputs and retrieved content as data, not instructions. When
215+
/// disabled, the user policy (if any) still applies.
216+
///
217+
/// **Warning:** Setting both `disable_policy` and `disable_suffix` to
218+
/// `true` removes all end-of-context security reinforcement. The system
219+
/// prompt safety section still exists, but may lose effectiveness in
220+
/// long sessions due to attention decay ("lost in the middle" effect).
207221
#[serde(default)]
208222
pub disable_suffix: bool,
209223

docs/RFC-LocalGPT-Security-Policy.md

Lines changed: 42 additions & 20 deletions
Original file line numberDiff line numberDiff line change
@@ -435,15 +435,29 @@ The security block is **injected on every API call**, not stored in conversation
435435

436436
**Message array structure per API call:**
437437

438+
The security block is **concatenated into the last user or tool-result message** (separated by `\n\n`) rather than appended as a separate user message. This avoids consecutive same-role messages, which violates the Anthropic Messages API protocol.
439+
438440
```
439-
Turn 1: [system_prompt] [user_1] [security_block] → generate
440-
Turn 3: [system_prompt] [user_1] [asst_1] [user_2] [asst_2] [user_3] [security_block] → generate
441-
Turn N: [system_prompt] [msg_1 ... msg_2N-1] [user_N] [security_block] → generate
442-
↑ always last
441+
Turn 1: [system_prompt] [user_1 + security_block] → generate
442+
Turn 3: [system_prompt] [user_1] [asst_1] [user_2] [asst_2] [user_3 + security_block] → generate
443+
Tool: [system_prompt] [...] [asst_N (tool_call)] [tool_result + security_block] → generate
444+
↑ always last in context
443445
```
444446

447+
If the last message is neither User nor Tool (edge case: Assistant or System), the security block falls back to a separate User message.
448+
445449
The security block is a synthetic injection. It is not persisted in `session.rs` conversation history, not included in session compaction/summarization, and not visible to the user.
446450

451+
Both layers can be independently disabled via `config.toml`:
452+
453+
```toml
454+
[security]
455+
disable_policy = true # skip LocalGPT.md content
456+
disable_suffix = true # skip hardcoded reminder
457+
```
458+
459+
When both are disabled, `build_ending_security_block()` returns an empty string and no injection occurs.
460+
447461
**Token budget:** Hardcoded suffix ≈80 tokens (always included, non-negotiable). User policy ≤1000 tokens. Total ≈1080 tokens per turn. Over a 30-turn session this consumes ~32K tokens — about 27% of `reserve_tokens` (8000) should be allocated for the security block to prevent it from being dropped during context window management.
448462

449463
**Reserve token accounting:** Update `reserve_tokens` calculation in `session.rs` to include `SECURITY_BLOCK_RESERVE`:
@@ -514,32 +528,40 @@ self.verified_security_policy = match &policy {
514528
};
515529
```
516530

517-
**At every API call** (`build_messages_for_api_call()`) — inject fresh:
531+
**At every API call** (`messages_for_api_call()`) — inject fresh:
518532

519533
```rust
520-
fn build_messages_for_api_call(&self) -> Vec<Message> {
521-
let mut messages = Vec::new();
534+
fn messages_for_api_call(&self) -> Vec<Message> {
535+
let mut messages = self.session.messages_for_llm();
522536

523-
// System prompt (primacy position — includes safety preamble)
524-
messages.push(Message::system(&self.system_prompt));
537+
let include_suffix = !self.app_config.security.disable_suffix;
538+
let policy = if self.app_config.security.disable_policy {
539+
None
540+
} else {
541+
self.verified_security_policy.as_deref()
542+
};
525543

526-
// Conversation history (stored turns only, no synthetic messages)
527-
for turn in &self.conversation_history {
528-
messages.push(turn.clone());
544+
let security_block = build_ending_security_block(policy, include_suffix);
545+
546+
if !security_block.is_empty() {
547+
// Concatenate into the last User or Tool message to avoid
548+
// consecutive same-role messages (Anthropic API requirement).
549+
if let Some(last) = messages.last_mut()
550+
&& matches!(last.role, Role::User | Role::Tool)
551+
{
552+
last.content.push_str("\n\n");
553+
last.content.push_str(&security_block);
554+
} else {
555+
// Fallback: no messages or last message is Assistant/System
556+
messages.push(Message::user(&security_block));
557+
}
529558
}
530559

531-
// Security block (recency position — always last before generation)
532-
// Appended to the last user message or as a trailing user message
533-
let security = build_ending_security_block(
534-
self.verified_security_policy.as_deref(),
535-
);
536-
messages.push(Message::user(&security));
537-
538560
messages
539561
}
540562
```
541563

542-
**Note:** The security block is injected as a `user`-role message (or appended to the final user message content) because some providers do not support interleaved `system` messages, and user-role content at the end receives strong recency attention regardless of role.
564+
**Note:** The security block is concatenated into the last user/tool message content rather than appended as a separate user message. This avoids consecutive same-role messages (which violates the Anthropic Messages API protocol) while keeping the security text in the recency position.
543565

544566
---
545567

0 commit comments

Comments
 (0)