```go
func (s *Shield) Assess(text, url string) RiskResult
func (s *Shield) Wrap(text, url string) string
func (s *Shield) InjectCanary(prompt string) (injectedPrompt, token string, err error)
func (s *Shield) CheckCanary(response, token string) CanaryResult
```

`Wrap` is useful when you want to preserve the original content while adding trust-boundary markers before sending it into prompts.

`InjectCanary` and `CheckCanary` implement canary token detection for prompt leakage (see Canary Tokens below).

## Canary Tokens

Canary tokens help detect when an LLM may have leaked or echoed hidden prompt content. Such an echo is a potential signal of goal hijacking or prompt extraction, though not definitive proof.

```go
	// ...
	log.Println("canary detected: investigate possible leakage")
}
```

### How It Works

`InjectCanary` appends a unique marker (`<!--CANARY-<16 hex chars>-->`) to your prompt. After the LLM responds, `CheckCanary` scans the response for that marker. If it is found, the LLM may have echoed hidden content. That is worth investigating, though other explanations exist (middleware reflection, model artifacts, etc.).

### Limitations

Canary tokens are a **best-effort** leak detection signal, not a guarantee:

- **Absence does NOT prove safety**: an attacker could instruct the LLM to omit or transform the canary.
- **Some pipelines strip HTML comments**: if your stack sanitizes HTML, the token may be removed before it reaches the LLM or before you check the response.