feat(langchain): SecretMiddleware for tool-call credential detection#37192
Draft
Bagatur (baskaryan) wants to merge 2 commits into
…ection

Introduces a tool-call middleware that detects known credential patterns (GitHub tokens classic + fine-grained, LangSmith keys, Anthropic keys, OpenAI legacy + project keys, AWS Access Key IDs, JWTs) in tool-call arguments and either blocks or redacts them.

Use case: an agent reading attacker-controllable input (system prompts, retrieved documents, tool-result content) can be steered into emitting tool calls that embed credentials it has read from elsewhere, exfiltrating them through the legitimate tool surface. The egress allowlist on the agent's environment can't help here because the destination host is exactly the one you have to allow for legitimate use. A chokepoint in front of tool execution that inspects args closes that gap.

Mirrors the conventions of the existing PIIMiddleware where they apply (strategy literal, custom detector hook, public detector function for direct use), but focuses on tool-call args rather than message content.

Two strategies:
- "block" (default): return ToolMessage(status="error") instead of executing the tool. The matched substring is intentionally not echoed back, because echoing it would re-publish the secret into the agent's context.
- "redact": substitute matches with [REDACTED_<SECRET_TYPE>] in the args and execute the tool with the rewritten args.

Each detector pattern anchors on a fixed, high-entropy prefix + length + alphabet. The tests' negative corpus (commit SHAs, UUIDs, ULIDs, base64 manifests, "sk-" in prose) confirms the FP rate stays at zero without word-boundary anchors, which is important because boundaries would block matches when secrets are concatenated with alphanumeric characters in JSON the agent built up by interpolation.

Public surface: SecretMiddleware, SecretMatch, find_secrets, BUILTIN_SECRET_TYPES.

Direct hook tests + create_agent end-to-end tests + sync/async coverage = 51 passing tests.
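The prefix + length + alphabet idea, and the reason word-boundary anchors are left out, can be sketched roughly as follows. This is not the PR's code: the regex is an assumed shape for a GitHub classic token (`ghp_` followed by 36 alphanumeric characters), and `find_github_classic` is a hypothetical finder written in the `(start, end)` style the PR describes for detectors.

```python
import re
from typing import Iterator

# Assumed pattern shape: fixed prefix + fixed length + restricted alphabet.
# The PR's actual patterns are not shown on this page; these are illustrative.
GITHUB_CLASSIC = re.compile(r"ghp_[A-Za-z0-9]{36}")
GITHUB_CLASSIC_BOUNDED = re.compile(r"\bghp_[A-Za-z0-9]{36}\b")


def find_github_classic(text: str) -> Iterator[tuple[int, int]]:
    """Yield (start, end) offsets of every candidate token in text."""
    for match in GITHUB_CLASSIC.finditer(text):
        yield match.span()


fake_token = "ghp_" + "a1B2" * 9  # 36 characters after the prefix, not a real credential

# Token glued to alphanumeric text, as can happen with string interpolation.
glued = f"id{fake_token}"

# The unanchored pattern still finds the token inside the glued string.
unanchored_hits = list(find_github_classic(glued))

# The \b-anchored variant misses it: there is no word boundary between "d" and "g".
bounded_hits = [m.span() for m in GITHUB_CLASSIC_BOUNDED.finditer(glued)]
```

In this sketch the unanchored finder reports the token even when it is glued to surrounding characters, while the `\b` variant returns nothing for the same input, which is the failure mode the commit message warns about.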
Merging this PR will not alter performance
`ToolCallRequest.override(tool_call=...)` is typed to accept `ToolCall` (a TypedDict from langchain_core.messages). The runtime shape is identical to the dict-spread we were producing — adding a cast keeps the call flagged for the type checker without any behavioural change.
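A minimal sketch of why such a cast has no behavioural effect, assuming a simplified stand-in for the `ToolCall` TypedDict (`ToolCallLike` and its fields are hypothetical and reduced for illustration): `typing.cast` returns its argument unchanged at runtime and only informs the type checker.

```python
from typing import Any, TypedDict, cast


class ToolCallLike(TypedDict):
    """Hypothetical, simplified stand-in for langchain_core's ToolCall."""

    name: str
    args: dict[str, Any]
    id: str


original: ToolCallLike = {
    "name": "http_get",
    "args": {"url": "https://example.com"},
    "id": "call_1",
}

# A dict-spread builds a plain dict; a type checker will typically not infer
# the TypedDict type for it, so the cast states the intended type explicitly.
spread = {**original, "args": {"url": "[REDACTED]"}}
rewritten = cast(ToolCallLike, spread)
```

Here `rewritten` is the very same object as `spread`, and `original` is left untouched because the spread creates a new dict.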
Collaborator
Eugene Yurtsev (eyurtsev)
left a comment
Have you seen the PIIMiddleware interface?
Description
Adds a tool-call middleware (`SecretMiddleware`) that detects known credential patterns in tool-call arguments and either blocks the call or redacts the matched substring before the tool runs.

Motivation
Agents that read attacker-controllable input — system prompts, retrieved documents, tool-result content from upstream tools — can be steered (via prompt injection) into emitting tool calls that embed credentials they read from elsewhere, exfiltrating them through the legitimate tool surface. The egress allowlist on the agent's environment can't help here because the destination host is exactly the one you have to allow for legitimate use.
A chokepoint in front of tool execution that inspects args closes that gap. It's the natural complement to `PIIMiddleware` (which scans messages for PII): different threat surface (tool args), different patterns (credentials), but the same shape of solution.

API
Other constructor args:
- `tools=[...]`: limit to specific tools (mirrors `ToolRetryMiddleware`).
- `secret_types=[...]`: limit to specific built-in detectors. See `BUILTIN_SECRET_TYPES` for the full set: `github_classic_token`, `github_fine_grained_pat`, `langsmith_key`, `anthropic_key`, `openai_project_key`, `openai_legacy_key`, `aws_access_key_id`, `jwt`.
- `custom_detectors={"name": finder}`: register additional detectors. Each finder takes a string and yields `(start, end)` byte-offset pairs.

Public surface: `SecretMiddleware`, `SecretMatch`, `find_secrets`, `BUILTIN_SECRET_TYPES`.

Design notes
No `\b` word-boundary anchors on the patterns. Each detector relies on prefix + length + alphabet to be tight on its own. Boundaries would block matches when secrets are concatenated with alphanumeric characters in JSON the agent built up by interpolation, which is exactly the case attackers exploit. The negative corpus in `test_find_secrets_negative_corpus` (commit SHAs, UUIDs, ULIDs, `{"lc": 1, ...}` manifest snippets, prose like `"the docs mention sk- as a prefix"`) confirms the FP rate stays at zero without them.

Block strategy never echoes the matched substring back. Returning the match in the `ToolMessage` content would re-publish the secret into the agent's context window, defeating the purpose. The rejection mentions only the type (e.g. `github_classic_token`) and the tool name.

Redact uses `request.override` to produce a new `ToolCallRequest` with rewritten args, leaving the original immutable.

Tests
51 tests covering: each built-in pattern detected, negative corpus, nested dict walking with path tracking, offset reporting, block strategy, redact strategy (incl. nested + overlapping spans), `tools` filter, `secret_types` filter, custom detectors with and without built-ins, async wrap, missing args dict, and end-to-end `create_agent` integration for both strategies.

Lint clean (`uv run --group lint ruff check`), formatted (`uv run --group lint ruff format`).
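For readers unfamiliar with the behaviours the tests exercise, nested walking with path tracking and redaction over possibly overlapping spans might look roughly like the sketch below. It is an illustration under assumptions, not the PR's implementation: `DETECTORS`, `redact_string`, and `redact_args` are hypothetical names, the two regexes are assumed pattern shapes, and upper-casing the type name in the placeholder is a guess at how `[REDACTED_<SECRET_TYPE>]` is rendered.

```python
import re
from typing import Any

# Hypothetical detectors; the names echo the PR's secret types, the patterns are assumed.
DETECTORS = {
    "github_classic_token": re.compile(r"ghp_[A-Za-z0-9]{36}"),
    "aws_access_key_id": re.compile(r"AKIA[A-Z0-9]{16}"),
}


def redact_string(text: str) -> str:
    """Replace each detected span with a placeholder, merging overlapping spans."""
    spans = sorted(
        (m.start(), m.end(), name)
        for name, pattern in DETECTORS.items()
        for m in pattern.finditer(text)
    )
    merged: list[tuple[int, int, str]] = []
    for start, end, name in spans:
        if merged and start < merged[-1][1]:
            # Overlap: extend the previous span and keep its type label.
            prev_start, prev_end, prev_name = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), prev_name)
        else:
            merged.append((start, end, name))
    pieces, cursor = [], 0
    for start, end, name in merged:
        pieces.append(text[cursor:start])
        pieces.append(f"[REDACTED_{name.upper()}]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


def redact_args(value: Any, path: str = "args") -> tuple[Any, list[str]]:
    """Return a redacted copy of value plus the paths where something was redacted."""
    if isinstance(value, str):
        redacted = redact_string(value)
        return redacted, ([path] if redacted != value else [])
    if isinstance(value, dict):
        new_dict, hits = {}, []
        for key, item in value.items():
            new_item, item_hits = redact_args(item, f"{path}.{key}")
            new_dict[key] = new_item
            hits.extend(item_hits)
        return new_dict, hits
    if isinstance(value, list):
        new_list, hits = [], []
        for index, item in enumerate(value):
            new_item, item_hits = redact_args(item, f"{path}[{index}]")
            new_list.append(new_item)
            hits.extend(item_hits)
        return new_list, hits
    return value, []  # numbers, booleans, None: nothing to scan


fake_key = "AKIA" + "ABCD1234" * 2  # 16 characters after the prefix, not a real key
args = {
    "url": "https://example.com",
    "headers": {"x-debug": f"key={fake_key}"},
    "tags": ["ok", 3],
}
clean, paths = redact_args(args)
```

The walk builds new containers instead of mutating the input, in the same spirit as the PR's note that redaction leaves the original request untouched.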