A lightweight, drop-in LLM secret scrubber to prevent API key and credential leaks in your AI applications.
- Drop-in wrapper for OpenAI/httpx - no code rewrite required
- Bidirectional redaction - scrubs secrets before requests and after responses
- 30+ built-in patterns - AWS, GCP, GitHub, Slack, JWT tokens, and more
- Entropy detection - catches high-entropy strings that look like secrets
- Placeholder system - preserves secret functionality while hiding values
- Zero-copy streaming - works with stream=True responses
- CLI tool - scrub logs and files from the command line
pip install scrub-llm
from scrub_llm import OpenAIScrubber
import openai
# Wrap your OpenAI client
client = openai.OpenAI(api_key="your-key")
scrubbed_client = OpenAIScrubber(client)
# Use normally - secrets are automatically redacted
response = scrubbed_client.chat.completions.create(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": "My AWS key is AKIAIOSFODNN7EXAMPLE"  # ← Automatically redacted
    }]
)
# Response secrets are also redacted
print(response.choices[0].message.content)
# "Your AWS key <REDACTED_AWS_ACCESS_KEY_ID> has been hidden"
from scrub_llm import Scrubber
scrubber = Scrubber()
# Scrub prompts (with placeholder mapping)
text = "My GitHub token is ghp_1234567890abcdefghijklmnopqrstuvwxyz"
clean_text, mappings = scrubber.scrub_prompt(text)
print(clean_text) # "My GitHub token is <SECRET_1>"
# Scrub responses (one-way redaction)
response = "Generated API key: sk-proj-abc123xyz789"
clean_response = scrubber.scrub_response(response)
print(clean_response) # "Generated API key: <REDACTED_OPENAI_API_KEY>"
# Check files for secrets
scrub-llm scan file.log
# Scrub secrets from files
scrub-llm scan file.log -o cleaned.log
# Pipe from stdin
cat production.log | scrub-llm scan
# Scan multiple files
scrub-llm scan *.log
The library detects 30+ secret patterns out of the box:
- Cloud Providers: AWS keys, GCP keys, Azure credentials
- Source Control: GitHub, GitLab, Bitbucket tokens
- API Services: OpenAI, Anthropic, Stripe, Twilio, Mailgun keys
- Communication: Slack tokens/webhooks, Discord tokens
- Package Managers: npm, PyPI tokens
- Monitoring: DataDog, New Relic keys
- Authentication: JWTs, OAuth tokens, passwords in URLs
- Encryption: Private keys (RSA, SSH, PGP)
- High Entropy: Any string with high randomness (configurable)
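The high-entropy category can be illustrated with a small standalone sketch. It assumes the check is based on Shannon entropy per character, which the list above does not confirm; `shannon_entropy` and `looks_like_secret` are hypothetical names, not part of the scrub-llm API. The default thresholds (3.5 bits, 20 characters) are the documented defaults for `min_entropy` and `min_entropy_length`.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Return the Shannon entropy of s in bits per character."""
    if not s:
        return 0.0
    counts = Counter(s)
    total = len(s)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_like_secret(token: str, min_entropy: float = 3.5, min_length: int = 20) -> bool:
    """Flag tokens that are both long enough and random-looking enough.

    Hypothetical helper: the library's actual decision rule may differ.
    """
    return len(token) >= min_length and shannon_entropy(token) >= min_entropy


if __name__ == "__main__":
    for candidate in ("aaaaaaaaaaaaaaaaaaaaaaaa", "ghp_1234567890abcdefghijklmnopqrstuvwxyz"):
        print(candidate, round(shannon_entropy(candidate), 2), looks_like_secret(candidate))
```

Raising `min_entropy` makes the check more selective, at the cost of missing secrets drawn from a small alphabet.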
import re
from scrub_llm import Scrubber
from scrub_llm.detectors import RegexDetector
# Add custom patterns
scrubber = Scrubber()
custom_detector = RegexDetector()
custom_detector.patterns["my_pattern"] = re.compile(r"CUSTOM-[A-Z0-9]{16}")
scrubber.add_detector(custom_detector)
# Adjust entropy detection sensitivity
scrubber = Scrubber(
    enable_entropy=True,
    min_entropy=4.0,  # Higher = more selective (default: 3.5)
    min_entropy_length=25  # Minimum length to check (default: 20)
)
# Works seamlessly with streaming
response = scrubbed_client.chat.completions.create(
    model="gpt-4",
    messages=[...],
    stream=True
)
for chunk in response:
    if chunk.flagged:  # True if secrets detected
        print(f"Secrets found: {chunk.secrets}")
    print(chunk.safe_text())  # Always safe to display
from scrub_llm.transport import ScrubberHTTPXHook
# Create a scrubbed httpx client
hook = ScrubberHTTPXHook()
client = hook.create_client()
# All requests/responses are automatically scrubbed
response = client.post("https://api.example.com", json={
    "api_key": "sk-1234567890abcdef"  # Automatically redacted
})
- Pattern Matching: Detects secrets using regex patterns for known formats
- Entropy Analysis: Identifies high-entropy strings that look like secrets
- Placeholder Mapping: Replaces secrets with placeholders, maintaining a secure mapping
- Streaming Safety: Processes streaming responses chunk-by-chunk
- Bidirectional: Scrubs both outgoing prompts and incoming responses
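The pattern-matching and placeholder-mapping steps above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the library's implementation: the two regexes are simplified examples, and `scrub_prompt` and `restore` here are standalone hypothetical functions that only mirror the `<SECRET_n>` placeholder style shown earlier.

```python
import re

# Illustrative patterns only; the library's real pattern set is not reproduced here.
PATTERNS = {
    "AWS_ACCESS_KEY_ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "GITHUB_TOKEN": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}


def scrub_prompt(text: str) -> tuple[str, dict[str, str]]:
    """Replace each detected secret with <SECRET_n> and return the mapping."""
    mappings: dict[str, str] = {}  # placeholder -> original secret
    seen: dict[str, str] = {}      # original secret -> placeholder (repeats reuse one)

    def substitute(match: re.Match) -> str:
        secret = match.group(0)
        if secret not in seen:
            placeholder = f"<SECRET_{len(seen) + 1}>"
            seen[secret] = placeholder
            mappings[placeholder] = secret
        return seen[secret]

    for pattern in PATTERNS.values():
        text = pattern.sub(substitute, text)
    return text, mappings


def restore(text: str, mappings: dict[str, str]) -> str:
    """Put the original secrets back, e.g. before using a response locally."""
    for placeholder, secret in mappings.items():
        text = text.replace(placeholder, secret)
    return text


if __name__ == "__main__":
    clean, mapping = scrub_prompt("My AWS key is AKIAIOSFODNN7EXAMPLE")
    print(clean)
    print(restore(clean, mapping))
```

Only the scrubbed text would be sent over the network in this sketch; the mapping stays in local memory.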
# Clone the repository
git clone https://github.com/haasonsaas/scrub-llm.git
cd scrub-llm
# Install dev dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Run linting
ruff check .
mypy .
- Placeholders are stored in thread-local storage for safety
- Original secrets never leave your application
- No external API calls or network access required
- All processing happens locally in-memory
- Safe for concurrent/async usage
- Minimal overhead (<1ms for typical prompts)
- Zero-copy streaming responses
- Efficient regex compilation and caching
- Thread-safe for production use
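To make the thread-local storage point concrete, here is a minimal sketch using `threading.local` from the standard library. The names `get_mappings` and `remember` are hypothetical and only illustrate the idea that each thread sees its own placeholder map; how scrub-llm stores its mappings internally is not shown in this document.

```python
import threading

# Each thread that touches _local gets its own independent attribute namespace.
_local = threading.local()


def get_mappings() -> dict[str, str]:
    """Return the placeholder map that belongs to the calling thread."""
    if not hasattr(_local, "mappings"):
        _local.mappings = {}
    return _local.mappings


def remember(placeholder: str, secret: str) -> None:
    """Record a placeholder -> secret pair for the calling thread only."""
    get_mappings()[placeholder] = secret


if __name__ == "__main__":
    remember("<SECRET_1>", "main-thread-secret")

    def worker() -> None:
        # The worker starts with an empty map; it cannot see the main thread's entry.
        print("worker sees:", get_mappings())

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    print("main sees:", get_mappings())
```

`threading.local` isolates threads, not asyncio tasks that share one thread; for per-task state the standard library offers `contextvars`.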
This project is released under the MIT License.
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
- LangChain & LlamaIndex middleware
- Automatic PII detection (names, emails, phone numbers)
- ML-based false positive reduction
- Vault/secrets manager integration
- Rust port for performance-critical paths
- YARA rule support for advanced patterns
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Built with ❤️ to keep your secrets secret.