# LLM Redact

Privacy-first text redaction using local LLM models. Redact sensitive information like names, emails, phone numbers, and more without sending data to external services.
## Architecture

```mermaid
graph TB
    subgraph "User Interfaces"
        CLI["🖥️ CLI Tool<br/>(llm-redact)"]
        PY["🐍 Python API<br/>(import llm_redact)"]
        WEB["🌐 Web Dashboard<br/>(localhost:3000)"]
    end
    subgraph "Core Package"
        PYPI["📦 PyPI Package<br/>(llm-redact)<br/>Port: N/A"]
    end
    subgraph "Backend Services"
        HOST["🤖 LLM Host<br/>(FastAPI + Ollama)<br/>Port: 8000"]
        API["🔌 Web API Backend<br/>(FastAPI)<br/>Port: 8001"]
    end
    subgraph "Data Layer"
        DB["💾 SQLite Database<br/>(llm_redact.db)"]
        OLLAMA["🧠 Ollama Service<br/>(Local LLM Models)<br/>Port: 11434"]
    end

    CLI --> PYPI
    PY --> PYPI
    WEB --> API
    PYPI --> HOST
    API --> PYPI
    HOST --> OLLAMA
    HOST --> DB
    PYPI --> DB
    API --> DB

    style CLI fill:#e1f5fe
    style PY fill:#e1f5fe
    style WEB fill:#e1f5fe
    style PYPI fill:#f3e5f5
    style HOST fill:#fff3e0
    style API fill:#fff3e0
    style DB fill:#e8f5e8
    style OLLAMA fill:#e8f5e8
```
- **User Interfaces (Blue)**: Three ways to interact with LLM Redact
  - CLI tool for command-line usage
  - Python API for programmatic access
  - Web dashboard for visual management
- **Core Package (Purple)**: The main `llm-redact` PyPI package
  - Handles all redaction logic and database operations
  - Used by the CLI, Python API, and Web API backend
- **Backend Services (Orange)**: Two FastAPI servers
  - LLM Host (port 8000): processes redaction requests using Ollama
  - Web API Backend (port 8001): REST API for the web dashboard
- **Data Layer (Green)**: Storage and AI services
  - SQLite database for caching, history, and rule storage
  - Ollama service (port 11434) for local LLM model inference
- **AI-Powered** - Uses local LLM models instead of regex patterns for more flexible and accurate redaction
- **Consistent Hashing** - Same data always masked identically (e.g. `|_PHONE_NUMBER_HASH_|`) for analysis and LLM performance
- **Fully Customizable** - Create custom rules for any data type
- **Privacy-First** - All processing happens locally, no data sent externally
- **Cost-Effective** - Uses small, efficient models that don't impact API performance
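The consistent-hashing idea can be sketched in a few lines. This is an illustration only: llm-redact's real hash function and token derivation are internal details and may differ — the point is that identical inputs always produce identical placeholders.

```python
import hashlib

def mask_token(data_type: str, value: str) -> str:
    """Map a sensitive value to a deterministic placeholder.

    Illustrative sketch only -- llm-redact's actual hashing scheme may
    differ. What matters is determinism: the same value always yields
    the same 8-character tag, so redacted text stays analyzable.
    """
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()[:8].upper()
    return f"|_{data_type}_{digest}_|"

a = mask_token("PHONE_NUMBER", "555-0100")
b = mask_token("PHONE_NUMBER", "555-0100")
print(a == b)  # True: identical input, identical placeholder
```

Determinism is what lets an LLM (or an analyst) tell that two redacted mentions refer to the same underlying entity without ever seeing the raw value.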
## Features

- 🔒 **Privacy-first** - Local LLM processing, no external data transmission
- 🚀 **Simple API** - One-liner redaction: `llm_redact.mask(text)`
- 💾 **Smart Caching** - SQLite database for performance and history
- 🔧 **Configurable** - Custom rules, models, and database connections
- 📊 **Full Tracking** - Complete history and analytics
- 🎨 **Web Dashboard** - Modern UI for rule management and monitoring
- 🔄 **Rule Generation** - AI-powered custom rule creation
## Quick Start

```bash
pip install llm-redact
```

```python
from llm_redact.client import LLMRedactClient

client = LLMRedactClient()
result = client.mask("Patient John Smith DOB 1985-03-15 is prescribed Metformin 500mg")
print(result.redacted_text)
# Output: Patient |_PERSON_NAME_EC1AC193_| DOB |_DATE_OF_BIRTH_9757F43A_| is prescribed |_MEDICATION_NAME_554C2909_| |_MEDICATION_DOSE_4FC6DDE1_|
```

### Prerequisites

- Python 3.12.4+
- Ollama for local LLM hosting
### Setup

```bash
# Install and start Ollama
ollama serve

# Pull a small, efficient model (recommended)
ollama pull gemma3:4b
```

```bash
# Set up and start the LLM host
cd src/llm_host
poetry install
poetry run python main.py  # Starts on http://localhost:8000
```

```bash
# Set up and start the web API backend
cd src/web_ui/backend
poetry install
poetry run python main.py  # Starts on http://localhost:8001
```

```bash
# Set up and start the web dashboard frontend
cd src/web_ui/frontend
npm install
npm run dev  # Starts on http://localhost:3000
```
### Configuration

```bash
# LLM Host settings
export LLM_REDACT_LLM_HOST_URL=http://localhost:8000
export LLM_REDACT_DEFAULT_MODEL=gemma3:1b

# Database settings
export LLM_REDACT_DATABASE_URL=sqlite:///llm_redact.db

# Web UI settings (if using)
export LLM_REDACT_WEB_LLM_HOST_URL=http://localhost:8000
export LLM_REDACT_WEB_DATABASE_URL=sqlite:///llm_redact.db
```

## Usage

### CLI

```bash
# Basic redaction
llm-redact "John Smith's email is [email protected]"

# Custom LLM host
llm-redact "Sensitive text" --host http://localhost:8000

# Generate and use custom rules
llm-redact "Medical data here" --generate-rules "help me filter medical sensitive data"
```

### Python API

```python
import llm_redact

# Simple redaction
result = llm_redact.mask("Hi, I'm John Doe from [email protected]")

# Custom rules
from llm_redact import RedactionRule

custom_rules = [
    RedactionRule(
        name="Replace SSN with [SSN]",
        description="Social Security Numbers",
        data_type="SSN"
    )
]
result = llm_redact.mask("SSN: 123-45-6789", rules=custom_rules)

# Advanced client usage
from llm_redact import LLMRedactClient

client = LLMRedactClient(
    llm_host_url="http://localhost:8000",
    database_url="postgresql://user:pass@localhost/redact_db"
)

# Generate rules with AI
rules = client.generate_rules("help me filter medical sensitive data")
result = client.mask(text, rules=rules)

# View history
history = client.get_history(limit=50)
```

## Supported Data Types

- Personal names → `|_NAME_XXXX_|`
- Email addresses → `|_EMAIL_XXXX_|`
- Phone numbers → `|_PHONE_XXXX_|`
- SSN/Tax IDs → `|_SSN_XXXX_|`
- Credit cards → `|_CREDIT_CARD_XXXX_|`
- Addresses → `|_ADDRESS_XXXX_|`
- Medical data → `|_MEDICATION_XXXX_|`
- And more...

*`XXXX` represents a unique 8-character hash for consistent masking.*
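Because placeholders follow a predictable shape, downstream tooling can locate them with a regex. The pattern below is inferred from the placeholder examples in this README, not taken from the library itself — verify it against real output before depending on it.

```python
import re

# Pattern inferred from the README's examples:
# |_<DATA_TYPE>_<8-char uppercase hex hash>_|
PLACEHOLDER = re.compile(r"\|_([A-Z_]+?)_([0-9A-F]{8})_\|")

redacted = (
    "Patient |_PERSON_NAME_EC1AC193_| is prescribed "
    "|_MEDICATION_NAME_554C2909_| |_MEDICATION_DOSE_4FC6DDE1_|"
)

# Extract (data_type, hash) pairs for counting or de-duplication
for data_type, digest in PLACEHOLDER.findall(redacted):
    print(data_type, digest)
```

The lazy `+?` on the data-type group matters: it stops the type name from swallowing the hash segment, since both are drawn from overlapping character sets.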
## Components

This project consists of three main components:

### PyPI Package (`src/pypi`)

- Core redaction client and database management
- Published as the `llm-redact` package
- Supports custom rules and caching

### LLM Host (`src/llm_host`)

- FastAPI server using Ollama for local LLM inference
- Handles redaction requests and rule generation
- Runs on port 8000 by default

### Web UI (`src/web_ui`)

- Backend: FastAPI REST API (port 8001)
- Frontend: Next.js dashboard (port 3000)
- Rule management, history viewing, and analytics
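At its core, the LLM host's job is to turn a redaction request into a call against the local Ollama API. A stdlib-only sketch of such a call — the endpoint and payload fields follow Ollama's public `/api/generate` API, but the helper name and prompt are illustrative, not the project's actual code:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(prompt: str, model: str = "gemma3:1b") -> urllib.request.Request:
    """Build a non-streaming request to Ollama's /api/generate endpoint.

    The endpoint and payload fields follow Ollama's public API; the
    redaction prompt passed in below is only an illustration of what a
    host like this might send.
    """
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_request("Identify the PII in: John Smith, phone 555-0100")

# With Ollama running locally (ollama serve), send it like so:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```

Setting `"stream": False` makes Ollama return one complete JSON object instead of a stream of chunks, which keeps a simple request/response host loop easy to reason about.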
## Environment Variables

| Variable | Default | Description |
|---|---|---|
| `LLM_REDACT_LLM_HOST_URL` | `http://localhost:8000` | LLM host service URL |
| `LLM_REDACT_DATABASE_URL` | `sqlite:///llm_redact.db` | Database connection string |
| `LLM_REDACT_DEFAULT_MODEL` | `gemma3:1b` | Default LLM model |
| `LLM_REDACT_ENABLE_CACHING` | `true` | Enable result caching |
## Database Support

- **SQLite** (default) - Perfect for development and small deployments
- **PostgreSQL** - Recommended for production
- **MySQL** - Also supported via SQLAlchemy
## Development

```bash
# Clone the repository
git clone https://github.com/lookr-fyi/llm-redact.git
cd llm-redact

# Install dependencies for all components
cd src/pypi && poetry install
cd ../llm_host && poetry install
cd ../web_ui/backend && poetry install
cd ../frontend && npm install
```

```bash
# Run the test suite
cd src/pypi
poetry run pytest
```

```bash
# Publish to PyPI
cd src/pypi
chmod +x publish.sh
./publish.sh
```

## API Reference

### Python API

- `llm_redact.mask(text, rules=None, model=None)` - Simple redaction
- `LLMRedactClient.mask()` - Advanced redaction with options
- `LLMRedactClient.generate_rules()` - AI rule generation
- `LLMRedactClient.get_history()` - View redaction history

### REST Endpoints

- `POST /redact` - Redact text
- `POST /rule/generate` - Generate rules
- `GET /health` - Health check
- `GET /models` - List available models
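Clients in other languages (or quick scripts) can call these endpoints directly over HTTP. A minimal sketch of a `POST /redact` call using only the standard library — the JSON field name `text` is an assumption, so check the live schema at the `/docs` URL before relying on it:

```python
import json
import urllib.request

# Hypothetical request body for POST /redact; the field name "text" is
# an assumption -- verify it against http://localhost:8000/docs.
payload = {"text": "Sensitive text to redact"}
req = urllib.request.Request(
    "http://localhost:8000/redact",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# With the LLM host running, send it:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```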
Full API documentation available at http://localhost:8000/docs and http://localhost:8001/docs
## Examples

See `src/pypi/examples/` for complete examples:

- `mask_demo.py` - Basic redaction
- `rule_generation_demo.py` - AI rule generation
- `rule_group_demo.py` - Rule management
- `history_demo.py` - History viewing
## Contributing

1. Fork the repository
2. Create a feature branch: `git checkout -b feature-name`
3. Make your changes and add tests
4. Run tests: `poetry run pytest`
5. Submit a pull request

Guidelines:

- Follow PEP 8 style guidelines
- Add tests for new features
- Update documentation
- Use Poetry for dependency management
## License

MIT License - see LICENSE file for details.

## Links

- Homepage: https://github.com/lookr-fyi/llm-redact
- Issues: https://github.com/lookr-fyi/llm-redact/issues
- Documentation: https://github.com/lookr-fyi/llm-redact/blob/main/README.md

## Support

- Create an issue for bug reports or feature requests
- Check existing issues before creating new ones
- Contribute improvements via pull requests
Keywords: privacy, redaction, llm, pii, data-protection, sensitive-data, ai, local-llm, gdpr, hipaa
