LLM Redact

Privacy-first text redaction using local LLM models. Redact sensitive information like names, emails, phone numbers, and more without sending data to external services.

Architecture

graph TB
    subgraph "User Interfaces"
        CLI["🖥️ CLI Tool<br/>(llm-redact)"]
        PY["🐍 Python API<br/>(import llm_redact)"]
        WEB["🌐 Web Dashboard<br/>(localhost:3000)"]
    end
    
    subgraph "Core Package"
        PYPI["📦 PyPI Package<br/>(llm-redact)<br/>Port: N/A"]
    end
    
    subgraph "Backend Services"
        HOST["🤖 LLM Host<br/>(FastAPI + Ollama)<br/>Port: 8000"]
        API["🔌 Web API Backend<br/>(FastAPI)<br/>Port: 8001"]
    end
    
    subgraph "Data Layer"
        DB["💾 SQLite Database<br/>(llm_redact.db)"]
        OLLAMA["🧠 Ollama Service<br/>(Local LLM Models)<br/>Port: 11434"]
    end
    
    CLI --> PYPI
    PY --> PYPI
    WEB --> API
    
    PYPI --> HOST
    API --> PYPI
    
    HOST --> OLLAMA
    HOST --> DB
    PYPI --> DB
    API --> DB
    
    style CLI fill:#e1f5fe
    style PY fill:#e1f5fe
    style WEB fill:#e1f5fe
    style PYPI fill:#f3e5f5
    style HOST fill:#fff3e0
    style API fill:#fff3e0
    style DB fill:#e8f5e8
    style OLLAMA fill:#e8f5e8

Component Relationships

  • User Interfaces (Blue): Three ways to interact with LLM Redact

    • CLI tool for command-line usage
    • Python API for programmatic access
    • Web dashboard for visual management
  • Core Package (Purple): The main llm-redact PyPI package

    • Handles all redaction logic and database operations
    • Used by CLI, Python API, and Web API backend
  • Backend Services (Orange): Two FastAPI servers

    • LLM Host (port 8000): Processes redaction requests using Ollama
    • Web API Backend (port 8001): REST API for the web dashboard
  • Data Layer (Green): Storage and AI services

    • SQLite database for caching, history, and rule storage
    • Ollama service (port 11434) for local LLM model inference

Why LLM Redact?

  1. AI-Powered - Uses local LLM models instead of regex patterns for more flexible and accurate redaction
  2. Consistent Hashing - The same value is always masked with the same token (e.g. |_PHONE_NUMBER_HASH_|), preserving referential integrity for downstream analysis and keeping prompts coherent for LLMs (see the sketch after this list)
  3. Fully Customizable - Create custom rules for any data type
  4. Privacy-First - All processing happens locally, no data sent externally
  5. Cost-Effective - Uses small, efficient local models, so redaction adds minimal overhead and no external API costs
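
To illustrate point 2, here is a minimal sketch of consistent hashing in action. It assumes an LLM host is already running (see Setup) and that the module-level llm_redact.mask returns the same result object as the client's mask, as the Usage section suggests:

import llm_redact

# The same phone number and name appear in two different texts.
a = llm_redact.mask("Call John Smith at 555-0199.")
b = llm_redact.mask("555-0199 is John Smith's number.")

# With consistent hashing, both outputs should contain identical
# |_PHONE_..._| and |_PERSON_..._| tokens for the shared values.
print(a.redacted_text)
print(b.redacted_text)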

Features

  • 🔒 Privacy-first - Local LLM processing, no external data transmission
  • 🚀 Simple API - One-liner redaction: llm_redact.mask(text)
  • 💾 Smart Caching - SQLite database for performance and history
  • 🔧 Configurable - Custom rules, models, and database connections
  • 📊 Full Tracking - Complete history and analytics
  • 🎨 Web Dashboard - Modern UI for rule management and monitoring
  • 🔄 Rule Generation - AI-powered custom rule creation

Quick Start

Installation

pip install llm-redact

Basic Usage

from llm_redact.client import LLMRedactClient

client = LLMRedactClient()
result = client.mask("Patient John Smith DOB 1985-03-15 is prescribed Metformin 500mg")
print(result.redacted_text)
# Output: Patient |_PERSON_NAME_EC1AC193_| DOB |_DATE_OF_BIRTH_9757F43A_| is prescribed |_MEDICATION_NAME_554C2909_| |_MEDICATION_DOSE_4FC6DDE1_|

Setup

Prerequisites

  • Python 3.12.4+
  • Ollama for local LLM hosting

1. Setup LLM Host

# Install Ollama (https://ollama.com), then start it
ollama serve

# Pull a small, efficient model (recommended)
ollama pull gemma3:4b

# From the repository root, install and start the LLM host
cd src/llm_host
poetry install
poetry run python main.py  # Starts on http://localhost:8000
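
Once the host is up, you can sanity-check it against the documented GET /health endpoint; a minimal sketch (only the status code is checked, since the response body isn't specified here):

import requests

# GET /health is listed under "REST API Endpoints" below.
resp = requests.get("http://localhost:8000/health", timeout=5)
print("LLM host healthy:", resp.ok)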

2. Setup Web UI (Optional)

Backend

cd src/web_ui/backend
poetry install
poetry run python main.py  # Starts on http://localhost:8001

Frontend

cd src/web_ui/frontend
npm install
npm run dev  # Starts on http://localhost:3000

3. Environment Configuration

# LLM Host settings
export LLM_REDACT_LLM_HOST_URL=http://localhost:8000
export LLM_REDACT_DEFAULT_MODEL=gemma3:1b

# Database settings
export LLM_REDACT_DATABASE_URL=sqlite:///llm_redact.db

# Web UI settings (if using)
export LLM_REDACT_WEB_LLM_HOST_URL=http://localhost:8000
export LLM_REDACT_WEB_DATABASE_URL=sqlite:///llm_redact.db
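
Since these are plain environment variables, they can also be set from Python before the package is imported; a minimal sketch, assuming the package reads them at import or first use (explicit constructor arguments, shown under Usage, are the confirmed alternative):

import os

# Same settings as the shell exports above.
os.environ["LLM_REDACT_LLM_HOST_URL"] = "http://localhost:8000"
os.environ["LLM_REDACT_DEFAULT_MODEL"] = "gemma3:1b"
os.environ["LLM_REDACT_DATABASE_URL"] = "sqlite:///llm_redact.db"

import llm_redact  # import after configuration so the settings take effect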

Usage

CLI

# Basic redaction
llm-redact "John Smith's email is [email protected]"

# Custom LLM host
llm-redact "Sensitive text" --host http://localhost:8000

# Generate and use custom rules
llm-redact "Medical data here" --generate-rules "help me filter medical sensitive data"

Python API

import llm_redact

# Simple redaction
result = llm_redact.mask("Hi, I'm John Doe from [email protected]")

# Custom rules
from llm_redact import RedactionRule

custom_rules = [
    RedactionRule(
        name="Replace SSN with [SSN]",
        description="Social Security Numbers", 
        data_type="SSN"
    )
]

result = llm_redact.mask("SSN: 123-45-6789", rules=custom_rules)

# Advanced client usage
from llm_redact import LLMRedactClient

client = LLMRedactClient(
    llm_host_url="http://localhost:8000",
    database_url="postgresql://user:pass@localhost/redact_db"
)

# Generate rules with AI
rules = client.generate_rules("help me filter medical sensitive data")
result = client.mask(text, rules=rules)

# View history
history = client.get_history(limit=50)

Supported Data Types

  • Personal names → |_NAME_XXXX_|
  • Email addresses → |_EMAIL_XXXX_|
  • Phone numbers → |_PHONE_XXXX_|
  • SSN/Tax IDs → |_SSN_XXXX_|
  • Credit cards → |_CREDIT_CARD_XXXX_|
  • Addresses → |_ADDRESS_XXXX_|
  • Medical data → |_MEDICATION_XXXX_|
  • And more...

XXXX stands in for a unique 8-character hash (e.g. EC1AC193 in the Quick Start output), so the same value is always masked with the same token

Components

This project consists of three main components:

1. PyPI Library (src/pypi/)

  • Core redaction client and database management
  • Published as llm-redact package
  • Supports custom rules and caching

2. LLM Host (src/llm_host/)

  • FastAPI server using Ollama for local LLM inference
  • Handles redaction requests and rule generation
  • Runs on port 8000 by default

3. Web UI (src/web_ui/)

  • Backend: FastAPI REST API (port 8001)
  • Frontend: Next.js dashboard (port 3000)
  • Rule management, history viewing, and analytics

Configuration

Environment Variables

Variable                     Default                    Description
LLM_REDACT_LLM_HOST_URL      http://localhost:8000      LLM host service URL
LLM_REDACT_DATABASE_URL      sqlite:///llm_redact.db    Database connection string
LLM_REDACT_DEFAULT_MODEL     gemma3:1b                  Default LLM model
LLM_REDACT_ENABLE_CACHING    true                       Enable result caching

Database Support

  • SQLite (default) - Perfect for development and small deployments
  • PostgreSQL - Recommended for production
  • MySQL - Also supported via SQLAlchemy

Development

Setup Development Environment

# Clone repository
git clone https://github.com/lookr-fyi/llm-redact.git
cd llm-redact

# Install dependencies for all components
cd src/pypi && poetry install
cd ../llm_host && poetry install  
cd ../web_ui/backend && poetry install
cd ../frontend && npm install

Running Tests

cd src/pypi
poetry run pytest

Publishing

cd src/pypi
chmod +x publish.sh
./publish.sh

API Reference

Core Functions

  • llm_redact.mask(text, rules=None, model=None) - Simple redaction
  • LLMRedactClient.mask() - Advanced redaction with options
  • LLMRedactClient.generate_rules() - AI rule generation
  • LLMRedactClient.get_history() - View redaction history

REST API Endpoints

  • POST /redact - Redact text
  • POST /rule/generate - Generate rules
  • GET /health - Health check
  • GET /models - List available models

Full API documentation available at http://localhost:8000/docs and http://localhost:8001/docs
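
For example, the redaction endpoint on the LLM host can be called directly; a minimal sketch with requests, where the JSON field name "text" and the response shape are assumptions for illustration — the interactive docs above have the authoritative schema:

import requests

# POST /redact on the LLM host. The payload shape is assumed;
# consult http://localhost:8000/docs for the real schema.
resp = requests.post(
    "http://localhost:8000/redact",
    json={"text": "Call John Smith at 555-0199"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())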

Examples

See src/pypi/examples/ for complete examples:

  • mask_demo.py - Basic redaction
  • rule_generation_demo.py - AI rule generation
  • rule_group_demo.py - Rule management
  • history_demo.py - History viewing

Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature-name
  3. Make your changes and add tests
  4. Run tests: poetry run pytest
  5. Submit a pull request

Development Guidelines

  • Follow PEP 8 style guidelines
  • Add tests for new features
  • Update documentation
  • Use Poetry for dependency management

License

MIT License - see LICENSE file for details.

Support

  • Create an issue for bug reports or feature requests
  • Check existing issues before creating new ones
  • Contribute improvements via pull requests

Keywords: privacy, redaction, llm, pii, data-protection, sensitive-data, ai, local-llm, gdpr, hipaa
