LLM Redact

Privacy-first text redaction using local LLM models. Redact sensitive information like names, emails, phone numbers, and more without sending data to external services.

Architecture

graph TB
    subgraph "User Interfaces"
        CLI["🖥️ CLI Tool<br/>(llm-redact)"]
        PY["🐍 Python API<br/>(import llm_redact)"]
        WEB["🌐 Web Dashboard<br/>(localhost:3000)"]
    end
    
    subgraph "Core Package"
        PYPI["📦 PyPI Package<br/>(llm-redact)<br/>Port: N/A"]
    end
    
    subgraph "Backend Services"
        HOST["🤖 LLM Host<br/>(FastAPI + Ollama)<br/>Port: 8000"]
        API["🔌 Web API Backend<br/>(FastAPI)<br/>Port: 8001"]
    end
    
    subgraph "Data Layer"
        DB["💾 SQLite Database<br/>(llm_redact.db)"]
        OLLAMA["🧠 Ollama Service<br/>(Local LLM Models)<br/>Port: 11434"]
    end
    
    CLI --> PYPI
    PY --> PYPI
    WEB --> API
    
    PYPI --> HOST
    API --> PYPI
    
    HOST --> OLLAMA
    HOST --> DB
    PYPI --> DB
    API --> DB
    
    style CLI fill:#e1f5fe
    style PY fill:#e1f5fe
    style WEB fill:#e1f5fe
    style PYPI fill:#f3e5f5
    style HOST fill:#fff3e0
    style API fill:#fff3e0
    style DB fill:#e8f5e8
    style OLLAMA fill:#e8f5e8

Component Relationships

  • User Interfaces (Blue): Three ways to interact with LLM Redact

    • CLI tool for command-line usage
    • Python API for programmatic access
    • Web dashboard for visual management
  • Core Package (Purple): The main llm-redact PyPI package

    • Handles all redaction logic and database operations
    • Used by CLI, Python API, and Web API backend
  • Backend Services (Orange): Two FastAPI servers

    • LLM Host (port 8000): Processes redaction requests using Ollama
    • Web API Backend (port 8001): REST API for the web dashboard
  • Data Layer (Green): Storage and AI services

    • SQLite database for caching, history, and rule storage
    • Ollama service (port 11434) for local LLM model inference

Why LLM Redact?

  1. AI-Powered - Uses local LLM models instead of regex patterns for more flexible and accurate redaction
  2. Consistent Hashing - The same value is always masked with the same token (e.g. |_PHONE_NUMBER_HASH_|), preserving referential integrity for downstream analysis and keeping prompts coherent for LLMs (see the sketch after this list)
  3. Fully Customizable - Create custom rules for any data type
  4. Privacy-First - All processing happens locally, no data sent externally
  5. Cost-Effective - Uses small, efficient local models, so redaction adds minimal overhead and no external API costs
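
To illustrate point 2, here is a minimal sketch of consistent hashing in action. It assumes an LLM host is already running (see Setup) and that the module-level llm_redact.mask returns the same result object as the client's mask, as the Usage section suggests:

import llm_redact

# The same phone number and name appear in two different texts.
a = llm_redact.mask("Call John Smith at 555-0199.")
b = llm_redact.mask("555-0199 is John Smith's number.")

# With consistent hashing, both outputs should contain identical
# |_PHONE_..._| and |_PERSON_..._| tokens for the shared values.
print(a.redacted_text)
print(b.redacted_text)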

Features

  • 🔒 Privacy-first - Local LLM processing, no external data transmission
  • 🚀 Simple API - One-liner redaction: llm_redact.mask(text)
  • 💾 Smart Caching - SQLite database for performance and history
  • 🔧 Configurable - Custom rules, models, and database connections
  • 📊 Full Tracking - Complete history and analytics
  • 🎨 Web Dashboard - Modern UI for rule management and monitoring
  • 🔄 Rule Generation - AI-powered custom rule creation

Quick Start

Installation

pip install llm-redact

Basic Usage

from llm_redact.client import LLMRedactClient

client = LLMRedactClient()
result = client.mask("Patient John Smith DOB 1985-03-15 is prescribed Metformin 500mg")
print(result.redacted_text)
# Output: Patient |_PERSON_NAME_EC1AC193_| DOB |_DATE_OF_BIRTH_9757F43A_| is prescribed |_MEDICATION_NAME_554C2909_| |_MEDICATION_DOSE_4FC6DDE1_|

Setup

Prerequisites

  • Python 3.12.4+
  • Ollama for local LLM hosting

1. Setup LLM Host

# Install Ollama (https://ollama.com), then start it
ollama serve

# Pull a small, efficient model (recommended)
ollama pull gemma3:4b

# From the repository root, install and start the LLM host
cd src/llm_host
poetry install
poetry run python main.py  # Starts on http://localhost:8000
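
Once the host is up, you can sanity-check it against the documented GET /health endpoint; a minimal sketch (only the status code is checked, since the response body isn't specified here):

import requests

# GET /health is listed under "REST API Endpoints" below.
resp = requests.get("http://localhost:8000/health", timeout=5)
print("LLM host healthy:", resp.ok)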

2. Setup Web UI (Optional)

Backend

cd src/web_ui/backend
poetry install
poetry run python main.py  # Starts on http://localhost:8001

Frontend

cd src/web_ui/frontend
npm install
npm run dev  # Starts on http://localhost:3000

3. Environment Configuration

# LLM Host settings
export LLM_REDACT_LLM_HOST_URL=http://localhost:8000
export LLM_REDACT_DEFAULT_MODEL=gemma3:1b

# Database settings
export LLM_REDACT_DATABASE_URL=sqlite:///llm_redact.db

# Web UI settings (if using)
export LLM_REDACT_WEB_LLM_HOST_URL=http://localhost:8000
export LLM_REDACT_WEB_DATABASE_URL=sqlite:///llm_redact.db
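
Since these are plain environment variables, they can also be set from Python before the package is imported; a minimal sketch, assuming the package reads them at import or first use (explicit constructor arguments, shown under Usage, are the confirmed alternative):

import os

# Same settings as the shell exports above.
os.environ["LLM_REDACT_LLM_HOST_URL"] = "http://localhost:8000"
os.environ["LLM_REDACT_DEFAULT_MODEL"] = "gemma3:1b"
os.environ["LLM_REDACT_DATABASE_URL"] = "sqlite:///llm_redact.db"

import llm_redact  # import after configuration so the settings take effect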

Usage

CLI

# Basic redaction
llm-redact "John Smith's email is [email protected]"

# Custom LLM host
llm-redact "Sensitive text" --host http://localhost:8000

# Generate and use custom rules
llm-redact "Medical data here" --generate-rules "help me filter medical sensitive data"

Python API

import llm_redact

# Simple redaction
result = llm_redact.mask("Hi, I'm John Doe from [email protected]")

# Custom rules
from llm_redact import RedactionRule

custom_rules = [
    RedactionRule(
        name="Replace SSN with [SSN]",
        description="Social Security Numbers", 
        data_type="SSN"
    )
]

result = llm_redact.mask("SSN: 123-45-6789", rules=custom_rules)

# Advanced client usage
from llm_redact import LLMRedactClient

client = LLMRedactClient(
    llm_host_url="http://localhost:8000",
    database_url="postgresql://user:pass@localhost/redact_db"
)

# Generate rules with AI
rules = client.generate_rules("help me filter medical sensitive data")
result = client.mask(text, rules=rules)

# View history
history = client.get_history(limit=50)

Supported Data Types

  • Personal names → |_NAME_XXXX_|
  • Email addresses → |_EMAIL_XXXX_|
  • Phone numbers → |_PHONE_XXXX_|
  • SSN/Tax IDs → |_SSN_XXXX_|
  • Credit cards → |_CREDIT_CARD_XXXX_|
  • Addresses → |_ADDRESS_XXXX_|
  • Medical data → |_MEDICATION_XXXX_|
  • And more...

XXXX stands in for a unique 8-character hash (e.g. EC1AC193 in the Quick Start output), so the same value is always masked with the same token

Components

This project consists of three main components:

1. PyPI Library (src/pypi/)

  • Core redaction client and database management
  • Published as llm-redact package
  • Supports custom rules and caching

2. LLM Host (src/llm_host/)

  • FastAPI server using Ollama for local LLM inference
  • Handles redaction requests and rule generation
  • Runs on port 8000 by default

3. Web UI (src/web_ui/)

  • Backend: FastAPI REST API (port 8001)
  • Frontend: Next.js dashboard (port 3000)
  • Rule management, history viewing, and analytics

Configuration

Environment Variables

Variable                     Default                    Description
LLM_REDACT_LLM_HOST_URL      http://localhost:8000      LLM host service URL
LLM_REDACT_DATABASE_URL      sqlite:///llm_redact.db    Database connection string
LLM_REDACT_DEFAULT_MODEL     gemma3:1b                  Default LLM model
LLM_REDACT_ENABLE_CACHING    true                       Enable result caching

Database Support

  • SQLite (default) - Perfect for development and small deployments
  • PostgreSQL - Recommended for production
  • MySQL - Also supported via SQLAlchemy

Development

Setup Development Environment

# Clone repository
git clone https://github.com/lookr-fyi/llm-redact.git
cd llm-redact

# Install dependencies for all components
cd src/pypi && poetry install
cd ../llm_host && poetry install  
cd ../web_ui/backend && poetry install
cd ../frontend && npm install

Running Tests

cd src/pypi
poetry run pytest

Publishing

cd src/pypi
chmod +x publish.sh
./publish.sh

API Reference

Core Functions

  • llm_redact.mask(text, rules=None, model=None) - Simple redaction
  • LLMRedactClient.mask() - Advanced redaction with options
  • LLMRedactClient.generate_rules() - AI rule generation
  • LLMRedactClient.get_history() - View redaction history

REST API Endpoints

  • POST /redact - Redact text
  • POST /rule/generate - Generate rules
  • GET /health - Health check
  • GET /models - List available models

Full API documentation available at http://localhost:8000/docs and http://localhost:8001/docs
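
For example, the redaction endpoint on the LLM host can be called directly; a minimal sketch with requests, where the JSON field name "text" and the response shape are assumptions for illustration — the interactive docs above have the authoritative schema:

import requests

# POST /redact on the LLM host. The payload shape is assumed;
# consult http://localhost:8000/docs for the real schema.
resp = requests.post(
    "http://localhost:8000/redact",
    json={"text": "Call John Smith at 555-0199"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())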

Examples

See src/pypi/examples/ for complete examples:

  • mask_demo.py - Basic redaction
  • rule_generation_demo.py - AI rule generation
  • rule_group_demo.py - Rule management
  • history_demo.py - History viewing

Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature-name
  3. Make your changes and add tests
  4. Run tests: poetry run pytest
  5. Submit a pull request

Development Guidelines

  • Follow PEP 8 style guidelines
  • Add tests for new features
  • Update documentation
  • Use Poetry for dependency management

License

MIT License - see LICENSE file for details.

Support

  • Create an issue for bug reports or feature requests
  • Check existing issues before creating new ones
  • Contribute improvements via pull requests

Keywords: privacy, redaction, llm, pii, data-protection, sensitive-data, ai, local-llm, gdpr, hipaa
