phren0logy/inkognito

Inkognito

Privacy-first document processing FastMCP server. Extract, anonymize, and segment documents through FastMCP's modern tool interface.

Please note: Because Inkognito runs as an MCP server, the privacy of file contents cannot be absolutely guaranteed, although it is a central design consideration. The risk of file contents leaking should be low, but it is not zero. File names will, unavoidably and by design, be read and written by the MCP. Plan accordingly, and consider using a local model.

Quick Start

Installation

# Install via pip
pip install inkognito

# Or via uvx (no Python setup needed)
uvx inkognito

# Or run directly with FastMCP
fastmcp run inkognito

Configure Claude Desktop

If your configuration does not already include a filesystem MCP server, add one.

Add to your claude_desktop_config.json. The file must be strict JSON, so it cannot contain comments. Once the cloud extractors are implemented, their optional keys (AZURE_DI_KEY, LLAMAPARSE_API_KEY) go in the inkognito "env" object as string values:

{
  "mcpServers": {
    "inkognito": {
      "command": "uvx",
      "args": ["inkognito"],
      "env": {}
    },
    "filesystem": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-filesystem",
        "/Users/you/input-files-or-whatever",
        "/Users/you/output-folder-if-you-want-one"
      ],
      "env": {},
      "transport": "stdio",
      "type": null,
      "cwd": null,
      "timeout": null,
      "description": null,
      "icon": null,
      "authentication": null
    }
  }
}

Basic Usage

In Claude Desktop:

"Extract this PDF to markdown"
"Anonymize all documents in my contracts folder"
"Split this large document into chunks for processing"
"Create individual prompts from this documentation"

Features

🔒 Privacy-First Anonymization

  • Universal PII detection (50+ types)
  • Consistent replacements across all documents
  • Reversible with secure vault file
  • No configuration needed - smart defaults
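The idea behind "consistent replacements" and a reversible vault can be sketched in a few lines. This is a minimal illustration and not Inkognito's actual implementation: the `Pseudonymizer` class, the email-only regex, and the `personN@example.com` placeholder format are all assumptions made for the example.

```python
import re
from itertools import count

# Hypothetical, deliberately tiny detector: email addresses only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


class Pseudonymizer:
    """Maps each distinct original value to one stable placeholder."""

    def __init__(self):
        self.vault = {}          # placeholder -> original (enables restore)
        self._by_original = {}   # original -> placeholder (enables consistency)
        self._ids = count(1)

    def _placeholder(self, original):
        # Reuse the existing placeholder if this value was seen before.
        if original not in self._by_original:
            token = f"person{next(self._ids)}@example.com"
            self._by_original[original] = token
            self.vault[token] = original
        return self._by_original[original]

    def anonymize(self, text):
        return EMAIL.sub(lambda m: self._placeholder(m.group(0)), text)


p = Pseudonymizer()
doc_a = p.anonymize("Contact alice@corp.com or bob@corp.com.")
doc_b = p.anonymize("Forwarded by alice@corp.com.")
print(doc_a)  # Contact person1@example.com or person2@example.com.
print(doc_b)  # Forwarded by person1@example.com.
```

Because the same `Pseudonymizer` instance handles both documents, `alice@corp.com` maps to the same placeholder in each, and `p.vault` holds the mapping needed to reverse it.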

📄 Multiple Extraction Options

  • Available Now: Docling (default, with OCR support)
  • Planned: Azure DI, LlamaIndex, MinerU (placeholders only)
  • Auto-selects best available option
  • Falls back to Docling if no cloud options
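As a rough sketch of how auto-selection with a Docling fallback could work once the cloud extractors exist: the environment variable names below come from this README, but the `select_extractor` function, the extractor name strings, and the priority order are assumptions for illustration only.

```python
import os

# Assumed priority order; the README does not specify one.
CLOUD_EXTRACTORS = [
    ("azure_di", "AZURE_DI_KEY"),
    ("llamaindex", "LLAMAPARSE_API_KEY"),
]


def select_extractor(method="auto", env=None):
    """Return the name of the extractor to use (hypothetical helper)."""
    env = os.environ if env is None else env
    if method != "auto":
        return method  # an explicit choice always wins
    for name, key_var in CLOUD_EXTRACTORS:
        if env.get(key_var):  # key present and non-empty
            return name
    return "docling"  # local fallback when no cloud key is configured


print(select_extractor(env={}))                       # docling
print(select_extractor(env={"AZURE_DI_KEY": "key"}))  # azure_di
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.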

✂️ Intelligent Segmentation

  • Large documents: 10k-30k token chunks
  • Prompt generation: Split by headings
  • Preserves context and structure
  • Markdown-native processing

FastMCP Tools

All tools are exposed through FastMCP's modern interface with automatic progress reporting and error handling.

anonymize_documents

Replace PII with consistent fake data across multiple files.

anonymize_documents(
    directory="/path/to/docs",
    output_dir="/secure/output"
)

extract_document

Convert PDF/DOCX to markdown.

extract_document(
    file_path="/path/to/document.pdf",
    extraction_method="auto"  # auto, docling (others coming soon)
)

segment_document

Split large documents for LLM processing.

segment_document(
    file_path="/path/to/large.md",
    output_dir="/output/segments",
    max_tokens=20000
)
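To make the `max_tokens` budget concrete, here is a minimal sketch of greedy chunk packing. It is not the tool's real algorithm: the `segment` and `estimate_tokens` helpers are hypothetical, and the 4-characters-per-token estimate is a crude assumption standing in for a real tokenizer.

```python
def estimate_tokens(text):
    # Rough heuristic (assumption): about 4 characters per token.
    return max(1, len(text) // 4)


def segment(markdown, max_tokens=20000):
    """Greedily pack blank-line-separated blocks into chunks under max_tokens."""
    chunks, current, used = [], [], 0
    for block in markdown.split("\n\n"):
        cost = estimate_tokens(block)
        # Start a new chunk when adding this block would exceed the budget.
        if current and used + cost > max_tokens:
            chunks.append("\n\n".join(current))
            current, used = [], 0
        current.append(block)
        used += cost
    if current:
        chunks.append("\n\n".join(current))
    return chunks


text = "\n\n".join(["a" * 40] * 5)        # five blocks of ~10 tokens each
chunks = segment(text, max_tokens=25)
print(len(chunks))                         # 3
```

Splitting only at blank lines keeps paragraphs intact. A single block larger than the budget still becomes its own oversized chunk in this sketch.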

split_into_prompts

Create individual prompts from structured content.

split_into_prompts(
    file_path="/path/to/guide.md",
    output_dir="/output/prompts",
    split_level="h2",  # configurable; the LLM should be able to read the contents of these files safely
)
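Splitting at a heading level can be sketched as follows. The `split_by_heading` helper is hypothetical and only illustrates the idea behind `split_level="h2"`; it does not reflect how the tool writes its output files.

```python
def split_by_heading(markdown, level=2):
    """Split markdown into (title, body) sections at headings of one level."""
    marker = "#" * level + " "   # "## " for level 2; "### " does not match it
    sections, title, body = [], None, []
    for line in markdown.splitlines():
        if line.startswith(marker):
            if title is not None:
                sections.append((title, "\n".join(body).strip()))
            title, body = line[len(marker):].strip(), []
        elif title is not None:
            body.append(line)    # deeper headings stay inside their section
    if title is not None:
        sections.append((title, "\n".join(body).strip()))
    return sections


doc = "# Guide\nintro\n## Install\npip install x\n### Detail\nmore\n## Usage\nrun it\n"
for title, body in split_by_heading(doc):
    print(title, "->", repr(body))
```

This sketch drops any text before the first matching heading and does not skip heading-like lines inside fenced code blocks; a real splitter would need to handle both.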

restore_documents

Restore original PII using vault.

restore_documents(
    directory="/anonymized/docs",
    output_dir="/restored",
    vault_path="/secure/vault.json"
)
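Conceptually, restoring is the inverse lookup of anonymizing. The sketch below assumes a vault that is a plain JSON object mapping each placeholder to its original value; the real vault format (which this README describes as encrypted) is not documented here, so both the format and the `restore` helper are hypothetical.

```python
import json


def restore(text, vault):
    """Replace each placeholder with its original value.

    Longer placeholders are replaced first so that "PERSON_10" is not
    partially rewritten by the entry for "PERSON_1".
    """
    for placeholder in sorted(vault, key=len, reverse=True):
        text = text.replace(placeholder, vault[placeholder])
    return text


# Assumed vault layout: {"placeholder": "original value"}
vault = json.loads('{"PERSON_1": "Alice Smith", "PERSON_10": "Bob Jones"}')
print(restore("PERSON_10 met PERSON_1.", vault))  # Bob Jones met Alice Smith.
```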

Extractor Status

| Extractor | Status | Notes |
|---|---|---|
| Docling | ✅ Fully Implemented | Default extractor with OCR support (OCRMac on macOS, EasyOCR on other platforms) |
| Azure DI | ⚠️ Placeholder | Requires AZURE_DI_KEY environment variable when implemented |
| LlamaIndex | ⚠️ Placeholder | Requires LLAMAPARSE_API_KEY environment variable when implemented |
| MinerU | ⚠️ Placeholder | Will require magic-pdf library when implemented |

Configuration

Following FastMCP conventions, all configuration is via environment variables:

# Optional API keys for cloud extractors (when implemented)
export AZURE_DI_KEY="your-key-here"
export LLAMAPARSE_API_KEY="your-key-here"

# Optional OCR languages (comma-separated, default: all available)
export INKOGNITO_OCR_LANGUAGES="en,fr,de"
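A comma-separated variable such as `INKOGNITO_OCR_LANGUAGES` could be read along these lines. The `ocr_languages` helper is hypothetical, and trimming whitespace, lowercasing codes, and returning `None` for "all available" are assumptions made for the example.

```python
import os


def ocr_languages(env=None):
    """Parse INKOGNITO_OCR_LANGUAGES into a list, or None for 'all available'."""
    env = os.environ if env is None else env
    raw = env.get("INKOGNITO_OCR_LANGUAGES", "")
    # Ignore empty entries and tolerate spaces around commas.
    langs = [code.strip().lower() for code in raw.split(",") if code.strip()]
    return langs or None


print(ocr_languages({"INKOGNITO_OCR_LANGUAGES": "en, fr ,de"}))  # ['en', 'fr', 'de']
print(ocr_languages({}))                                         # None
```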

Examples

Legal Document Processing

You: "Anonymize all contracts in the merger folder for review"

Claude: "I'll anonymize those contracts for you...

[Processing 23 files...]

✓ Anonymized 23 contracts
✓ Replaced: 145 company names, 89 person names, 67 case numbers
✓ Vault saved to: /output/vault.json

Research Paper Extraction

You: "Extract this 300-page research PDF"

Claude: "I'll extract that PDF to markdown...

[Using Docling for extraction...]

✓ Extracted 300 pages
✓ Preserved: tables, figures, citations
✓ Output size: 487,000 tokens
✓ Saved to: research_paper.md

Documentation to Prompts

You: "Split this API documentation into individual prompts"

Claude: "I'll split the documentation by endpoints...

[Splitting by H2 headings...]

✓ Created 47 prompt files
✓ Each prompt includes endpoint context
✓ Ready for training or testing

Performance

| Extractor | Speed | Requirements | Status |
|---|---|---|---|
| Azure DI | 0.2-1 sec/page | API key | Planned |
| LlamaIndex | 1-2 sec/page | API key | Planned |
| MinerU | 3-7 sec/page | Local, GPU | Planned |
| Docling | 5-10 sec/page | Local, CPU | ✅ Available |

Privacy & Security

  • Local processing: No cloud services required
  • No persistence: Nothing saved without explicit paths
  • Secure vaults: Encrypted mapping storage
  • API key safety: Never logged or transmitted

Development

Running Locally

# Clone the repository
git clone https://github.com/phren0logy/inkognito
cd inkognito

# Run with FastMCP CLI
fastmcp dev

# Or run directly in development
uv run python server.py

Testing with FastMCP

# Install the server configuration
fastmcp install inkognito

# Test a specific tool
fastmcp test inkognito extract_document

Project Structure

inkognito/
├── pyproject.toml          # FastMCP-compatible packaging
├── LICENSE                 # MIT license
├── README.md               # This file
├── server.py               # FastMCP server and entry point
├── anonymizer.py           # PII detection and anonymization
├── vault.py                # Vault management for reversibility
├── segmenter.py            # Document segmentation
├── exceptions.py           # Custom exceptions
├── extractors/             # PDF extraction backends
│   ├── __init__.py
│   ├── base.py
│   ├── registry.py
│   ├── docling.py          # ✅ Implemented
│   ├── azure_di.py         # Placeholder
│   ├── llamaindex.py       # Placeholder
│   └── mineru.py           # Placeholder
└── tests/

License

MIT License - see LICENSE file for details.
