Civic Table

Turn degraded archival documents into structured evidence.

Civic Table is a forensic document intelligence platform that extracts entities, relationships, and timelines from historical records — then presents them as an interactive evidence graph with full provenance tracking.

Built for families pursuing property restitution claims, researchers working with degraded archives, and anyone who needs to reconstruct a documentary record where the paper trail is fragmented, multilingual, and decades old.

The Problem

A family has a box of 60-year-old scanned documents — deeds, wills, confiscation orders — in faded Spanish with OCR errors. They need to prove who owned what, when, and how ownership transferred. No existing tool does this without losing the chain of evidence.

What Civic Table Does

  1. Ingests scanned PDFs through OCR with LLM-powered text cleanup
  2. Extracts people, properties, organizations, locations, and their relationships using Claude
  3. Builds a knowledge graph with ML-powered entity deduplication and analyst-controlled merge workflows
  4. Generates AI narratives, per-entity descriptions, document analyses, and a forensic PDF dossier
  5. Presents everything in a read-only web interface with verification tiers on every data point
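
The staged flow above can be sketched as a chain of functions over a per-case state. This is an illustrative sketch only, not the actual `farmer_factory` internals: the stage names and `CaseState` dict are assumptions, and the real OCR and extraction stages call Google Cloud Vision and Claude.

```python
from functools import reduce
from typing import Callable

CaseState = dict  # hypothetical: accumulated artifacts for one case

def ocr(state: CaseState) -> CaseState:
    # Real pipeline: Google Cloud Vision OCR over the scanned PDF
    return {**state, "raw_text": f"<ocr of {state['pdf']}>"}

def cleanup(state: CaseState) -> CaseState:
    # Real pipeline: LLM-powered cleanup of OCR errors
    return {**state, "clean_text": state["raw_text"].strip()}

def extract_entities(state: CaseState) -> CaseState:
    # Real pipeline: Claude extracts people, properties, organizations
    return {**state, "entities": ["Person:EXAMPLE"]}

PIPELINE: list[Callable[[CaseState], CaseState]] = [ocr, cleanup, extract_entities]

def process(pdf: str) -> CaseState:
    """Run every stage in order; each stage only adds keys, which preserves
    the earlier artifacts a provenance trail needs."""
    return reduce(lambda state, stage: stage(state), PIPELINE, {"pdf": pdf})

result = process("deed_017.pdf")
print(sorted(result))  # ['clean_text', 'entities', 'pdf', 'raw_text']
```

Because no stage overwrites a prior stage's output, the raw OCR text and the cleaned text can both be shown side by side in the Document Viewer.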

Every extracted fact carries a verification tier:

Tier               Meaning
TIER_3_AI          Machine-extracted, unverified
TIER_2_ANALYST     Human-reviewed and confirmed
TIER_1_CERTIFIED   Legally certified by external body

Nothing is presented as truth until a human says it is.
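
A minimal sketch of how a tiered fact might be modeled. The `Fact` dataclass and `confirm` method are hypothetical illustrations, not the platform's actual schema; only the tier names come from the table above.

```python
from dataclasses import dataclass
from enum import Enum

class VerificationTier(Enum):
    """Provenance tiers, lowest trust first (names mirror the table above)."""
    TIER_3_AI = 3          # machine-extracted, unverified
    TIER_2_ANALYST = 2     # human-reviewed and confirmed
    TIER_1_CERTIFIED = 1   # legally certified by external body

@dataclass
class Fact:
    """A single extracted claim with its provenance."""
    claim: str
    source_document: str
    tier: VerificationTier = VerificationTier.TIER_3_AI  # everything starts unverified

    def confirm(self) -> None:
        """Analyst review promotes AI output one tier; certification is external."""
        if self.tier is VerificationTier.TIER_3_AI:
            self.tier = VerificationTier.TIER_2_ANALYST

fact = Fact("Deed transfer recorded", "deed_017.pdf")
fact.confirm()
print(fact.tier.name)  # TIER_2_ANALYST
```

The key property is the default: a fact can only enter the system as TIER_3_AI, so unverified data is never mistaken for confirmed evidence.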

Architecture

Two-zone, air-gapped design. Processing and presentation are fully separated.

farmer_factory/  (Python)          farmer_vault/  (Next.js)
─────────────────────────          ──────────────────────────
PDF Intake                         Case Dashboard
  -> Image Preprocessing           Entity Browser
  -> OCR (Google Cloud Vision)     Document Viewer (OCR + cleaned text)
  -> LLM Text Cleanup (Haiku)      Knowledge Graph (force-directed)
  -> Entity Extraction (Haiku)     Scroll-Driven Timeline
  -> Relation Extraction           AI Narrative Explorer
  -> Graph Build + Dedup           Forensic Dossier Download
  -> JSON Export ──────────────>   Read-Only API Layer

Data flows one direction: Factory produces JSON artifacts, Vault consumes them. Clients never touch the processing zone.
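
The handoff can be pictured as a plain file-based contract. This is a sketch under assumptions: the `graph.json` filename and `{"nodes": ..., "edges": ...}` shape are illustrative, not the platform's actual artifact schema.

```python
import json
import tempfile
from pathlib import Path

def export_graph(case_dir: Path, nodes: list[dict], edges: list[dict]) -> Path:
    """Factory side: serialize the graph to a plain JSON artifact."""
    artifact = case_dir / "graph.json"
    artifact.write_text(json.dumps({"nodes": nodes, "edges": edges}, indent=2))
    return artifact

def load_graph(artifact: Path) -> dict:
    """Vault side: read the artifact; it never calls back into the processing zone."""
    return json.loads(artifact.read_text())

case_dir = Path(tempfile.mkdtemp())  # stands in for a case directory
artifact = export_graph(case_dir, [{"id": "p1", "type": "person"}], [])
print(load_graph(artifact)["nodes"][0]["id"])  # p1
```

Because the only interface is a static JSON file, the presentation zone needs no credentials for, or network path to, the processing zone.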

Tech Stack

Factory (Python): Pydantic, NetworkX, Google Cloud Vision, Anthropic Claude API, dedupe (ML entity resolution)

Vault (Next.js): TypeScript, Tailwind CSS, shadcn/ui, react-force-graph-2d

Infrastructure: Domain-configurable (swap entity types and extraction prompts via YAML), case-level focus configuration, LaTeX dossier generation via XeLaTeX

Quick Start

# Clone and set up Python environment
git clone https://github.com/juanceresa/forensic_analysis_platform.git
cd forensic_analysis_platform
python3 -m venv venv
source venv/bin/activate
pip install -r farmer_factory/requirements.txt

# Install synthetic demo case
bash scripts/install_demo_case.sh

# Start the Vault UI
cd farmer_vault
npm install
npm run dev

Open http://localhost:3000 — the app auto-selects the demo case.

Processing Pipeline (CLI)

# Create a case and add PDFs to cases/MY-CASE/intake/
python3 -m farmer_factory.cli create-case --id MY-CASE --name "My Case" --family "Family"

# Run the full extraction pipeline (OCR -> cleanup -> extraction -> graph)
python3 -m farmer_factory.cli process MY-CASE

# Post-processing: resolve orphan relations, review merges, apply
python3 -m farmer_factory.cli resolve-orphans MY-CASE
python3 -m farmer_factory.cli apply-merges MY-CASE

# Generate AI analysis (runs on the cleaned graph, not during processing)
python3 -m farmer_factory.cli analyze MY-CASE

# Generate forensic PDF dossier
python3 -m farmer_factory.cli generate-dossier MY-CASE \
  --property-id <PROPERTY_ENTITY_ID> \
  --family-member-id <PERSON_ENTITY_ID>

Repository Layout

farmer_factory/                Python processing pipeline + CLI
  domains/configs/             Domain configurations (entity types, relation types, prompts)
  extract/                     LLM entity/relation extraction
  narrative/                   AI narrative, descriptions, document analysis generation
  structure/                   Graph construction, dedup, merge engine
  dossier/                     LaTeX PDF dossier generator
farmer_vault/                  Next.js read-only viewer + API layer
tests/                         Python test suite
docs/                          Architecture, security, frontend, and operational docs
examples/demo_case/            Synthetic demo data

Key Design Decisions

  • Verification tiers are not optional. Every node and edge in the graph carries provenance metadata. The UI enforces this visually — unverified data is always labeled.
  • Entity merges require human review. The system suggests merges via ML deduplication, but an analyst must confirm them in YAML authority files before they take effect.
  • AI analysis is separated from extraction. The process command builds a raw graph. AI narratives and descriptions are generated later on the cleaned graph, so you don't burn credits on output you'll throw away during merge review.
  • Domain-configurable. Entity types, relation types, and extraction prompts are defined in YAML, not hardcoded. The current domain is Cuban property restitution; the architecture supports genealogy, academic research, investigative journalism, or any documentary recovery domain.
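
The human-in-the-loop merge rule can be illustrated in a few lines. Everything here is a stand-in: the real candidates come from the dedupe library, confirmations live in YAML authority files, and merges apply to the NetworkX graph, while plain dicts and tuples are used below.

```python
# Hypothetical shapes standing in for dedupe output and YAML authority files.
suggested = [("p1", "p3"), ("p2", "p5")]   # ML candidates: (canonical, duplicate)
confirmed = {("p1", "p3")}                 # pairs an analyst has approved

entities = {"p1": "entity A", "p2": "entity B", "p3": "entity A?", "p5": "entity B?"}
alias_of: dict[str, str] = {}              # provenance: which node absorbed which

for keep, drop in suggested:
    if (keep, drop) not in confirmed:      # unreviewed suggestions never take effect
        continue
    alias_of[drop] = keep                  # record the merge for the audit trail
    entities.pop(drop)

print(sorted(entities))  # ['p1', 'p2', 'p5']
```

The invariant is that the ML layer can only propose; absent an entry in the confirmed set, the graph is left untouched.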

Data Policy

This repository does not contain real client or family case data.

  • Case data lives under cases/ (git-ignored)
  • Secrets stay in .env / .env.local (git-ignored)
  • Demo data under examples/demo_case/ is synthetic

Documentation

Architecture, security, frontend, and operational documentation lives in docs/.

License

AGPL-3.0-only — see LICENSE

Author

Juan Ceresa (@juanceresa)

Built at the intersection of civic infrastructure, AI, and documentary recovery. Methodology developed at the Farmer House Democratic Repair Lab at Huston-Tillotson University.
