
WikiTree Intelligence

WikiTree Intelligence is a local-first genealogy workbench for reconciling GEDCOM data with WikiTree.

The core job is not bulk import. The core job is:

  • finding likely existing WikiTree matches before creating duplicates
  • preserving durable match memory between runs
  • resuming large imports safely
  • surfacing missing matches and data discrepancies
  • preparing later sync-review items for already-matched profiles

Status

This repo is in active development.

Completed

Google Authentication And App Session Boundary (PR #10, merged 2026-04-08)

  • Frontend: React AuthProvider restores auth state on app load
  • Backend: FastAPI login/logout/current-user endpoints are live
  • Session cookies: Starlette SessionMiddleware persists app session state
  • User flow: Google sign-in, returning-session restore, and logout all work
  • Coverage: UI 20 tests, API 16 tests with 93.66% backend coverage

Database Spine with SQLModel Tables and State Machines (PR #12, merged 2026-04-09)

  • SQLModel table definitions for all 15 minimum v1 tables (single source of truth)
  • StrEnum-based state machines: ImportJobStatus (7 states), ImportJobStageStatus (5 states), MatchReviewStatus (5 states)
  • Explicit transition validation functions with terminal state detection
  • Async database engine initialization with auto table creation
  • Database-level enum constraints for state validation
  • Type-safe status fields for Pydantic validation at API layer
  • 52 passing state machine tests with 100% state_machines.py coverage
  • Coverage: 81.84% backend (meets 80% requirement)

WikiTree OAuth Integration (PR #30, merged 2026-04-09)

  • WikiTree API client with async httpx for login/logout/profile operations
  • Session manager for WikiTree connection state with 30-day expiry tracking
  • REST API routes: connect/initiate, callback, disconnect, status, profile retrieval
  • Browser-based OAuth-like flow with backend-owned session mapping
  • Security: open redirect prevention, backend-only WikiTree user_id storage
  • UI: WikiTree settings page with connect/disconnect flow
  • Coverage: 114 backend tests (92.24%), 44 frontend tests (100% passing)

Worker Package Scaffold (PR #33, pending merge 2026-04-13)

  • Separate apps/worker/ package with FastAPI structure
  • Health endpoints: /health/live (liveness) and /health/ready (readiness)
  • Worker ID auto-generation from hostname and PID
  • Docker integration with docker-compose.yml
  • CI workflows for worker tests, linting, and Docker builds
  • Comprehensive README with architecture, scaling, and troubleshooting guides
  • Coverage: 2 tests passing, 100% worker routes coverage
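The worker ID auto-generation from hostname and PID can be sketched in a few lines. The `worker-` prefix and separator format are assumptions, not necessarily the scaffold's exact output:

```python
import os
import socket


def generate_worker_id() -> str:
    """Derive a stable-per-process worker ID from hostname and PID.

    Hostname distinguishes machines (or containers); PID distinguishes
    multiple worker processes on the same host.
    """
    return f"worker-{socket.gethostname()}-{os.getpid()}"
```

An ID like this is handy in `/health/ready` responses and log lines, so a scaled-out worker fleet stays attributable during troubleshooting.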

In Progress

Next: import job infrastructure (PR6). See pr6-import-job-plan.md for the detailed implementation plan.


Note: PR5 (WikiTree dump ingestion) is deferred until dump access is available; the job infrastructure is being built first.

Planned Stack

  • apps/api/ — Python backend
  • apps/worker/ — Background worker process for staged import/search jobs
  • apps/ingestion/ — WikiTree dump loading service (runs weekly)
  • apps/ui/ — React + TypeScript frontend
  • apps/api/tests/ — backend tests
  • apps/ui/tests/ — frontend tests
  • e2e/ — Playwright end-to-end tests
  • migrations/ — database migrations
  • shared Docker volume — raw uploaded GEDCOM storage in local development
  • PostgreSQL — WikiTree dump cache (refreshed weekly) + app data
  • docker-compose.yml — local orchestration

Version 1 Goals

  • Google-authenticated app session
  • WikiTree-authenticated private-data reads through the backend
  • WikiTree weekly dump cache for fast local search (millions of profiles)
  • hybrid search: local dump first, API supplement when needed
  • staged, resumable GEDCOM imports
  • background worker execution for large import/search jobs
  • one canonical person/relationship model
  • snapshot-backed review receipts and evidence packets
  • outward traversal through resolved matches
  • later sync-review queue for GEDCOM facts not yet in WikiTree

Engineering Rules

  • every PR must touch fewer than 10 files
  • every PR must be easy to review
  • repo coverage target is at least 80%
  • critical flows get Playwright coverage

Getting Started

Prerequisites

  • Python 3.11+ with pip (the backend uses StrEnum, introduced in 3.11)
  • Node.js with npm
  • Docker with Docker Compose (optional, for local orchestration via docker-compose.yml)

Backend Setup

cd apps/api

# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -e ".[dev]"

# Configure environment (copy .env.example to .env and fill in values)
cp .env.example .env

# Run development server
uvicorn api.app:app --reload

Backend runs at http://localhost:8000; API docs are at http://localhost:8000/docs

Frontend Setup

cd apps/ui

# Install dependencies
npm install

# Configure environment (copy .env.example to .env and fill in values)
cp .env.example .env

# Run development server
npm run dev

Frontend runs at http://localhost:5173

Running Tests

Backend:

cd apps/api
source .venv/bin/activate
pytest -v

Frontend:

cd apps/ui
npm run test
npm run test:ci  # with coverage

Next Step

Continue with the next unfinished boundary from implementation-plan.md: WikiTree connection, import job storage/worker execution, and the canonical data model.
