Simplified source scraping with campaigns #112

@monneyboi

Description

Overview

A simplified approach for adding government websites (parliaments, ministries, etc.) as data sources. Instead of per-source scraper configs, we send index pages directly to the model, as we already do with detail pages. Pagination is not supported.

Architecture

Refactor WikipediaLink → Source

Generalize the current WikipediaLink model to handle any source type:

class SourceType(enum.Enum):
    INDEX = "INDEX"      # List page with links to politicians
    DETAIL = "DETAIL"    # Individual politician page

class Source(Base):
    id: UUID
    url: str
    source_type: SourceType
    
    # Optional Wikipedia-specific fields
    wikipedia_project_id: str | None  # FK to WikipediaProject (NULL for non-Wikipedia)
    politician_id: UUID | None  # FK to Politician (NULL for INDEX sources)
    
    # Campaign association
    campaign_id: UUID | None  # FK to Campaign (NULL for Wikipedia links)
    
    # Relationships
    languages: list[Language]  # via SourceLanguage link table

class SourceLanguage(Base):
    """Link table between sources and language entities (many-to-many)."""
    source_id: UUID  # FK to Source, PK
    language_id: str  # FK to Language entity, PK

Sources can have multiple languages (Wikipedia projects can have multiple LANGUAGE_OF_WORK relations in Wikidata). The SourceLanguage link table mirrors the existing ArchivedPageLanguage pattern.
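
For illustration, an existing Wikipedia link would map onto the generalized model roughly like this (the project id, the politician row, and the Language row below are example values, not part of the proposal):

# Example only: a DETAIL source taking the place of a WikipediaLink row
source = Source(
    url="https://de.wikipedia.org/wiki/Angela_Merkel",
    source_type=SourceType.DETAIL,
    wikipedia_project_id="dewiki",  # FK to the German Wikipedia project (example id)
    politician_id=merkel.id,        # existing Politician row (example)
    campaign_id=None,               # Wikipedia links don't belong to a campaign
)
source.languages.append(german)     # Language row for German, via SourceLanguage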

Campaign Model

Campaigns group index sources with metadata for batch processing:

class Campaign(Base):
    id: UUID
    name: str
    country_id: str | None  # FK to Country entity (optional filter)
    position_ids: list[str]  # Array of Position QIDs (optional filter)
    created_at: datetime
    
    # Relationships
    sources: list[Source]  # INDEX sources belonging to this campaign
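
Creating a campaign might then look like the following sketch (the URL and Wikidata QIDs are illustrative examples):

campaign = Campaign(
    name="Bundestag members",
    country_id="Q183",           # Germany (example country filter)
    position_ids=["Q1939555"],   # member of the German Bundestag (example position filter)
)
campaign.sources.append(
    Source(url="https://www.bundestag.de/abgeordnete", source_type=SourceType.INDEX)
)
session.add(campaign)
session.commit()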

Language on Source vs ArchivedPage

Move the language association from ArchivedPage to Source (a short before/after sketch follows the list):

  • Language is known at source creation time (from Wikipedia project relations or user input)
  • ArchivedPage becomes purely a content-storage record
  • Simplifies the data model: source metadata stays with the source
  • Preserves support for multiple languages per source
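
A minimal before/after sketch of the lookup, assuming ArchivedPage keeps a reference to the Source it was fetched for (the source relationship name is an assumption):

# Before: languages stored per page
langs = archived_page.languages          # via ArchivedPageLanguage

# After: languages resolved through the source
langs = archived_page.source.languages   # via SourceLanguage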

Workflow

Index Page Processing

  1. Create Campaign with index URLs as Source records (source_type=INDEX)
  2. Run the campaign (sketched after this list):
    • Fetch each index URL via Playwright → ArchivedPage
    • Send rendered HTML to LLM with prompt: "Extract all politician detail page URLs from this index"
    • Create new Source records (source_type=DETAIL) for extracted URLs
    • Link detail sources to campaign's country/positions for filtering
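
A sketch of the run step, assuming thin wrappers around the existing Playwright fetch and LLM extraction (fetch_page, extract_detail_urls, and the archived.html field are hypothetical names):

async def run_campaign(session, campaign: Campaign) -> None:
    for index in (s for s in campaign.sources if s.source_type == SourceType.INDEX):
        # Playwright render -> ArchivedPage (hypothetical wrapper)
        archived = await fetch_page(index.url)
        # LLM prompt: "Extract all politician detail page URLs from this index"
        urls = await extract_detail_urls(archived.html)
        for url in urls:
            session.add(Source(
                url=url,
                source_type=SourceType.DETAIL,
                campaign_id=campaign.id,  # inherits the campaign's country/position filters
            ))
    session.commit()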

Detail Page Processing

Detail pages go through the existing enrichment pipeline (see the sketch after this list):

  • Triggered via enrich-wikipedia CLI or API
  • Same LLM extraction for politician properties
  • Same evaluation workflow
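
A sketch of how campaign detail sources could be handed to that pipeline (enrich_source is a hypothetical wrapper around the same extraction used by enrich-wikipedia):

from sqlalchemy import select

async def enrich_campaign_details(session, campaign: Campaign) -> None:
    detail_sources = session.scalars(
        select(Source).where(
            Source.campaign_id == campaign.id,
            Source.source_type == SourceType.DETAIL,
        )
    ).all()
    for source in detail_sources:
        await enrich_source(source)  # same LLM extraction and evaluation workflow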

Supersedes #109
