This document describes the complete scorecard system for the DigitalChild project, which tracks 10 human rights indicators across countries and enriches document metadata.
The scorecard system provides country-level data on digital child protection policies and LGBTQ+ rights. It consists of:
- Data Source: `scorecard_main.xlsx` - Excel file with country indicators
- Loader: `processors/scorecard.py` - Loads and caches scorecard data
- Enricher: `processors/scorecard_enricher.py` - Adds scorecard data to document metadata
- Exporter: `processors/scorecard_export.py` - Creates CSV exports for website/analysis
- Validator: `processors/scorecard_validator.py` - Checks source URLs for broken links
- Diff Checker: `processors/scorecard_diff.py` - Monitors sources for changes
The scorecard tracks 10 indicators (each with value + source URL):
- AI_Policy_Status - National AI policy/strategy status
- Data_Protection_Law - Data protection/privacy legislation
- Children_Data_Safeguards - Child-specific data protection measures
- SOGI_Sensitive_Data - Sexual orientation/gender identity data protections
- DPA_Independence - Data Protection Authority independence
- DPIA_Required_High_Risk_AI - Data protection impact assessments for AI
- LGBTQ_Legal_Status - Legal status of LGBTQ+ people
- Promotion_Propaganda_Offences - Anti-LGBTQ+ propaganda laws
- COP_Strategy - Child online protection strategy
- SIM_Biometric_ID_Linkage - SIM registration and biometric requirements
File: data/scorecard/scorecard_main.xlsx
The scorecard Excel file contains multiple sheets:
- UN_194 (primary sheet): 194 UN member states with all 10 indicators
- SADC: 16 SADC member states (regional subset)
- ECOWAS: 13 ECOWAS member states (regional subset)
- Global: Scoring rules and methodology documentation
IMPORTANT: The scorecard.py loader reads from the UN_194 sheet by default. This sheet contains the complete dataset for all 194 countries.
Sheet Structure (UN_194):
- Column 1: RowNumber
- Column 2: Country (full country name)
- Columns 3-4: Region - Broad, Region - Specific
- Columns 5+: Indicator value columns paired with _Source columns
Example: AI_Policy_Status (value) + AI_Policy_Status_Source (URL)
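The value/source pairing convention above can be sketched in Python. This is an illustrative helper, not the project's actual loader code; the header list below only includes two of the ten indicators:

```python
# Meta columns that precede the indicator columns in the UN_194 sheet.
META_COLUMNS = {"RowNumber", "Country", "Region - Broad", "Region - Specific"}

def indicator_pairs(columns):
    """Return (value_column, source_column) pairs from a header list."""
    return [(col, f"{col}_Source") for col in columns
            if col not in META_COLUMNS and not col.endswith("_Source")]

header = ["RowNumber", "Country", "Region - Broad", "Region - Specific",
          "AI_Policy_Status", "AI_Policy_Status_Source",
          "Data_Protection_Law", "Data_Protection_Law_Source"]
print(indicator_pairs(header))
# [('AI_Policy_Status', 'AI_Policy_Status_Source'),
#  ('Data_Protection_Law', 'Data_Protection_Law_Source')]
```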
scorecard_main.xlsx
(UN_194 sheet)
↓
scorecard.py (loader)
↓
┌───────────┴───────────┐
↓ ↓
scorecard_enricher.py scorecard_export.py
(add to metadata) (CSV exports)
↓
metadata.json
(enriched documents)
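The enrichment branch of this flow can be sketched as follows. The `SCORECARD` mapping, the `enrich` function, and the URL are placeholders standing in for the loaded UN_194 data, not the project's real implementation:

```python
from datetime import datetime, timezone

# Placeholder lookup standing in for the loaded scorecard data.
SCORECARD = {
    "Albania": {
        "AI_Policy_Status": {
            "value": "Draft policy under development (2023)",
            "source": "https://example.org/albania-ai",  # placeholder URL
        },
    },
}

def enrich(doc):
    """Attach matched scorecard indicators under a 'scorecard' key."""
    row = SCORECARD.get(doc.get("country"))
    if row is None:
        return doc  # no match: leave the document unchanged
    doc["scorecard"] = {
        "matched_country": doc["country"],
        "enriched_at": datetime.now(timezone.utc).isoformat(),
        "indicators": row,
    }
    return doc

doc = enrich({"id": "doc-1", "country": "Albania"})
print(doc["scorecard"]["matched_country"])  # Albania
```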
# 1. Place scorecard_main.xlsx in project root
# 2. Test scorecard loads correctly
python -c "from processors.scorecard import load_scorecard; print(load_scorecard())"
# 3. Run tests to verify
pytest tests/test_scorecard.py -v

Add scorecard indicators to documents based on their country:
# Enrich all documents in metadata.json
python processors/scorecard_enricher.py
# Dry run (don't save changes)
python processors/scorecard_enricher.py --dry-run
# Show enrichment summary only
python processors/scorecard_enricher.py --summary

Programmatic usage:
from processors.scorecard_enricher import enrich_document, enrich_all_metadata
# Enrich single document
doc = {"id": "doc-1", "country": "Albania"}
enriched_doc = enrich_document(doc)
# Enrich all metadata
stats = enrich_all_metadata(save=True)
print(f"Enriched {stats['enriched']} documents")

Output format:
{
  "id": "doc-1",
  "country": "Albania",
  "scorecard": {
    "matched_country": "Albania",
    "enriched_at": "2024-01-15T10:30:00Z",
    "indicators": {
      "AI_Policy_Status": {
        "value": "Draft policy under development (2023)",
        "source": "https://..."
      },
      "Data_Protection_Law": {
        "value": "Law No. 9887 (2008), aligned with GDPR",
        "source": "https://..."
      }
      // ... 8 more indicators
    }
  }
}

Generate CSV exports for website/analysis:
# From Python code
from processors.scorecard_export import export_scorecard
exports = export_scorecard()
# Returns:
# {
# "summary": "data/exports/scorecard_summary.csv",
# "sources": "data/exports/scorecard_sources.csv",
# "indicator_counts": "data/exports/scorecard_indicator_counts.csv"
# }

Export types:
- Summary CSV: All countries with all indicators (for main table)
- Sources CSV: All source URLs (for verification/citation)
- Indicator Counts: Distribution of values per indicator (for charts)
- By Indicator: Individual CSV per indicator
- By Region: Countries filtered by region
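The "Indicator Counts" export is essentially a value histogram per indicator. A minimal sketch of the counting step, using made-up values rather than real scorecard data:

```python
from collections import Counter

# Count how often each value appears for one indicator column
# (the values below are illustrative, not real scorecard data).
values = ["Criminalized", "Legal", "Legal", "Criminalized", "Legal"]
counts = Counter(values)
print(counts.most_common())  # [('Legal', 3), ('Criminalized', 2)]
```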
Programmatic usage:
from processors.scorecard_export import ScorecardExporter
exporter = ScorecardExporter()
# Export specific region
exporter.export_by_region("Africa", "data/exports/africa.csv")
# Export specific indicator
exporter.export_by_indicator("LGBTQ_Legal_Status", "data/exports/lgbtq_status.csv")
# Export all at once
exports = exporter.export_all()

Check all source URLs for broken/redirected links:
# Run validation
python processors/scorecard_validator.py
# Custom worker count
python processors/scorecard_validator.py --workers 20
# Don't save reports
python processors/scorecard_validator.py --no-save

Output:
- `data/exports/scorecard_url_validation.json` - Full validation report
- `data/exports/scorecard_broken_links.csv` - Broken links only (for review)
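How the validator distinguishes the three outcomes is not spelled out here; one plausible classification rule, assuming an HTTP status code and a redirect flag are available for each checked URL, looks like this (an assumption, not the project's exact logic):

```python
def classify(status_code, was_redirected):
    """Hypothetical ok/broken/redirected rule based on the final response."""
    if status_code >= 400:
        return "broken"      # client or server error: link is dead
    if was_redirected:
        return "redirected"  # reachable, but the URL has moved
    return "ok"

print(classify(200, False))  # ok
print(classify(404, False))  # broken
print(classify(200, True))   # redirected
```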
Programmatic usage:
from processors.scorecard_validator import run_validation
report = run_validation(save_reports=True)
print(f"{report['ok']} OK, {report['broken']} broken, {report['redirected']} redirected")

Check monitored sources for content changes:
# Check all monitored sources
python processors/scorecard_diff.py
# Check specific country sources
python processors/scorecard_diff.py --country "South Africa"
# Check sources only (skip stale entry detection)
python processors/scorecard_diff.py --sources-only

Monitored sources:
- UNESCO AI Policy Observatory
- UNCTAD Data Protection Tracker
- ILGA World Maps
- Human Dignity Trust
- GSMA SIM Registration
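A common way to implement this kind of change monitoring is to hash the fetched content and compare it against the hash stored from the previous run. The sketch below illustrates that idea; the project's actual diff logic may differ:

```python
import hashlib

def content_hash(text):
    """Stable fingerprint of a page's text content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def has_changed(new_text, cached_hash):
    """True when the fetched content no longer matches the cached hash."""
    return content_hash(new_text) != cached_hash

cached = content_hash("policy text, version 1")
print(has_changed("policy text, version 1", cached))  # False
print(has_changed("policy text, version 2", cached))  # True
```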
Programmatic usage:
from processors.scorecard_diff import run_diff_check, check_country_sources
# Full check
report = run_diff_check(save_report=True)
# Check specific country
results = check_country_sources("Kenya")

The scorecard enrichment is not part of the main pipeline (pipeline_runner.py) by default. It is a separate step run after documents are processed.
Typical workflow:
# 1. Run pipeline to scrape and process documents
python pipeline_runner.py --source upr --country "Kenya"
# 2. Enrich metadata with scorecard
python processors/scorecard_enricher.py
# 3. Export scorecard data for website
python -c "from processors.scorecard_export import export_scorecard; export_scorecard()"

To integrate scorecard enrichment into the pipeline:
# In pipeline_runner.py, after process_documents():
from processors.scorecard_enricher import enrich_all_metadata
# After processing is complete
if args.enrich_scorecard:
    logger.info("Enriching metadata with scorecard indicators...")
    stats = enrich_all_metadata(save=True)
    logger.info(f"Enriched {stats['enriched']} documents")

To update scorecard data:

- Edit `scorecard_main.xlsx` with new data
- Force reload: `load_scorecard(force_reload=True)`
- Re-enrich metadata: `python processors/scorecard_enricher.py`
- Re-export: `python -c "from processors.scorecard_export import export_scorecard; export_scorecard()"`
# Check for broken links
python processors/scorecard_validator.py
# Check for stale entries
python processors/scorecard_diff.py
# Run all scorecard tests
pytest tests/test_scorecard.py -v

To add a new indicator:

1. Add a column pair to `scorecard_main.xlsx`: `New_Indicator` (value column) and `New_Indicator_Source` (source URL column)
2. Update `INDICATOR_COLUMNS` in `processors/scorecard.py`:

   INDICATOR_COLUMNS = [
       # ... existing indicators
       ("New_Indicator", "New_Indicator_Source"),
   ]

3. Re-run enrichment and exports
- Source Data: `scorecard_main.xlsx` (project root)
- Exports: `data/exports/scorecard_*.csv`
- Validation Reports: `data/exports/scorecard_url_validation.json`
- Diff Reports: `data/exports/scorecard_diff_report.json`
- Cache: `data/cache/scorecard_sources/*.json`
# Run all scorecard tests
pytest tests/test_scorecard.py -v
# Run specific test class
pytest tests/test_scorecard.py::TestScorecardLoader -v
# Run with coverage
pytest tests/test_scorecard.py --cov=processors/scorecard --cov-report=html

Problem: ValueError: Worksheet named 'X' not found
Solution: The scorecard file has multiple sheets. The loader expects the UN_194 sheet by default (as of 2026-01-24). If you see this error:
- Check that `scorecard_main.xlsx` contains a sheet named "UN_194"
- Verify the sheet has 194 rows (countries) with all indicator columns
- The sheet name is hard-coded in `processors/scorecard.py` line 65: `df = pd.read_excel(filepath, sheet_name="UN_194")`
Historical Note: Prior to 2026-01-24, the code expected a sheet named "Sheet1". This was updated to use the properly named "UN_194" sheet for clarity.
Problem: Document country doesn't match scorecard country names
Solution: The loader tries multiple normalization methods:
- Exact match (case-insensitive)
- ISO code lookup
- Fuzzy matching
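The matching cascade above can be sketched with stdlib tools. The ISO lookup table and the 0.8 fuzzy-match cutoff are assumptions for illustration; the real loader may implement each step differently:

```python
import difflib

def match_country(name, scorecard_countries, iso_lookup=None):
    """Try exact, then ISO-code, then fuzzy matching."""
    iso_lookup = iso_lookup or {}
    by_lower = {c.lower(): c for c in scorecard_countries}
    if name.lower() in by_lower:            # 1. exact match (case-insensitive)
        return by_lower[name.lower()]
    if name.upper() in iso_lookup:          # 2. ISO code lookup
        return iso_lookup[name.upper()]
    close = difflib.get_close_matches(name, scorecard_countries, n=1, cutoff=0.8)
    return close[0] if close else None      # 3. fuzzy match, or no match

countries = ["Kenya", "South Africa"]
print(match_country("kenya", countries))                         # Kenya
print(match_country("ZAF", countries, {"ZAF": "South Africa"}))  # South Africa
```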
Check country names in metadata vs scorecard:
from processors.scorecard import get_countries_list
countries = get_countries_list()
print(countries)  # List all scorecard countries

Problem: Enriched document missing some indicators
Solution: Check for empty cells in scorecard_main.xlsx. Empty values are skipped.
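Empty-cell skipping can be illustrated like this. NaN is what pandas produces for blank Excel cells; the helper name is hypothetical:

```python
import math

def drop_empty(row):
    """Drop None, NaN, and blank-string cells so they are skipped."""
    cleaned = {}
    for key, value in row.items():
        if value is None:
            continue
        if isinstance(value, float) and math.isnan(value):
            continue
        if isinstance(value, str) and not value.strip():
            continue
        cleaned[key] = value
    return cleaned

row = {"AI_Policy_Status": "Adopted", "COP_Strategy": "", "DPA_Independence": float("nan")}
print(drop_empty(row))  # {'AI_Policy_Status': 'Adopted'}
```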
Problem: URL validation takes too long
Solution: Reduce worker count or increase timeout:
from processors.scorecard_validator import validate_all_urls
report = validate_all_urls(max_workers=5)  # Slower but more reliable

Planned enhancements:

- Auto-update from sources: Automatically scrape monitored sources and update scorecard
- Version tracking: Track scorecard changes over time
- API endpoint: Serve scorecard data via REST API for website
- Visualization: Generate charts/maps from scorecard data
- Comparison mode: Compare countries side-by-side
- Timeline view: Show indicator changes over time per country
- METADATA_SCHEMA.md - Document metadata structure
- PIPELINE_FLOW.md - Main pipeline workflow
- ISO_MAPPING.md - Country code standards