Skip to content

hcss-utils/ruw-warehouse-overview

Repository files navigation

RUW Database Overview

Live dashboard: https://hcss-utils.github.io/ruw-warehouse-overview/

A lightweight status page for the Russian-Ukrainian War (RUW) Database. The page shows per-source document and chunk counts, annotation coverage, and freshness at a glance.

What it shows

Each row in the table corresponds to one source database (Telegram channels, Kremlin, ISW, Integrum, etc.). For every source the page displays:

Column Description
Lang Languages present in the source (up to 5)
Documents Total number of ingested documents
Chunks Total text chunks produced from those documents
Relevant chunks Chunks annotated as relevant, with coverage %
Last Updated Date of the most recent document in the source

Summary cards at the top aggregate totals across all sources and show the latest data date across the entire corpus.

How it works

SQL logic

assets/stats.sql runs a single query against public.uploaded_document joined through document_sectiondocument_section_chunk. Relevant chunks are determined by combining two annotation tables:

  • taxonomy — legacy classifications
  • taxonomy_annotation — current pipeline annotations (is_relevant = true and HLTP IS NOT NULL)

The two are merged with UNION (deduplicating chunk IDs) before counting. Language codes are aggregated with STRING_AGG(DISTINCT UPPER(...)) per source.

Technical pipeline

schedule (daily 06:00 UTC) or push to main
    │
    ▼
main.py
  ├── reads assets/stats.sql
  ├── executes against DATABASE (secret)
  ├── formats display values (number formatting, language ordering, dates)
  ├── writes data/stats.json  (raw snapshot for history)
  └── renders templates/index.j2 → index.html
    │
    ▼
GitHub Actions
  ├── commits data/stats.json  [skip ci]
  ├── copies index.html + assets/ → _site/
  └── deploys _site/ to GitHub Pages

Development

git clone https://github.com/hcss-utils/ruw-warehouse-overview.git
cd ruw-warehouse-overview

Work on a feature branch:

git checkout -b feature/<name>

After making changes, run quality checks before committing:

bash quality.sh

quality.sh runs ruff (linting + unused imports), isort, black, and ty (type checking) against Python 3.12.

Set DATABASE to a valid PostgreSQL connection string to run main.py locally:

DATABASE=postgresql://... uv run python main.py

The DATABASE secret used by GitHub Actions was set from a local .env file via the GitHub CLI:

gh secret set DATABASE --env-file .env

About

RUW DB Status

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors