feat: add 5 Luxembourg source snapshots for priority-1 migration targets #89

Merged: TommiLindfors merged 1 commit into clarvia-org:main from HirenGajjar:feat/snapshots-lu-priority1 on Jun 7, 2026

HirenGajjar (Collaborator) commented Jun 7, 2026

Part of issue #46 — volunteer source snapshot capture.

Adds HTML snapshots and metadata YAML files for 5 Luxembourg government sources identified as Priority 1 migration candidates in docs/migration-candidates.md (PR #38).


Context and approach

The migration tracking document (PR #38, closes #14) identified these 5 sources as Priority 1 — high confidence, source-backed, self-contained. Capturing their HTML snapshots is the prerequisite step before source records and assertion batches can be authored for each.

All 5 sources were previously researched and URL-verified during workflow-data contributions (#44, #47–50). This PR provides the archived page content that assertion authors will extract claims from.


What was done

Step 1 — Downloaded HTML pages

Used curl -L to download the French (/fr/) version of each page directly into the correct directory structure under sources/snapshots/html/lu/.
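The download step could be scripted roughly as below. This is a minimal sketch, not the command history from this PR: only `curl -L` and the `sources/snapshots/html/lu/` root come from the description. The helper names `snapshot_path` and `download_snapshot`, the `SNAPSHOT_ROOT` variable, and the extra curl options (`--fail`, `--silent`, `--show-error`, `--create-dirs`) are assumptions added for illustration.

```shell
# Root directory for Luxembourg HTML snapshots (taken from the PR description).
SNAPSHOT_ROOT="sources/snapshots/html/lu"

# Hypothetical helper: build the target path for a snapshot.
# $1 = source directory (e.g. cnap_lu), $2 = topic directory,
# $3 = page URL, $4 = language code.
snapshot_path() {
  slug=$(basename "$3" .html)   # last URL segment without the .html suffix
  printf '%s/%s/%s/%s_%s.html\n' "$SNAPSHOT_ROOT" "$1" "$2" "$slug" "$4"
}

# Hypothetical helper: download one page into its target path.
# -L follows redirects (as in the PR); --fail makes curl exit non-zero on
# HTTP errors instead of saving an error page; --create-dirs creates the
# directory tree for the output file.
download_snapshot() {
  target=$(snapshot_path "$1" "$2" "$3" "$4")
  curl -L --fail --silent --show-error --create-dirs -o "$target" "$3"
}

# Example call (performs a network request, so it is left commented out):
# download_snapshot cnap_lu survivor-pension \
#   "https://cnap.public.lu/fr/pensions/pension-survie.html" fr
```

Deriving the slug from the URL keeps the file name aligned with the `<slug>_<language>.html` convention without typing it twice.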

Step 2 — Verified HTML integrity

Checked each file for valid <!DOCTYPE HTML>, lang="fr", and UTF-8 encoding. All 5 files confirmed as real page content, not redirects or error pages.
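One way to automate these three checks is sketched below. The function name `check_snapshot` is hypothetical, and using `iconv` to validate UTF-8 is an assumption; the description does not say which tool was used. The sketch does not detect redirect or error pages, which still need a manual look.

```shell
# Hypothetical helper: verify DOCTYPE, lang attribute, and UTF-8 validity.
# $1 = HTML file, $2 = expected language code (e.g. fr).
check_snapshot() {
  # DOCTYPE declaration, matched case-insensitively.
  grep -qi '<!DOCTYPE html' "$1" || {
    echo "missing DOCTYPE: $1" >&2
    return 1
  }
  # lang attribute with the expected language.
  grep -q "lang=\"$2\"" "$1" || {
    echo "missing lang=\"$2\": $1" >&2
    return 1
  }
  # iconv exits non-zero if the file contains invalid UTF-8 byte sequences.
  iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1 || {
    echo "not valid UTF-8: $1" >&2
    return 1
  }
}

# Example call:
# check_snapshot sources/snapshots/html/lu/cnap_lu/survivor-pension/pension-survie_fr.html fr
```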

Step 3 — Extracted exact page titles

Ran grep -o '<title>[^<]*</title>' against each HTML file to extract the exact browser tab title, then updated the YAML files to match it exactly rather than relying on assumed titles.
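The same grep pattern can be wrapped so that it returns only the title text. This is a sketch under the assumption that the `<title>` element sits on a single line and carries no attributes; the function name `extract_title` is hypothetical.

```shell
# Hypothetical helper: print the text of the first <title> element in a file.
# $1 = HTML file.
extract_title() {
  grep -o '<title>[^<]*</title>' "$1" \
    | head -n 1 \
    | sed -e 's/^<title>//' -e 's/<\/title>$//'
}

# Example call:
# extract_title sources/snapshots/html/lu/guichet_lu/funeral-allowance/indemnite-funeraire_fr.html
```

The printed value can then be pasted into the `page_title` field unchanged.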

Step 4 — Created metadata YAML files

Created one .yml file per HTML file following the template in issue #46:

  • url — exact URL captured
  • captured_at — ISO 8601 datetime
  • capture_method: manual_download
  • captured_by: contributor.HirenGajjar
  • language: fr
  • page_title — verified against actual <title> tag
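A metadata file with the fields listed above could be generated as follows. This is an illustrative sketch: the field names come from the list, but the field order, the quoting of `page_title`, and the function name `write_metadata` are assumptions, and the authoritative template is the one in issue #46. It also assumes the title contains no double quotes or backslashes.

```shell
# Hypothetical helper: write one metadata YAML file.
# $1 = output .yml path, $2 = captured URL, $3 = language code,
# $4 = page title, $5 = contributor handle.
write_metadata() {
  {
    printf 'url: %s\n' "$2"
    # ISO 8601 datetime in UTC, e.g. 2026-06-07T06:30:00Z
    printf 'captured_at: %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    printf 'capture_method: manual_download\n'
    printf 'captured_by: contributor.%s\n' "$5"
    printf 'language: %s\n' "$3"
    # Quoted so that apostrophes and " - " separators stay literal.
    printf 'page_title: "%s"\n' "$4"
  } > "$1"
}

# Example call:
# write_metadata indemnite-funeraire_fr.yml \
#   "https://guichet.public.lu/fr/citoyens/aides/sante/prestations-survivants/indemnite-funeraire.html" \
#   fr "Indemnité funéraire - Guichet.lu - Luxembourg" HirenGajjar
```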

Files captured

| File | URL | Page title (verified) |
| --- | --- | --- |
| cnap_lu/survivor-pension/pension-survie_fr.html | https://cnap.public.lu/fr/pensions/pension-survie.html | Pension de survie - CNAP.lu d'Pensiounskeess - Caisse nationale d'assurance Pension - Luxembourg |
| cns_lu/death-funeral-costs/deces-frais-funeraires_fr.html | https://cns.public.lu/fr/assure/droits-demarches/dossiers-thematiques/famille/deces-frais-funeraires.html | Décès et frais funéraires - CNS - Luxembourg |
| guichet_lu/funeral-allowance/indemnite-funeraire_fr.html | https://guichet.public.lu/fr/citoyens/aides/sante/prestations-survivants/indemnite-funeraire.html | Indemnité funéraire - Guichet.lu - Luxembourg |
| guichet_lu/succession-declaration/declaration-succession_fr.html | https://guichet.public.lu/fr/citoyens/sante/fin-vie/deces/declaration-succession.html | Déclaration de succession ou de mutation par décès - Guichet.lu - Luxembourg |
| guichet_lu/bereavement-leave/conge-extraordinaire_fr.html | https://guichet.public.lu/fr/citoyens/sante/fin-vie/deces/conge-extraordinaire.html | Congé extraordinaire pour motif personnel - Guichet.lu - Luxembourg |

Why these 5 sources

These map directly to missing graph records identified in the migration tracking document:

  • CNAP survivor pension — feeds source.cnap_lu.survivor_pension + assertion batch. CNAP source currently partial in graph (survivor pension consequence exists via guichet source but no standalone CNAP record).
  • CNS death and funeral costs — feeds source.cns_lu.death_notification. CNS funeral allowance chain has no equivalent in graph yet.
  • Guichet funeral allowance — new domain. Funeral allowance (indemnité funéraire) not yet modelled in the consequence graph.
  • Guichet succession declaration: feeds succession domain. 6-month filing window (8 months for death abroad) verified in workflow-data PR #41 (feat: implement 4 missing CLI validation commands + promote records to approved).
  • Guichet bereavement leave — new domain. Extraordinary leave for bereavement not yet modelled in graph.

Source traceability

| Source | Previously researched in |
| --- | --- |
| CNAP survivor pension | workflow-data PR #44 (issue #18) |
| CNS death/funeral costs | workflow-data PR #45 (issue #17) |
| Guichet funeral allowance | workflow-data PR #47 (issue #40 PR 1/4) |
| Guichet succession declaration | workflow-data PR #41 (issue #28 verification) |
| Guichet bereavement leave | workflow-data source corpus |

Verified

  • All 5 HTML files confirmed valid — <!DOCTYPE HTML>, lang="fr", UTF-8
  • All 5 page titles verified against actual <title> tags in HTML
  • All 5 YAML files confirmed to end with a trailing newline (final byte 0a)
  • 10 files total — 5 HTML + 5 YAML, all paired correctly
  • Naming convention followed: <slug>_<language>.html / <slug>_<language>.yml
  • DCO sign-off included (git commit -s)
  • No unintended changes
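The pairing and trailing-newline checks in this list can be repeated with a short script. This is a sketch with hypothetical function names (`ends_with_newline`, `check_pairs`). It assumes file names follow the slug convention and contain no newlines, and it is not the procedure the PR author necessarily used.

```shell
# Hypothetical helper: succeed if the last byte of the file is 0a (newline).
# $1 = file path.
ends_with_newline() {
  [ "$(tail -c 1 "$1" | od -An -tx1 | tr -d ' \n')" = "0a" ]
}

# Hypothetical helper: succeed if every .html under the directory has a
# .yml with the same base name, and every .yml has a matching .html.
# $1 = directory to scan.
check_pairs() {
  missing=$(find "$1" -type f \( -name '*.html' -o -name '*.yml' \) |
    while IFS= read -r f; do
      case $f in
        *.html) partner=${f%.html}.yml ;;
        *.yml)  partner=${f%.yml}.html ;;
      esac
      # Report the file whose partner is absent.
      [ -f "$partner" ] || printf '%s\n' "$f"
    done)
  if [ -n "$missing" ]; then
    printf 'unpaired files:\n%s\n' "$missing" >&2
    return 1
  fi
}

# Example calls:
# check_pairs sources/snapshots/html/lu
# ends_with_newline sources/snapshots/html/lu/cnap_lu/survivor-pension/pension-survie_fr.yml
```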

Signed-off-by: HirenGajjar <gajjarhiren111@gmail.com>

HirenGajjar requested a review from TommiLindfors on June 7, 2026 at 06:55

TommiLindfors (Contributor) left a comment

Excellent work, Hiren! 🎉

This is exactly the kind of thorough, methodical contribution that makes a real difference. I really appreciate:

  • Verified HTML integrity — checking DOCTYPE, lang, and encoding is exactly the right approach
  • Title extraction from actual HTML — not assumed, cross-checked with grep
  • Clean YAML metadata — all 5 files consistent, correct fields, proper naming convention
  • Source traceability — linking back to the original workflow-data PRs where these sources were researched

All 5 sources are high-value Priority 1 targets and this unblocks the assertion authoring work. CI passes cleanly. Merging now!

TommiLindfors merged commit 82e4a73 into clarvia-org:main on Jun 7, 2026 (2 checks passed)

TommiLindfors (Contributor) commented

Now that these 5 snapshots are in, the natural next step is issue #48 — creating assertion batches from captured snapshots. If you're interested in continuing this chain, that would be an amazing follow-up! The snapshots you just captured are exactly the inputs that issue needs. 🙌

Development

Successfully merging this pull request may close these issues.

Identify workflow-data checklist items suitable for migration