
Commit 687da77

Update Gatineau data: fix methodology, improve README, update PR description
1 parent 3db38d7 commit 687da77


4 files changed (+73 additions, -203 deletions)


data/gatineau/README.md

Lines changed: 10 additions & 1 deletion
@@ -32,11 +32,13 @@ python3 data/gatineau/scripts/processor.py --year 2024 --extract
 python3 data/gatineau/scripts/processor.py --year 2024
 ```
 
+Defaults to `extracted/<year>/llm_extracted_en.md`. Use `--input-markdown` to specify a different file.
+
 **Manual extraction:**
 
 1. Open `raw/Gatineau Consolidated Financial Report <year>.pdf`
 2. Use prompt from `llm_prompt.txt` with your LLM
-3. Save output to `extracted/<year>/llm_extracted.md`
+3. Save output to `extracted/<year>/llm_extracted.md` (French) or `llm_extracted_en.md` (English)
 4. Run converter: `python3 data/gatineau/scripts/processor.py --year 2024`
 
 ## Output
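The added README line says the converter reads `extracted/<year>/llm_extracted_en.md` unless `--input-markdown` is given. A minimal sketch of that default-versus-override logic using Python's standard `argparse`, assuming the relative paths are rooted at `data/gatineau/`. `build_parser`, `resolve_input_markdown`, and `BASE_DIR` are hypothetical names rather than code taken from `processor.py`, and `--extract` is declared only because the README shows it; its behavior is not modeled.

```python
import argparse
from pathlib import Path

# Assumption: the README's relative paths are rooted at data/gatineau/.
BASE_DIR = Path("data/gatineau")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI mirroring the flags named in the README."""
    parser = argparse.ArgumentParser(description="Sketch of the Gatineau converter CLI")
    parser.add_argument("--year", type=int, required=True, help="fiscal year, e.g. 2024")
    # Declared only because the README shows it; its behavior is not modeled here.
    parser.add_argument("--extract", action="store_true")
    parser.add_argument("--input-markdown", type=Path, default=None,
                        help="override extracted/<year>/llm_extracted_en.md")
    return parser


def resolve_input_markdown(args: argparse.Namespace) -> Path:
    """Use --input-markdown if given, otherwise the English default for --year."""
    if args.input_markdown is not None:
        return args.input_markdown
    return BASE_DIR / "extracted" / str(args.year) / "llm_extracted_en.md"


if __name__ == "__main__":
    print(resolve_input_markdown(build_parser().parse_args(["--year", "2024"])))
    print(resolve_input_markdown(build_parser().parse_args(
        ["--year", "2024", "--input-markdown", "extracted/2024/llm_extracted.md"])))
```

Keeping the override as `None` by default, rather than baking a path into `default=`, lets the fallback depend on `--year`, which a static argparse default cannot do.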
@@ -53,3 +55,10 @@ Markdown must include:
 - `## Expenses – <year>` (nested bullets with amounts)
 
 All amounts must be in full dollars (e.g., `$123,456,789`), not thousands. See `llm_prompt.txt` for full specification.
+
+## Processing Features
+
+- Collapses single-child categories into "parent – child" format
+- Handles negative values (parentheses format)
+- Preserves hierarchy for categories with multiple children
+- Backfills totals by summing children when missing
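The four processing features listed in the new README section can be illustrated with a small tree model. This is a hypothetical sketch, not the converter's implementation: `Node`, `parse_amount`, `backfill_totals`, and `collapse_single_children` are illustrative names, and the collapse rule (the merged node keeps the child's amount and grandchildren) is an assumption about how the "parent – child" format is produced.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Optional

# "$1,234" or "($1,234)"; parentheses mark a negative value, as in the prompt spec.
AMOUNT_RE = re.compile(r"(\()?\$([\d,]+)\)?")


def parse_amount(text: str) -> int:
    """Parse a full-dollar amount such as '**($123,456)**' into a signed int."""
    match = AMOUNT_RE.search(text)
    if match is None:
        raise ValueError(f"no dollar amount in {text!r}")
    value = int(match.group(2).replace(",", ""))
    return -value if match.group(1) else value


@dataclass
class Node:
    label: str
    amount: Optional[int] = None  # None when no total was given for this node
    children: list[Node] = field(default_factory=list)


def backfill_totals(node: Node) -> int:
    """Fill missing amounts bottom-up by summing children; returns the node's amount."""
    if node.children:
        child_sum = sum(backfill_totals(child) for child in node.children)
        if node.amount is None:
            node.amount = child_sum
    return node.amount if node.amount is not None else 0


def collapse_single_children(node: Node) -> Node:
    """Merge a parent with exactly one child into 'parent – child'; keep wider hierarchies."""
    node.children = [collapse_single_children(child) for child in node.children]
    if len(node.children) == 1:
        only = node.children[0]
        amount = only.amount if only.amount is not None else node.amount
        return Node(f"{node.label} – {only.label}", amount, only.children)
    return node


if __name__ == "__main__":
    # Labels borrowed from the prompt's examples; amounts are made up.
    transport = Node("Transport", children=[
        Node("Réseau routier", children=[
            Node("Voirie municipale", parse_amount("**$50,000,000**")),
        ]),
        Node("Autres", parse_amount("**($1,000)**")),
    ])
    backfill_totals(transport)
    transport = collapse_single_children(transport)
    for child in transport.children:
        print(f"{child.label}: {child.amount:,}")
    print(f"Total {transport.label}: {transport.amount:,}")
```

Only nodes with exactly one child are merged, so a category with several children keeps its hierarchy, which is what the third bullet describes.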

data/gatineau/llm_prompt.txt

Lines changed: 17 additions & 119 deletions
@@ -1,49 +1,16 @@
 ## ✅ **Prompt Template**
 
-**Extract the consolidated accounting ACTUALS for the fiscal year {{YEAR}} from this financial report.
-Use only the detailed functional breakdown pages (“Analysis of consolidated revenues” and “Analysis of consolidated expenses”).
-Do NOT use fiscal reconciliation pages, affectations, amortization reversals, or charges-by-object pages.
-Ignore budgeted/projected amounts entirely.
-Use only the FINAL consolidated realized values for {{YEAR}}.**
+**Extract consolidated accounting ACTUALS for fiscal year {{YEAR}} from "Analyse des revenus consolidés" and "Analyse des charges consolidées" pages only.
+Ignore: fiscal reconciliation, affectations, amortization, charges-by-object, budgets, prior-year columns.**
 
 ### **Data selection rules (critical):**
 
-1. **Use the Accounting Consolidated view (operating + capital combined).**
-This must include ALL consolidated revenue and ALL consolidated spending for the year — no exclusions.
-
-2. **Extract the full functional hierarchy (2–4 levels deep)** exactly as presented in:
-
-* Analyse des revenus consolidés
-* Analyse des charges consolidées
-(These are usually the S-pages at the end of the report.)
-
-**CRITICAL:** Preserve the EXACT hierarchical structure from the PDF. Do NOT roll up subcategories into parent categories.
-
-* If the PDF shows "Transport" → "Réseau routier" → "Voirie municipale", extract all three levels
-* If the PDF shows "Hygiène du milieu" → "Eau et égout" → multiple sub-items, extract all sub-items with their parent
-* Do NOT combine "Transport régulier" and "Transport adapté" into a single "Transport en commun" line
-* Do NOT combine "Industries et commerces" and "Tourisme" into a single "Promotion et développement économique" line
-* Always maintain the parent-child relationships exactly as shown in the PDF
-
-3. **Ignore the following sections:**
-
-* Fiscal reconciliation (“conciliation à des fins fiscales”)
-* Affectations to/from reserves
-* Charges par objets (salaries, services, amortization)
-* Any budget columns
-* Any prior-year comparative columns
-* Any adjustments or footnotes that are *not actual consolidated values*
-
-4. **Use the 'Total consolidated – ACTUAL {{YEAR}}' column only.**
-
-**CRITICAL:** The PDF has multiple columns. You MUST use ONLY the "Données consolidées - Réalisations {{YEAR}}" column.
-
-**DO NOT use:**
-* "Administration municipale Réalisations {{YEAR}}" column (this is NOT the consolidated value)
-* "Données consolidées Réalisations {{YEAR-1}}" column (this is the prior year, NOT {{YEAR}})
-* Any "Sans ventilation" or "Ventilation" columns
-
-**ALWAYS use:** "Données consolidées - Réalisations {{YEAR}}" column (the rightmost 2024 column in consolidated data tables)
+1. **Column:** Use ONLY "Données consolidées - Réalisations {{YEAR}}" column.
+❌ DO NOT use "Administration municipale" or "Réalisations {{YEAR-1}}" columns.
+
+2. **Scope:** Extract ALL consolidated revenue and spending (operating + capital combined).
+
+3. **Hierarchy:** Preserve EXACT structure — extract all levels, do NOT roll up or combine siblings.
 
 ---
 
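The rewritten column rule can also be checked in code when tables are parsed without an LLM. A sketch, assuming the header cells of the "Analyse des revenus consolidés" and "Analyse des charges consolidées" tables are already available as strings; `render_prompt` and `pick_consolidated_column` are hypothetical helpers, and the matching heuristic (case-folded substring tests plus a whole-word year match) is an assumption, since the prompt only names the columns.

```python
import re
import unicodedata
from typing import Sequence


def render_prompt(template: str, year: int) -> str:
    """Fill the {{YEAR}} and {{YEAR-1}} placeholders used in llm_prompt.txt."""
    return template.replace("{{YEAR-1}}", str(year - 1)).replace("{{YEAR}}", str(year))


def _normalize(header: str) -> str:
    # NFC so decomposed accents from PDF text still match "données"/"réalisations";
    # lower-case and collapse whitespace so line breaks inside a header cell don't matter.
    return " ".join(unicodedata.normalize("NFC", header).lower().split())


def pick_consolidated_column(headers: Sequence[str], year: int) -> int:
    """Index of the single 'Données consolidées - Réalisations <year>' column.

    Skips 'Administration municipale', '(Sans) ventilation', and other-year columns.
    """
    year_re = re.compile(rf"\b{year}\b")
    matches = []
    for index, header in enumerate(headers):
        text = _normalize(header)
        if "administration municipale" in text or "ventilation" in text:
            continue
        if "données consolidées" in text and "réalisations" in text and year_re.search(text):
            matches.append(index)
    if len(matches) != 1:
        raise ValueError(f"expected exactly one consolidated {year} column, found {matches}")
    return matches[0]


if __name__ == "__main__":
    # Hypothetical header row; real PDF headers may be split or worded differently.
    headers = [
        "Administration municipale Réalisations 2024",
        "Données consolidées - Réalisations 2024",
        "Données consolidées Réalisations 2023",
    ]
    print(pick_consolidated_column(headers, 2024))
    print(render_prompt("Use Réalisations {{YEAR}}, not {{YEAR-1}}.", 2024))
```

Raising when zero or several columns qualify mirrors the prompt's "ONLY" wording: an ambiguous header row is surfaced instead of silently falling back to the `Administration municipale` or `{{YEAR-1}}` column.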
@@ -58,93 +25,24 @@ Use only the FINAL consolidated realized values for {{YEAR}}.**
 * Debt interest – **$<amount>**
 * Property tax revenue – **$<amount>**
 
-(Money must be in **actual dollars** (full precision), with `$` prefix. Do NOT use "k$" or thousands notation.)
-
----
-
-### **## Revenues – {{YEAR}}**
-
-Follow this exact structure:
-
-* **Category (no amount)**
-
-* Subcategory – **$123,456,789**
-* Subcategory with children:
-
-* Sub-subcategory – **$50,000,000**
-* Sub-subcategory – **$30,000,000**
-* **Total Subcategory – $80,000,000**
-* Another subcategory – **$40,000,000**
-* **Total Category – $120,000,000**
-
-**Indentation rules:**
-
-* Top level: `* **Category**`
-* 2 spaces indent for sub-items
-* 4 spaces indent if deeper levels exist
-* Always include **Total** lines for any category with children
-* Use dashes between label and number: `Name – **$amount**`
-* Use full precision: actual dollar amounts with commas (e.g., **$123,456,789**), NOT thousands notation (no "k$")
-
-**CRITICAL formatting rules:**
+### **## Revenues – {{YEAR}}** / **## Expenses – {{YEAR}}**
 
-* **Negative values:** If a value is negative in the PDF, use parentheses: `**($123,456)**` NOT `**-$123,456**`
-* **Section order:** Maintain the EXACT order of sections as they appear in the PDF pages (do not reorder)
-* **Subcategory preservation:** If the PDF shows a parent category with multiple subcategories, extract ALL subcategories, not just the parent total
-* **"Autres" items:** If "Autres" appears under a parent category (e.g., "Sécurité incendie" → "Autres"), preserve that hierarchy - do NOT flatten it
-
----
-
-### **## Expenses – {{YEAR}}**
-
-Use **the same structure as revenues**, fully respecting the functional hierarchy:
-
-* Administration
-* Police
-* Fire
-* Roads
-* Water
-* Waste
-* Recreation
-* Culture
-* Planning
-* Electricity
-* Etc.
-
-**Go as deep as the PDF allows (2–4 levels).**
+**Format:**
+* Structure: `* **Category**` → ` * Subcategory – **$123,456,789**` → ` * Sub-subcategory – **$50,000,000**` → ` * **Total Subcategory – $80,000,000**`
+* Amounts: Full dollars `**$123,456,789**` (NOT "k$"); negatives: `**($123)**` NOT `**-$123**`
+* Totals: Include **Total** line for categories with children
+* Order & Hierarchy: Maintain EXACT PDF page order; extract all levels, do NOT roll up
 
 ---
 
 ## ⚠️ Precision Rules
 
-* Do not invent categories; only use what is in the "Analyse des revenus/charges" tables.
-* Totals must match the PDF exactly.
-* Sub-items must never exceed parent totals (flag if PDF has rounding differences).
-* Exclude amortization, fiscal adjustments, reconciliation entries, or affectations — these only apply to legal fiscal statements, not accounting consolidated totals.
-* Extract actuals only, no budgets.
-
-**Common mistakes to avoid:**
-
-* ❌ Using "Réalisations 2023" values instead of "Réalisations {{YEAR}}" values
-* ❌ Using "Administration municipale" column instead of "Données consolidées" column
-* ❌ Rolling up subcategories (e.g., combining "Transport régulier" + "Transport adapté" into one line)
-* ❌ Missing negative signs for negative values
-* ❌ Reordering sections (maintain PDF page order)
-* ❌ Flattening hierarchy (e.g., "Sécurité incendie" → "Autres" should be nested, not flat)
-
-**Verification checklist:**
-
-* ✅ Every value comes from "Données consolidées - Réalisations {{YEAR}}" column
-* ✅ No values from "Réalisations {{YEAR-1}}" (prior year) column
-* ✅ Hierarchy matches PDF structure exactly (2-4 levels deep)
-* ✅ Negative values are shown in parentheses
-* ✅ Section order matches PDF page order
-* ✅ All subcategories are preserved, not rolled up
+* Use only categories from "Analyse des revenus/charges" tables — do not invent categories
+* Totals must match PDF exactly; flag rounding differences if sub-items exceed parent totals
+* Extract actuals only (no budgets, amortization, fiscal adjustments, reconciliation entries)
 
 ---
 
 ## 🧩 **Surplus/Deficit Calculation**
 
-At the end, compute:
-
 **Total Surplus = Total Revenues – Total Expenses**
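The surplus formula at the end of the prompt is plain arithmetic once each section's total is known. A sketch of computing it from the extracted markdown, assuming the output follows the prompt's format: parent categories carry no amount, leaves end in `– **$…**`, and `**Total …**` lines only restate sums of their children. Under that assumption, summing every non-Total amount per section avoids double counting wherever the Total lines are indented. `section_totals` and `surplus` are hypothetical helpers, and the sample figures are made up.

```python
from __future__ import annotations

import re
from typing import Optional

SECTION_RE = re.compile(r"^##\s+(Revenues|Expenses)\s+[–-]\s+\d{4}")
AMOUNT_RE = re.compile(r"(\()?\$([\d,]+)\)?")


def parse_amount(text: str) -> Optional[int]:
    """Signed full-dollar amount from '**$1,234**' or '**($1,234)**', else None."""
    match = AMOUNT_RE.search(text)
    if match is None:
        return None
    value = int(match.group(2).replace(",", ""))
    return -value if match.group(1) else value


def section_totals(markdown: str) -> dict[str, int]:
    """Sum every non-Total bullet amount under each '## Revenues/Expenses – <year>' heading."""
    totals: dict[str, int] = {}
    section: Optional[str] = None
    for raw_line in markdown.splitlines():
        line = raw_line.strip()
        if line.startswith("#"):
            # Any other heading ends the current Revenues/Expenses section.
            heading = SECTION_RE.match(line)
            section = heading.group(1) if heading else None
            if section is not None:
                totals.setdefault(section, 0)
            continue
        if section is None or not line.startswith(("* ", "- ")):
            continue
        label = line[2:].lstrip("*").strip()
        if label.startswith("Total "):
            continue  # restates the sum of leaves that are counted individually
        amount = parse_amount(line)
        if amount is not None:
            totals[section] += amount
    return totals


def surplus(markdown: str) -> int:
    """Total Surplus = Total Revenues - Total Expenses (negative means a deficit)."""
    totals = section_totals(markdown)
    return totals.get("Revenues", 0) - totals.get("Expenses", 0)


if __name__ == "__main__":
    # Made-up figures in the prompt's output format.
    sample = "\n".join([
        "## Revenues – 2024",
        "* **Taxes**",
        "  * Property – **$100,000**",
        "  * Other – **$20,000**",
        "  * **Total Taxes – $120,000**",
        "* Transfers – **$30,000**",
        "## Expenses – 2024",
        "* **Administration**",
        "  * Management – **$90,000**",
        "  * Adjustment – **($5,000)**",
        "  * **Total Administration – $85,000**",
    ])
    print(section_totals(sample))
    print(f"Surplus: {surplus(sample):,}")
```

Comparing these leaf sums with the `**Total …**` lines would be a natural extension for the "Totals must match PDF exactly" rule.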
