Open
39 commits
179a962
Update requirements
simonwoerpel Nov 2, 2023
86c5052
Ignore scraper data directories
simonwoerpel Nov 2, 2023
fd28600
Update scraper user agent
simonwoerpel Nov 2, 2023
6f3d0ca
BE: 2023 tweaks
simonwoerpel Nov 2, 2023
8e32816
ee spider: fix year parameter to take effect
Tilana Nov 3, 2023
104c68d
es_scraper: add 2022 url and apply minor fixes
Tilana Nov 3, 2023
ddf652b
fr_scraper: adjust for 2022 data
Tilana Nov 3, 2023
2eb0b7b
gb_scraper: adjust to 2022
Tilana Nov 3, 2023
8eaccee
LU: scrapy working 2023
simonwoerpel Nov 5, 2023
497d221
AT: scrapy working 2023
simonwoerpel Nov 5, 2023
a1b0ebc
DE: Add scrapy scraper
simonwoerpel Nov 5, 2023
a90d424
LV: add scrapy scraper
simonwoerpel Nov 5, 2023
e04b65b
SK: Fix scrapy scraper for 2022
simonwoerpel Nov 5, 2023
ec23cd1
HU: Update converter for 2023
simonwoerpel Nov 5, 2023
e9634bf
EU: Cool URIs always change
simonwoerpel Nov 5, 2023
045ecad
here as well -.-
simonwoerpel Nov 5, 2023
580e1fa
Merge pull request #1 from investigativedata/dev
simonwoerpel Nov 7, 2023
6840e9b
BE: slugify
simonwoerpel Nov 7, 2023
d15ec4c
MT: add a bad notebook
simonwoerpel Nov 9, 2023
e1088e7
LT: add some code
simonwoerpel Nov 9, 2023
2ba064a
CY: adjust data source to 2022 / flag old scraper
Tilana Nov 9, 2023
e7017d3
Improve folder structure
jfilter Nov 22, 2024
fb13a5e
Improve README
jfilter Nov 22, 2024
27419ca
Update requirements to get it to run again
jfilter Nov 22, 2024
2639dc1
Add basic cli structure
jfilter Dec 8, 2024
6a4d5b8
Add file download
jfilter Dec 9, 2024
e5c892c
Add file processing
jfilter Dec 10, 2024
12117e2
Add description for hard-to-automate exports
jfilter Dec 12, 2024
b7ac44f
Add new RO scraper
jfilter Dec 12, 2024
0e71615
Add option to scrape sequentially
jfilter Dec 13, 2024
2398b53
Clean up some scrapy spiders
jfilter May 22, 2025
6407161
Update PL parsing for 2022
jfilter May 22, 2025
c38e296
Get details for PL
jfilter Jun 4, 2025
46824e1
Add links to other repos
jfilter Jun 4, 2025
3b73d5b
Add scraper for pl & it for 2022
jfilter Jun 4, 2025
5ed2a7b
Add new nl spider for 2022
jfilter Jun 5, 2025
eab77aa
Add proper SE scraper
jfilter Jun 5, 2025
3cc1b23
Fix SE scraper
jfilter Jun 17, 2025
2bc897e
Add proper parsing to RO
jfilter Jun 17, 2025
17 changes: 17 additions & 0 deletions .editorconfig
@@ -0,0 +1,17 @@
+# EditorConfig is awesome: http://EditorConfig.org
+
+# top-most EditorConfig file
+root = true
+
+# Tab indentation
+[*]
+indent_style = space
+indent_size = 4
+trim_trailing_whitespace = true
+insert_final_newline = true
+
+# The indent size used in the `package.json` file cannot be changed
+# https://github.com/npm/npm/pull/3180#issuecomment-16336516
+[{.travis.yml,npm-shrinkwrap.json,package.json}]
+indent_style = space
+indent_size = 4
6 changes: 5 additions & 1 deletion .gitignore
@@ -1,3 +1,4 @@
+data
 .idea/
 *.pyc
 *.swp
@@ -11,4 +12,7 @@
 *.jpg
 *.xlsx
 .ipynb_checkpoints
-lv/__pycache__/
+__pycache__
+cache
+*.zip
+scrapy_fs/responses
22 changes: 8 additions & 14 deletions README.md
@@ -1,19 +1,13 @@
-FarmSubsidy.org Scrapers
-========================
+# FarmSubsidy.org Scrapers

-[FarmSubsidy](http://farmsubsidy.openspending.org/) is a website that collects the payment data of the Common Agriculture Policy (CAP) which represents about a third of the EU budget. It was run by a group of journalists and activists for the past years. In 2013 the [OpenSpending project](http://openspending.org/) of the [Open Knowledge Foundation](http://okfn.org/) took over responsibility of the website.
+[FarmSubsidy](https://farmsubsidy.org) is a platform that collects payment data related to the EU’s Common Agricultural Policy (CAP), which accounts for approximately one-third of the EU budget. This repository focuses on the initial data collection phase, often using web scraping. However, many EU member states now offer bulk data downloads, reducing the need for scraping.

-The FarmSubsidy data is mostly scraped from member state websites. The old scrapers were working well, but were running in costly and proprietary software. This year we need Free and Open Source scrapers and this repository will collect these scrapers and coordinate the effort.
+## Related Farm Subsidy repositories

-Please have a look at the [member state scraper issues](https://github.com/openspending/farmsubsidy-scrapers/issues?labels=memberstate&page=1&state=open). If you can help provide a scraper that would be awesome.
+- Backend & cleaning: <https://github.com/okfde/farmsubsidy-store>
+- Website: <https://github.com/simonwoerpel/farmsubsidy.org-next>

+## Resources

-Developer Documentation
------------------------
-
-Developer documentation for both website and scrapers can be found at http://farmsubsidy.readthedocs.org.
-
-[Member states data sites](http://ec.europa.eu/agriculture/cap-funding/beneficiaries/shared/index_en.htm)
-
-
-[Financial Reports](http://ec.europa.eu/agriculture/cap-funding/financial-reports/index_en.htm)
+- **[Member States Data Sites](https://agriculture.ec.europa.eu/common-agricultural-policy/financing-cap/beneficiaries_en):** Links to member states’ CAP payment data portals.
+- **[Financial Reports](http://ec.europa.eu/agriculture/cap-funding/financial-reports/index_en.htm):** Summary reports on CAP funding and expenditures.
6 changes: 0 additions & 6 deletions bg/README.md

This file was deleted.
