
Commit 6c7be9f

Initial code, set up python workflow, start script to import geojson data (#1)
Initial code, set up python workflows to import water companies and zones
1 parent 14534b5 commit 6c7be9f

39 files changed (+13050, -1244 lines)

.env.example

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+NOCODB_API_TOKEN=
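
The pipelines README in this commit says credentials live in `.env` at the repository root, copied from `.env.example`, whose only key is `NOCODB_API_TOKEN`. The commit does not show how the pipelines actually read that file, so the following is a minimal standard-library sketch under that assumption. `load_env_file` and `get_nocodb_token` are hypothetical helpers; a real project might use a library such as python-dotenv instead.

```python
import os
from pathlib import Path


def load_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file (hypothetical helper).

    Blank lines and lines starting with '#' are skipped. Quoting, escapes and
    variable expansion are NOT handled; this is a minimal sketch only.
    """
    values: dict[str, str] = {}
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def get_nocodb_token(env_path: Path = Path(".env")) -> str:
    """Return NOCODB_API_TOKEN, preferring the process environment over .env."""
    token = os.environ.get("NOCODB_API_TOKEN")
    if not token and env_path.exists():
        token = load_env_file(env_path).get("NOCODB_API_TOKEN", "")
    if not token:
        # .env.example ships with an empty value, so an unedited copy fails here.
        raise RuntimeError(
            "NOCODB_API_TOKEN is not set; copy .env.example to .env and fill it in"
        )
    return token
```

Checking the process environment first lets CI or a shell export override the file without editing it.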

.gitignore

Lines changed: 2 additions & 1 deletion
@@ -160,4 +160,5 @@ dmypy.json
 cython_debug/
 
 # Precommit hooks: ruff cache
-.ruff_cache
+.ruff_cache
+data/

.pre-commit-config.yaml

Lines changed: 4 additions & 3 deletions
@@ -1,12 +1,13 @@
 repos:
-- repo: https://github.com/charliermarsh/ruff-pre-commit
+- repo: https://github.com/astral-sh/ruff-pre-commit
   # Ruff version.
-  rev: "v0.2.1"
+  rev: "v0.14.9"
   hooks:
     - id: ruff
       args: [--fix]
 - repo: https://github.com/pre-commit/pre-commit-hooks
-  rev: v4.3.0
+  rev: v6.0.0
   hooks:
     - id: check-merge-conflict
+    - id: check-illegal-windows-names
     - id: mixed-line-ending

README.md

Lines changed: 16 additions & 29 deletions
@@ -1,42 +1,29 @@
-# Template DataForGood
+# VCM Water Watch
 
-This file will become your README and also the index of your
-documentation.
+## Objective
 
-# Contributing
+Create a collaborative, science-based platform to map and analyze the risk of water pollution by CVM/VCM (vinyl chloride monomer) from PVC pipes installed in Europe in the 1970s and 1980s. The platform will make it possible to visualize known and potential risks, identify data gaps, and encourage citizen and institutional contributions to research.
 
+## Contributing
 
-## Installation
+### Prerequisites
 
-- [Installing Python](#installation-de-python)
+- Python 3.12
+- [uv](https://docs.astral.sh/uv/) for dependency management
+- [just](https://just.systems/) for running tasks
+- [pre-commit](https://pre-commit.com/)
 
-This project uses [uv](https://docs.astral.sh/uv/) to manage Python dependencies. It is a prerequisite for installing this project.
-
-Once it is installed, run the following command to install the appropriate Python version, create a virtual environment, and install the project's dependencies.
-
-```bash
-uv sync
-```
-
-In day-to-day use, if you use VSCode, the virtual environment is activated automatically when you open the project. Otherwise, activate it manually with the following command:
+### Installation
 
 ```bash
-source .venv/bin/activate
+just install
 ```
 
-Alternatively, use `uv run ...` (instead of `python ...`) to run a Python script. For example:
-
-```bash
-uv run pipelines/run.py run build_database
-```
-
-
-
-## Running the pre-commit hooks locally
-
-[Install pre-commit](https://pre-commit.com/)
+This will install:
 
-pre-commit run --all-files
+- Python dependencies, using `uv`
+- pre-commit hooks, using `pre-commit`
 
-## Using Tox to test your code
+### Running Python ETL
 
-tox -vv
+See [Pipelines Documentation](pipelines/README.md)

d4g-utils/install_poetry.sh

Lines changed: 0 additions & 51 deletions
This file was deleted.

data/raw/countries.geojson

Lines changed: 265 additions & 0 deletions
Large diffs are not rendered by default.
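
The commit title mentions starting a script to import GeoJSON data, and this file's contents are not rendered here. As an illustration only, a GeoJSON FeatureCollection (the structure defined by RFC 7946) can be read with the standard library. The `name` property key below is a hypothetical default, since the file's actual properties are not visible, and `iter_features`/`summarize` are not functions from this repository.

```python
import json
from pathlib import Path
from typing import Any, Iterator


def iter_features(path: Path) -> Iterator[dict[str, Any]]:
    """Yield the features of a GeoJSON FeatureCollection (RFC 7946)."""
    with path.open(encoding="utf-8") as f:
        data = json.load(f)
    if data.get("type") != "FeatureCollection":
        raise ValueError(f"{path} is not a FeatureCollection")
    yield from data.get("features", [])


def summarize(path: Path, name_key: str = "name") -> list[tuple[str, str]]:
    """Return (name, geometry type) pairs; name_key is a hypothetical property key."""
    rows = []
    for feature in iter_features(path):
        # RFC 7946 allows "properties" and "geometry" to be null.
        props = feature.get("properties") or {}
        geometry = feature.get("geometry") or {}
        rows.append((str(props.get(name_key, "")), geometry.get("type", "")))
    return rows
```

For example, `summarize(Path("data/raw/countries.geojson"))` would list each feature's name (if a `name` property exists) alongside its geometry type, a quick way to inspect the file before writing the real import.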

justfile

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+_default:
+    @just --list
+    @echo ""
+    @echo "Run 'just <recipe>' to execute a task."
+
+# install environment and pre-commit hooks
+install:
+    uv sync
+    pre-commit install
+
+# run tests for the Python pipelines
+pipelines-test:
+    uv run pytest pipelines
+
+mod extract 'pipelines/extract'
+mod transform 'pipelines/transform'
+mod load 'pipelines/load'
+mod task 'pipelines/tasks'

pipelines/.gitignore

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+.prefect-storage/

pipelines/README.md

Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
+# Python Workflows
+
+## Python Environment
+
+Install Python dependencies:
+
+```bash
+uv sync
+```
+
+`uv` will create a Python virtual environment. IDEs such as VSCode will usually detect it automatically, but you can activate
+it manually with:
+
+```bash
+source .venv/bin/activate
+```
+
+or use `uv` to run commands:
+
+```bash
+uv run python
+```
+
+The justfile recipes at the top of the repo will do that automatically.
+
+## Configuration
+
+Credentials are configured in the file `.env` at the root of this repository.
+Copy it from `.env.example` and update it with actual values.
+
+## Running an import task
+
+To start a workflow for an import task, use the `just` command at the root of the repository.
+
+There are 4 categories:
+
+- extract: download raw data and process it into the staging directory
+- transform: additional processing of staging data
+- load: load staging data into NocoDB
+- tasks: additional processing on data within the database
+
+Run `just` with the corresponding category to get a list, for example `just extract`.
+Then run a task by adding its name, for example `just extract download-municipalities`.
+
+## Common Tasks
+
+### Adding a water company
+
+Manually in NocoDB:
+
+- enter Water Company data in NocoDB: Actor
+- create one or more DistributionZone records, linked to the Water Company. Link each DistributionZone to the corresponding
+  municipalities
+- run the task to calculate the missing distribution zone geometries: `just task calculate-distribution-zone`
+
+In bulk:
+
+- alternatively, add data to a raw/WaterCompany_*.ndjson file, with 1 row per distribution zone. Specify the covered municipalities.
+- run `just transform create-distribution-zones`
+- run `just load zones-distribution`
+- run `just load water-companies`
+- run the task to calculate the missing distribution zone geometries: `just task calculate-distribution-zone`
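
For the bulk path, the README only says that `raw/WaterCompany_*.ndjson` files hold one row per distribution zone and list the covered municipalities. The sketch below shows one way to read and sanity-check such files before running `just transform create-distribution-zones`. It is not the project's transform, and the field names `company`, `zone` and `municipalities` (as well as the helpers `read_ndjson` and `zones_by_company`) are hypothetical.

```python
import json
from collections import defaultdict
from pathlib import Path


def read_ndjson(path: Path) -> list[dict]:
    """Parse newline-delimited JSON: one object per non-blank line."""
    records = []
    with path.open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_no}: invalid JSON ({exc.msg})") from exc
    return records


def zones_by_company(raw_dir: Path) -> dict[str, list[dict]]:
    """Group distribution-zone rows from WaterCompany_*.ndjson files by company.

    "company", "zone" and "municipalities" are hypothetical field names used
    for illustration; the real schema is not part of this commit.
    """
    grouped: dict[str, list[dict]] = defaultdict(list)
    for path in sorted(raw_dir.glob("WaterCompany_*.ndjson")):
        for row in read_ndjson(path):
            # Each zone must list the municipalities it covers.
            if not row.get("municipalities"):
                raise ValueError(f"{path}: zone {row.get('zone')!r} lists no municipalities")
            grouped[row["company"]].append(row)
    return dict(grouped)
```

Failing early on a zone with no municipalities avoids loading a DistributionZone whose geometry could not later be calculated from its municipalities.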
