@flacombe
Contributor

flacombe commented Dec 3, 2025

As described in #391, daily diffs don't include referenced features, in particular way nodes and relation members.

This PR retrieves members that may be missing after a daily diff has been processed.
It is therefore restricted to update mode, since the OSH always provides everything referenced during init.

Retrieval is done with curl querying Overpass. It sends a single query, however many features need to be retrieved, and tries up to 3 times, which may cause issues.
I will test it on worldwide projects to see how well it copes.
Some improvements may be required prior to merge to handle errors or Overpass disruptions.
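
If a single query turns out to be too large, one option would be to split the ids into batches. The following is a minimal sketch of that idea, not the code of this PR: the function names, the batch size and the timeout are assumptions, and it only builds Overpass QL text using the `(id:...)` filter without sending anything.

```python
from itertools import islice


def chunked(ids, size):
    """Yield successive lists of at most `size` ids (hypothetical helper)."""
    it = iter(ids)
    while batch := list(islice(it, size)):
        yield batch


def build_overpass_query(nodes=(), ways=(), relations=(), timeout=180):
    """Build one Overpass QL query fetching the given elements by id.

    The timeout value is an assumption, not taken from the PR.
    """
    statements = []
    for kind, ids in (("node", nodes), ("way", ways), ("relation", relations)):
        if ids:
            id_list = ",".join(str(i) for i in ids)
            statements.append(f"{kind}(id:{id_list});")
    return f"[timeout:{timeout}];({''.join(statements)});out meta;"


# One query per batch of node ids instead of a single large query.
queries = [build_overpass_query(nodes=batch) for batch in chunked(range(1, 6), 2)]
```

Each resulting string could then be posted to the configured Overpass endpoint, with the retry logic applied per batch rather than to the whole request.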

This PR also introduces another permanent table for each project, containing the features updated during the last update run.
These tables must be created with the following command:

CREATE TABLE pdm_features_...slug..._update (like pdm_features_...slug...);
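
On an instance with several projects, the statement has to be repeated once per project. A small sketch that generates those statements, assuming slugs consist only of lowercase letters, digits and underscores; the function name is hypothetical and the slug used in the example is a placeholder, not a real project.

```python
import re


def create_update_table_sql(slug):
    """Return the CREATE TABLE statement for one project slug.

    The slug is validated because it is interpolated into an identifier.
    """
    if not re.fullmatch(r"[a-z0-9_]+", slug):
        raise ValueError(f"unexpected slug: {slug!r}")
    return f"CREATE TABLE pdm_features_{slug}_update (LIKE pdm_features_{slug});"


# "demo" is a placeholder slug used for illustration only.
statement = create_update_table_sql("demo")
```

The generated statements can be reviewed and then run with psql against the project database.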

An additional setting must be added to the configuration to reach Overpass. Member retrieval is not activated if it is missing.

  "OVERPASS_URL": "https://overpass-api.de/api/interpreter",
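
The opt-in behaviour can be pictured as follows. This is only an illustrative sketch, assuming the configuration is a JSON object; the function names are hypothetical and the actual check in the PR may differ.

```python
import json


def overpass_url(config_text):
    """Return the configured Overpass endpoint, or None when absent or empty."""
    config = json.loads(config_text)
    return config.get("OVERPASS_URL") or None


def should_fetch_members(config_text):
    """Member retrieval is only attempted when an endpoint is configured."""
    return overpass_url(config_text) is not None


with_url = '{"OVERPASS_URL": "https://overpass-api.de/api/interpreter"}'
without_url = "{}"
```

With `with_url` the retrieval step would run; with `without_url` it would be skipped and the update would behave as before.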

Fetching members from Overpass saves us from keeping a database up to date for the whole covered perimeter, not to mention worldwide.

Fix #391

@PanierAvide
Collaborator

Thanks for the PR. It certainly needs more testing: I'm very doubtful about the reliability of asking Overpass for many members on the fly... And I'm still not sure why we can't keep the original (not filtered) OSH or OSM.PBF files to first check them in and avoid too many Overpass calls.

@flacombe
Contributor Author

flacombe commented Dec 4, 2025

> And I'm still not sure why we can't keep the original (not filtered) OSH or OSM.PBF files to first check them in and avoid too many Overpass calls.

It's fairly simple:

  • The OSH file is 150 GB without indices, and it takes between 750 GB and 1 TB with indices (osm2pgsql or imposm). So let's avoid building indices on it.
  • The OSH needs to be kept up to date by applying the changes from daily diffs, which takes time every day.
  • Getting ~10k objects out of it also takes time: since it's not indexed, you have to process the whole file, every day.
  • Most of the data in the OSH is useless but would permanently clutter the process.

Meanwhile, my world instance currently manages 6 projects with redundant data (some projects can be contained in another but currently have separate tables) and holds 67 GB of useful data, indices included.

Avoiding the OSH file saves time and disk space, and keeps the focus on useful data.
If Overpass is not suitable, we could implement this with another service.

@PanierAvide
Collaborator

I was more thinking of keeping the OSH PBF as is, and filter with osmium by ids (you can set a list of features you want to extract and have a single pass on the file). I understand the issue with storage, but that could be a suitable fallback to avoid overloading Overpass API. I guess this PR can be tested as is, and see how much data retrieval from Overpass is an issue before deciding.
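
For reference, the single-pass extraction by ids could look roughly like this with `osmium getid` and an id file. It is a sketch under assumptions: the file names are placeholders, the helper names are hypothetical, and the command is only built, not executed.

```python
def write_id_file(path, ids):
    """Write one id per line, prefixed with its type (n, w or r)."""
    with open(path, "w", encoding="utf-8") as fh:
        for kind, osm_id in ids:
            fh.write(f"{kind}{osm_id}\n")


def osmium_getid_command(history_file, id_file, output_file):
    """Build the argument list for extracting the listed ids from a history file."""
    return [
        "osmium", "getid",
        "--with-history",        # keep all versions, the input is an OSH file
        "--id-file", id_file,    # ids are read from a file, not the command line
        "--output", output_file,
        "--overwrite",
        history_file,
    ]


# Placeholder file names, for illustration only.
command = osmium_getid_command("history.osh.pbf", "missing_ids.txt", "members.osh.pbf")
```

The list can be passed to `subprocess.run(command, check=True)` once the id file has been written.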

@flacombe
Contributor Author

flacombe commented Dec 4, 2025

> I was more thinking of keeping the OSH PBF as is, and filter with osmium by ids (you can set a list of features you want to extract and have a single pass on the file).

You need to keep the OSH up to date to do this. If you keep the OSH as is, you will miss members that have been created between the OSH date and the daily diff you process.

> I guess this PR can be tested as is, and see how much data retrieval from Overpass is an issue before deciding.

Yes, I haven't launched a complete init yet to see how it behaves during catch-up.

@flacombe
Contributor Author

flacombe commented Dec 5, 2025

Here is the log of my tests last night:

The last OSH was built on 2025-11-24.
5 projects out of 6 needed missing members to be retrieved.

  • 2025-01_circuits (r/power=circuit) => 11 179 features retrieved from Overpass in 1 attempt, 7s to process
  • 2025-01_generators (nw/power=generator) => 5 610 features retrieved from Overpass in 1 attempt, 26s to process
  • 2025-01_lines (w/power=line,cable) => 16 966 features retrieved from Overpass in 1 attempt, 26s to process
  • 2025-01_plants (w/power=plant) => 2 312 features retrieved from Overpass in 2 attempts, 14s to process
  • 2025-01_substations (nw/power=substation) => 4 848 features retrieved from Overpass in 1 attempt, 10s to process

So far, curl's retries have been fully successful. Should we wait for problems to occur before taking more elaborate measures?
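
As one possible shape for such measures, here is a hedged sketch of a retry loop with exponential backoff that also reports how many attempts were needed. It is not what the PR does (the PR relies on curl's own retry); the function name, the delays and the injectable `sleep` are assumptions made for illustration.

```python
import time


def fetch_with_retries(fetch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds, at most `attempts` times.

    Returns (result, attempts_used). Network errors raised by urllib are
    subclasses of OSError, so that is what gets retried here.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch(), attempt
        except OSError:
            if attempt == attempts:
                raise
            # Wait 1s, 2s, 4s, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))


# Simulated endpoint that fails once, then answers.
calls = []
delays = []


def flaky_fetch():
    calls.append(1)
    if len(calls) == 1:
        raise OSError("simulated HTTP error")
    return "payload"


result, used = fetch_with_retries(flaky_fetch, sleep=delays.append)
```

Returning the attempt count makes it easy to log how often Overpass had to be asked twice.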

== Look for earliest date to process
Start processing from: 2025-11-24T00:59:46Z to 2025-12-05T07:41:48Z
== Build OSC changes with replication files
osmupdate Parameter: 2025-11-24T00:59:46Z
osmupdate Parameter: /data/files/pdm/changes.osc.gz
osmupdate: newest daily timestamp: 2025-12-05T00:00:00Z
osmupdate: daily changefile 4832: downloading
osmupdate: daily changefile 4831: 2025-12-04T00:00:00Z
osmupdate: daily changefile 4831: checking
osmupdate: daily changefile 4831: already in cache
osmupdate: daily changefile 4830: 2025-12-03T00:00:00Z
osmupdate: daily changefile 4830: checking
osmupdate: daily changefile 4830: already in cache
osmupdate: daily changefile 4829: 2025-12-02T00:00:00Z
osmupdate: daily changefile 4829: checking
osmupdate: daily changefile 4829: already in cache
osmupdate: daily changefile 4828: 2025-12-01T00:00:00Z
osmupdate: daily changefile 4828: checking
osmupdate: daily changefile 4828: already in cache
osmupdate: daily changefile 4827: 2025-11-30T00:00:00Z
osmupdate: daily changefile 4827: checking
osmupdate: daily changefile 4827: already in cache
osmupdate: daily changefile 4826: 2025-11-29T00:00:00Z
osmupdate: daily changefile 4826: checking
osmupdate: daily changefile 4826: already in cache
osmupdate: Merging changefiles.
osmupdate: daily changefile 4825: 2025-11-28T00:00:00Z
osmupdate: daily changefile 4825: checking
osmupdate: daily changefile 4825: already in cache
osmupdate: daily changefile 4824: 2025-11-27T00:00:00Z
osmupdate: daily changefile 4824: checking
osmupdate: daily changefile 4824: already in cache
osmupdate: daily changefile 4823: 2025-11-26T00:00:00Z
osmupdate: daily changefile 4823: checking
osmupdate: daily changefile 4823: already in cache
osmupdate: daily changefile 4822: 2025-11-25T00:00:00Z
osmupdate: daily changefile 4822: checking
osmupdate: daily changefile 4822: already in cache
osmupdate: daily changefile 4821: 2025-11-24T00:00:00Z
osmupdate: Merging changefiles.
osmupdate: Creating output file.
osmupdate: Keeping temporary files.
osmupdate: Completed successfully.
== Read OSC file information...
OSC file is up to 2025-12-04T23:59:33Z
== No polygon data to restrict on
-------------------------------------------------------------------

== Begin process for project 2025-01_circuits
Updating project changes from 2025-11-24T00:59:46Z to 2025-12-04T23:59:33Z
-------------------------------------------------------------------

   => [0s] Extract features from OSH (r/power=circuit)
   => [141s] Extract 747056 known features and 252 created features by their ids
   => [719s] Merging changes in one file
   => [719s] Transform changes into CSV file
   => [719s] Accumulate changes table in database
TRUNCATE TABLE
DELETE 0
  [720s] Copy features
COPY 5534
  [720s] Copy members
NOTICE:  table "pdm_members_circuits_tmp" does not exist, skipping
DROP TABLE
CREATE TABLE
COPY 24458
  [721s] Fetching missing members
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 2432k    0 2309k  100  122k   637k  34624  0:00:03  0:00:03 --:--:--  671k
  [728s] 11179 features has been retrieved from overpass
COPY 11179
COPY 0
Timing is on.
CREATE INDEX
Time: 257.616 ms
INSERT 0 24458
Time: 4485.343 ms (00:04.485)
ANALYZE
Time: 2058.728 ms (00:02.059)
DROP TABLE
  [736s] Populate features
Timing is on.
CREATE INDEX
Time: 172.923 ms
INSERT 0 16713
Time: 2563.295 ms (00:02.563)
ANALYZE
Time: 2289.798 ms (00:02.290)
  [741s] Building geometries of ways
Timing is on.
UPDATE 756
Time: 1407.192 ms (00:01.407)
  [742s] Building geometries of relations
Timing is on.
UPDATE 384
Time: 1363.912 ms (00:01.364)
  [744s] Refresh changes
REFRESH MATERIALIZED VIEW
  [759s] Process usernames
DROP TABLE
NOTICE:  table "pdm_features_circuits_users" does not exist, skipping
CREATE TABLE
COPY 16713
INSERT 0 1
DROP TABLE
  [761s] Populate boundaries
Timing is on.
INSERT 0 572
Time: 1861.962 ms (00:01.862)
ANALYZE
Time: 291.619 ms
-------------------------------------------------------------------

UPDATE 1
   => [764s] Project update successful
-------------------------------------------------------------------

== Begin process for project 2025-01_generators
Updating project changes from 2025-11-24T00:59:46Z to 2025-12-04T23:59:33Z
-------------------------------------------------------------------

   => [0s] Extract features from OSH (nw/power=generator)
   => [162s] Extract features from OSH (generator:source)
   => [296s] Extract 24893401 known features and 52136 created features by their ids
   => [604s] Merging changes in one file
   => [605s] Transform changes into CSV file
   => [607s] Accumulate changes table in database
TRUNCATE TABLE
DELETE 0
  [607s] Copy features
COPY 269568
  [610s] Copy members
DROP TABLE
NOTICE:  table "pdm_members_generators_tmp" does not exist, skipping
CREATE TABLE
COPY 273236
  [611s] Fetching missing members
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  929k    0  864k  100 67136   553k  42956  0:00:01  0:00:01 --:--:--  595k
  [637s] 5610 features has been retrieved from overpass
COPY 5610
COPY 0
Timing is on.
CREATE INDEX
Time: 1435.060 ms (00:01.435)
INSERT 0 251751
Time: 18484.947 ms (00:18.485)
ANALYZE
Time: 20918.056 ms (00:20.918)
DROP TABLE
  [678s] Populate features
Timing is on.
CREATE INDEX
Time: 554.378 ms
INSERT 0 253687
Time: 57915.960 ms (00:57.916)
ANALYZE
Time: 3343.623 ms (00:03.344)
  [740s] Building geometries of ways
Timing is on.
UPDATE 51545
Time: 31969.819 ms (00:31.970)
  [772s] Refresh changes
REFRESH MATERIALIZED VIEW
  [1431s] Process usernames
NOTICE:  table "pdm_features_generators_users" does not exist, skipping
DROP TABLE
CREATE TABLE
COPY 275178
INSERT 0 6
DROP TABLE
  [1434s] Populate boundaries
Timing is on.
INSERT 0 49208
Time: 14500.162 ms (00:14.500)
ANALYZE
Time: 4183.092 ms (00:04.183)
-------------------------------------------------------------------

UPDATE 1
   => [1453s] Project update successful
-------------------------------------------------------------------

== Begin process for project 2025-01_lines
Updating project changes from 2025-11-24T00:59:46Z to 2025-12-04T23:59:33Z
-------------------------------------------------------------------

   => [0s] Extract features from OSH (w/power=line,cable)
   => [283s] Extract 26632832 known features and 6242 created features by their ids
   => [590s] Merging changes in one file
   => [591s] Transform changes into CSV file
   => [592s] Accumulate changes table in database
TRUNCATE TABLE
DELETE 0
  [592s] Copy features
COPY 90393
  [594s] Copy members
DROP TABLE
NOTICE:  table "pdm_members_lines_tmp" does not exist, skipping
CREATE TABLE
COPY 366834
  [595s] Fetching missing members
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 3454k    0 3258k  100  195k  1188k  73188  0:00:02  0:00:02 --:--:-- 1259k
  [621s] 16966 features has been retrieved from overpass
COPY 16966
COPY 0
Timing is on.
CREATE INDEX
Time: 1321.772 ms (00:01.322)
INSERT 0 365956
Time: 173171.821 ms (02:53.172)
ANALYZE
Time: 33092.106 ms (00:33.092)
DROP TABLE
  [829s] Labelling transmission
Timing is on.
INSERT 0 9098
Time: 11484.708 ms (00:11.485)
ANALYZE
Time: 2070.791 ms (00:02.071)
  [843s] Labelling distribution
Timing is on.
INSERT 0 2188
Time: 1392.140 ms (00:01.392)
ANALYZE
Time: 694.763 ms
  [845s] Labelling transmission_overhead
Timing is on.
INSERT 0 8649
Time: 249.766 ms
ANALYZE
Time: 761.354 ms
  [846s] Populate features
Timing is on.
CREATE INDEX
Time: 343.189 ms
INSERT 0 107275
Time: 113710.109 ms (01:53.710)
ANALYZE
Time: 5348.289 ms (00:05.348)
  [966s] Building geometries of ways
Timing is on.
UPDATE 13274
Time: 54966.537 ms (00:54.967)
  [1021s] Refresh changes
REFRESH MATERIALIZED VIEW
  [1622s] Process usernames
DROP TABLE
NOTICE:  table "pdm_features_lines_users" does not exist, skipping
CREATE TABLE
COPY 107359
INSERT 0 11
DROP TABLE
  [1626s] Populate boundaries
Timing is on.
INSERT 0 14756
Time: 16647.981 ms (00:16.648)
ANALYZE
Time: 1731.031 ms (00:01.731)
-------------------------------------------------------------------

UPDATE 1
   => [1645s] Project update successful
-------------------------------------------------------------------

== Begin process for project 2025-01_plants
Updating project changes from 2025-11-24T00:59:46Z to 2025-12-04T23:59:33Z
-------------------------------------------------------------------

   => [0s] Extract features from OSH (w/power=plant)
   => [145s] Extract features from OSH (plant:source)
   => [156s] Extract 1463091 known features and 728 created features by their ids
   => [471s] Merging changes in one file
   => [472s] Transform changes into CSV file
   => [472s] Accumulate changes table in database
TRUNCATE TABLE
DELETE 0
  [472s] Copy features
COPY 11057
  [472s] Copy members
NOTICE:  table "pdm_members_plants_tmp" does not exist, skipping
DROP TABLE
CREATE TABLE
COPY 19701
  [473s] Fetching missing members
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 27626  100   695  100 26931     78   3025  0:00:08  0:00:08 --:--:--   189
Warning: Problem : HTTP error. Will retry in 1 second. 3 retries left.
100  387k    0  361k  100 26931   187k  13962  0:00:01  0:00:01 --:--:--  200k
  [487s] 2312 features has been retrieved from overpass
COPY 2312
COPY 0
Timing is on.
CREATE INDEX
Time: 157.000 ms
INSERT 0 19473
Time: 7570.469 ms (00:07.570)
ANALYZE
Time: 2024.645 ms (00:02.025)
DROP TABLE
  [497s] Populate features
Timing is on.
CREATE INDEX
Time: 128.564 ms
INSERT 0 13137
Time: 8788.385 ms (00:08.788)
ANALYZE
Time: 2149.666 ms (00:02.150)
  [509s] Building geometries of ways
Timing is on.
UPDATE 1310
Time: 3004.922 ms (00:03.005)
  [512s] Refresh changes
REFRESH MATERIALIZED VIEW
  [539s] Process usernames
NOTICE:  table "pdm_features_plants_users" does not exist, skipping
DROP TABLE
CREATE TABLE
COPY 13369
INSERT 0 3
DROP TABLE
  [539s] Populate boundaries
Timing is on.
INSERT 0 1279
Time: 1238.414 ms (00:01.238)
ANALYZE
Time: 418.668 ms
-------------------------------------------------------------------

UPDATE 1
   => [541s] Project update successful
-------------------------------------------------------------------

== Begin process for project 2025-01_substations
Updating project changes from 2025-11-24T00:59:46Z to 2025-12-04T23:59:33Z
-------------------------------------------------------------------

   => [0s] Extract features from OSH (nw/power=substation)
   => [183s] Extract 4039601 known features and 4124 created features by their ids
   => [489s] Merging changes in one file
   => [489s] Transform changes into CSV file
   => [489s] Accumulate changes table in database
TRUNCATE TABLE
DELETE 0
  [490s] Copy features
COPY 22791
  [490s] Copy members
DROP TABLE
NOTICE:  table "pdm_members_substations_tmp" does not exist, skipping
CREATE TABLE
COPY 37976
  [491s] Fetching missing members
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  792k    0  740k  100 53420   124k   8951  0:00:05  0:00:05 --:--:--  196k
  [501s] 4848 features has been retrieved from overpass
COPY 4848
COPY 0
Timing is on.
CREATE INDEX
Time: 166.112 ms
INSERT 0 37924
Time: 21726.340 ms (00:21.726)
ANALYZE
Time: 3208.844 ms (00:03.209)
DROP TABLE
  [527s] Labelling transmission
Timing is on.
INSERT 0 1342
Time: 975.223 ms
ANALYZE
Time: 558.239 ms
  [529s] Labelling distribution
Timing is on.
INSERT 0 625
Time: 68.222 ms
ANALYZE
Time: 433.301 ms
  [529s] Labelling generation
Timing is on.
INSERT 0 72
Time: 36.606 ms
ANALYZE
Time: 468.387 ms
  [530s] Labelling industrial
Timing is on.
INSERT 0 69
Time: 46.052 ms
ANALYZE
Time: 458.415 ms
  [530s] Populate features
Timing is on.
CREATE INDEX
Time: 614.062 ms
INSERT 0 27607
Time: 35572.532 ms (00:35.573)
ANALYZE
Time: 2794.756 ms (00:02.795)
  [569s] Building geometries of ways
Timing is on.
UPDATE 5957
Time: 11181.651 ms (00:11.182)
  [581s] Refresh changes
REFRESH MATERIALIZED VIEW
  [664s] Process usernames
NOTICE:  table "pdm_features_substations_users" does not exist, skipping
DROP TABLE
CREATE TABLE
COPY 27639
INSERT 0 12
DROP TABLE
  [665s] Populate boundaries
Timing is on.
INSERT 0 12798
Time: 5197.348 ms (00:05.197)
ANALYZE
Time: 1324.540 ms (00:01.325)
-------------------------------------------------------------------

UPDATE 1
   => [672s] Project update successful
-------------------------------------------------------------------

== Begin process for project 2025-01_supports
Updating project changes from 2025-11-24T00:59:46Z to 2025-12-04T23:59:33Z
-------------------------------------------------------------------

   => [0s] Extract features from OSH (n/power=pole,tower,portal,insulator,terminal)
   => [382s] Extract 36195702 known features and 179983 created features by their ids
   => [541s] Merging changes in one file
   => [543s] Transform changes into CSV file
   => [544s] Accumulate changes table in database
TRUNCATE TABLE
DELETE 0
  [544s] Copy features
COPY 211241
  [547s] Populate features
Timing is on.
CREATE INDEX
Time: 432.146 ms
INSERT 0 211095
Time: 75139.324 ms (01:15.139)
ANALYZE
Time: 3023.950 ms (00:03.024)
  [626s] Refresh changes
REFRESH MATERIALIZED VIEW
  [1426s] Process usernames
DROP TABLE
NOTICE:  table "pdm_features_supports_users" does not exist, skipping
CREATE TABLE
COPY 211241
INSERT 0 5
DROP TABLE
  [1431s] Populate boundaries
Timing is on.
INSERT 0 226524
Time: 46420.653 ms (00:46.421)
ANALYZE
Time: 39389.872 ms (00:39.390)
-------------------------------------------------------------------

UPDATE 1
   => [1517s] Project update successful
-------------------------------------------------------------------
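
The per-project figures quoted at the top of this comment can be recomputed from a log like the one above. A minimal sketch, assuming the log keeps the exact wording shown here; the function name is hypothetical.

```python
import re

PROJECT = re.compile(r"^== Begin process for project (\S+)")
FETCH_START = re.compile(r"\[(\d+)s\] Fetching missing members")
FETCH_DONE = re.compile(r"\[(\d+)s\] (\d+) features has been retrieved from overpass")


def summarize(log_text):
    """Map each project to (features retrieved, seconds spent fetching)."""
    summary = {}
    project = start = None
    for line in log_text.splitlines():
        if m := PROJECT.match(line):
            project, start = m.group(1), None
        elif m := FETCH_START.search(line):
            start = int(m.group(1))
        elif (m := FETCH_DONE.search(line)) and project is not None and start is not None:
            summary[project] = (int(m.group(2)), int(m.group(1)) - start)
    return summary


# Three lines copied from the log above.
sample = """== Begin process for project 2025-01_circuits
  [721s] Fetching missing members
  [728s] 11179 features has been retrieved from overpass
"""
stats = summarize(sample)
```

For the sample, `stats` maps `2025-01_circuits` to 11179 features and 7 seconds, matching the summary line for that project.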

@PanierAvide
Collaborator

I must admit that I'm surprised it works that well and that fast on so many features 😮

@flacombe
Contributor Author

flacombe commented Dec 6, 2025

Results from daily update last night:

13 469 features were fetched

  • 2025-01_circuits (r/power=circuit) => 11 932 features retrieved from Overpass, 7s to process
  • 2025-01_generators (nw/power=generator) => 129 features retrieved from Overpass, 24s to process
  • 2025-01_lines (w/power=line,cable) => 1 094 features retrieved from Overpass, 30s to process
  • 2025-01_plants (w/power=plant) => 28 features retrieved from Overpass, 7s to process
  • 2025-01_substations (nw/power=substation) => 286 features retrieved from Overpass, 6s to process

Unfortunately I don't have attempt logging, but all 5 projects got their missing features again.

@flacombe
Contributor Author

flacombe commented Dec 8, 2025

Results from daily update last night:

10 465 features were fetched

  • 2025-01_circuits (r/power=circuit) => 7 523 features retrieved from Overpass, 5s to process
  • 2025-01_generators (nw/power=generator) => 2 233 features retrieved from Overpass, 34s to process
  • 2025-01_lines (w/power=line,cable) => 546 features retrieved from Overpass, 27s to process
  • 2025-01_plants (w/power=plant) => 48 features retrieved from Overpass, 9s to process
  • 2025-01_substations (nw/power=substation) => 115 features retrieved from Overpass, 9s to process

Still no notable errors

flacombe marked this pull request as ready for review on December 9, 2025 23:54

Development

Successfully merging this pull request may close these issues.

Missing relations members in daily diffs
