feat: fetch all existing datasets for diff #10

abulte · 2024-12-18T16:48:45Z

This fetches all existing datasets (including archived and private ones) from the /api/1/datasets?topic=xxx endpoint (the only one where they're available).

The idea is to make the diff complete: archived and private datasets would then be removed from the current topic.

Thus, it's an alternative to ecolabdata/ecospheres#498.
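To make the intended diff concrete, here is a minimal sketch, not the code in this PR. It assumes each dataset returned by the v1 API is a dict with an `id`, an `archived` field (empty when the dataset is not archived) and a `private` boolean. The `compute_topic_diff` function and the `wanted_ids` parameter are hypothetical names used only for illustration.

```python
from typing import Iterable, Set, Tuple


def compute_topic_diff(
    existing: Iterable[dict], wanted_ids: Set[str]
) -> Tuple[Set[str], Set[str]]:
    """Return (ids to add, ids to remove) for a topic.

    `existing` is the full list of datasets currently in the topic,
    including archived and private ones (assumed field names:
    "id", "archived", "private").
    """
    existing = list(existing)
    existing_ids = {d["id"] for d in existing}
    # Archived or private datasets are removed even if still wanted.
    hidden_ids = {
        d["id"] for d in existing if d.get("archived") or d.get("private")
    }
    to_add = wanted_ids - existing_ids
    to_remove = (existing_ids - wanted_ids) | hidden_ids
    return to_add, to_remove
```

Without the complete list of existing datasets, `hidden_ids` cannot be computed, which is why the full fetch is needed for this approach.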

The problem is: it's very slow (due to the size of the topic and the slowness of the v1 API). I stopped a dry run on demo after one hour (89k datasets; it was still in the initial fetch). It might be manageable after a reset of the demo topic with ecolabdata/ecospheres#510.
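For context on the cost, a minimal sketch of what a full fetch over the v1 endpoint looks like, under stated assumptions: the response is taken to be a JSON object with a `data` list and a `next_page` URL (empty on the last page), and the default `base_url` for demo is a guess. The names `iter_topic_datasets` and `http_get_json` are hypothetical and not taken from this PR.

```python
import json
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen


def http_get_json(url: str) -> dict:
    """Fetch a URL and decode its JSON body."""
    with urlopen(url) as response:
        return json.load(response)


def iter_topic_datasets(
    topic_id: str,
    get_json: Callable[[str], dict] = http_get_json,
    page_size: int = 100,
    # Assumed base URL for the demo instance.
    base_url: str = "https://demo.data.gouv.fr/api/1/datasets/",
) -> Iterator[dict]:
    """Yield every dataset of a topic, following pagination links.

    Assumes each page looks like {"data": [...], "next_page": url-or-null}.
    """
    query = urlencode({"topic": topic_id, "page_size": page_size})
    url = f"{base_url}?{query}"
    while url:
        payload = get_json(url)
        yield from payload.get("data", [])
        # Pages are fetched one after another, so total time grows
        # linearly with the number of pages.
        url = payload.get("next_page")
```

With 89k datasets and an assumed page size of 100, this loop makes roughly 890 sequential requests, so even a few seconds per request adds up to a long initial fetch.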

streino · 2024-12-18T17:00:37Z

I'm guessing it'll still be slow given how large the topic is... Slower than it should be anyway...

Could we imagine adding options to fetch archived/private datasets to the api v2?

abulte · 2024-12-18T17:04:28Z

> Could we imagine adding options to fetch archived/private datasets to the api v2?

It kind of goes against the "philosophy" of how/what is indexed on data.gouv.fr (v2 is ES-only for dataset lists); this would be a major change. ecolabdata/ecospheres#498 will probably be easier.

feat: fetch all existing datasets for diff

ed1900e
