Contributor

@streino commented Jan 6, 2026

Prep work for opendatateam/udata-front-kit#961.

Example log:

Starting at Tue Jan  6 22:33:47 2026.
Processing universe topic 'test-topic'.
Found 4 organizations in grist.
Found 2 datasets matching the target universe.
Found 2 datasets currently in the universe.
Updating topic:
- Adding 1 datasets...
- Deleting 1 datasets...
Generating output file .../organizations-datasets-demo.json with 2 entries...
Found 2 dataservices matching the target universe.
Found 1 dataservices currently in the universe.
Updating topic:
- Adding 1 dataservices...
- Deleting 0 dataservices...
Generating output file .../organizations-dataservices-demo.json with 2 entries...
Found 5 bouquets with the universe tag.
Generating output file .../organizations-bouquets-demo.json with 3 entries...
Done at Tue Jan  6 22:33:47 2026 in 0 seconds.

In verbose mode:

Starting at Tue Jan  6 22:32:05 2026.
Processing universe topic 'test-topic'.
Fetching grist universe definition...
Found 4 organizations in grist.
Fetching target datasets...
Fetching datasets for organization organization-3...
Fetching datasets for organization organization-4...
Fetching datasets for organization organization-1...
Fetching datasets for organization organization-5...
Found 2 datasets matching the target universe.
Fetching existing datasets...
Found 2 datasets currently in the universe.
Computing universe updates...
Updating topic:
- Adding 1 datasets...
- Deleting 1 datasets...
Generating output file .../organizations-datasets-demo.json with 2 entries...
Fetching target dataservices...
Fetching dataservices for organization organization-3...
Fetching dataservices for organization organization-4...
Fetching dataservices for organization organization-1...
Fetching dataservices for organization organization-5...
Found 2 dataservices matching the target universe.
Fetching existing dataservices...
Found 1 dataservices currently in the universe.
Computing universe updates...
Updating topic:
- Adding 1 dataservices...
- Deleting 0 dataservices...
Generating output file .../organizations-dataservices-demo.json with 2 entries...
Fetching additional organizations from bouquets...
Found 5 bouquets with the universe tag.
Generating output file .../organizations-bouquets-demo.json with 3 entries...
Done at Tue Jan  6 22:32:05 2026 in 0 seconds.
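
The two logs above differ only in the progress lines ("Fetching...", "Computing..."), which is what a print helper gated on a verbosity flag would produce. A minimal sketch of that idea follows; `verbose_print` is the name used in this PR, but the `VERBOSE` flag and the `run` helper are assumptions for illustration, not the PR's actual implementation.

```python
import io
from contextlib import redirect_stdout

# Hypothetical module-level flag; how the PR really toggles verbosity is not shown here.
VERBOSE = False


def verbose_print(*args, **kwargs) -> None:
    """Print only when verbose mode is enabled."""
    if VERBOSE:
        print(*args, **kwargs)


def run(verbose: bool) -> str:
    """Run a tiny two-step flow and return everything it printed."""
    global VERBOSE
    VERBOSE = verbose
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        verbose_print("Fetching grist universe definition...")  # verbose only
        print("Found 4 organizations in grist.")  # always shown
    return buffer.getvalue()
```

Calling `run(False)` yields only the "Found..." line, while `run(True)` also includes the "Fetching..." line, mirroring the difference between the two logs.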

@streino requested a review from abulte on January 6, 2026 at 18:49
json.dump([asdict(o) for o in orgs], f, indent=2, ensure_ascii=False)


def get_target_universe(
Collaborator

Suggested change:
- def get_target_universe(
+ def get_target_universe_objects(

Needs to be more descriptive, not sure about my suggestion.

t_count[element_class] += len(object_ids)
active_orgs[element_class].append(org)
new_object_ids += object_ids
verbose_print(f"Fetching target {element_class.value}...")
Collaborator

I've been confused by the "target" naming. Makes sense after scratching my head but somehow, target could be the universe topic too (the one we're writing to)...

Contributor Author

True... I didn't like "new" for about the same reason (could be we're creating a new topic).
How about "intended", "upcoming", or "goal"? Any better idea?

Contributor Author

There's "expected" too, but I'm worried about potential interference with tests (so avoiding actual/expected).

Collaborator

I kind of like upcoming, goes well with the flow.

f"Found {len(target_object_ids)} {element_class.value} matching the target universe."
)

# TODO: don't run if reset==True?
Collaborator

Indeed, but pretty cheap, doesn't matter IMO.

Collaborator

Can even be a feature: the log will confirm the topic is indeed empty.

Contributor Author

Doesn't matter for perf for sure, but I was more thinking about code intent/readability.
You're right though that confirmation is a good idea, if only because we could have something added back to the topic between reset and here.

Collaborator

Re readability, it probably reads better without another if.
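
To make the point in this thread concrete, here is a rough sketch of the "no extra `if`" flow: the existing elements are always fetched and logged, so after a reset the log confirms the topic is empty, and anything added back in the meantime still shows up as a removal. The function names and the `fetch_existing` / `log` parameters are hypothetical and only stand in for the PR's real code.

```python
from typing import Callable, Iterable


def compute_updates(
    target_ids: Iterable[str], existing_ids: Iterable[str]
) -> tuple[list[str], list[str]]:
    """Return (additions, removals) as sorted lists, using set differences."""
    target, existing = set(target_ids), set(existing_ids)
    return sorted(target - existing), sorted(existing - target)


def sync(
    target_ids: Iterable[str],
    fetch_existing: Callable[[], list[str]],  # hypothetical fetcher
    log: Callable[[str], None] = print,
) -> tuple[list[str], list[str]]:
    """Fetch unconditionally: there is no `if not reset` guard here."""
    existing_ids = fetch_existing()
    log(f"Found {len(existing_ids)} datasets currently in the universe.")
    return compute_updates(target_ids, existing_ids)
```

With an empty topic (for instance right after a reset), everything in the target list becomes an addition and the log line reports 0 existing datasets.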

verbose_print("Computing topic updates...")
additions = sorted(set(target_object_ids) - set(existing_object_ids))
removals = sorted(set(existing_object_ids) - set(target_object_ids))
if n := len(removals) > REMOVALS_THRESHOLD:
Collaborator

Suggested change:
- if n := len(removals) > REMOVALS_THRESHOLD:
+ if (n := len(removals)) > REMOVALS_THRESHOLD:
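
For context on why the parentheses matter: `:=` binds more loosely than `>`, so without them `n` receives the result of the comparison (a bool) rather than the count. A small standalone illustration, with a made-up threshold and made-up data:

```python
REMOVALS_THRESHOLD = 2  # hypothetical value, for illustration only
removals = ["a", "b", "c", "d"]

# Without parentheses: parsed as n := (len(removals) > REMOVALS_THRESHOLD),
# so the name is bound to a bool.
if n_unparenthesized := len(removals) > REMOVALS_THRESHOLD:
    pass

# With parentheses: the name is bound to the count, which is then compared.
if (n_parenthesized := len(removals)) > REMOVALS_THRESHOLD:
    pass

print(n_unparenthesized)  # True
print(n_parenthesized)    # 4
```

The branch is taken in both cases, so the bug only becomes visible when `n` is used afterwards, for example in a message reporting how many removals were requested.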

print(f"- Adding {len(additions)} {element_class.value}...")
datagouv.put_topic_elements(conf.topic, element_class, additions, ADDITIONS_BATCH_SIZE)

# TODO: don't run if reset==True?
Collaborator

Same here, cheap and log will be consistent.

Comment on lines +177 to +180
# FIXME: remove when front uses the new file path
# retrocompatibility
copyfile(
f"dist/organizations-datasets-{conf.env}.json", f"dist/organizations-{conf.env}.json"
Collaborator

Can remove if you want.

3 participants