Skip to content

Conversation

@znichollscr
Copy link
Collaborator

@ltroussellier fyi - step 1

cc @eleanororourke @JamesAnstey @bethdingley in case you want to follow along

@znichollscr
Copy link
Collaborator Author

@JamesAnstey can you take a look at https://github.com/WCRP-CMIP/Variable-Registry/blob/dr-1-2-2/DR-1-2-2-internal-consistency-issues.md please?

Are we going to have to just grind our way through these conflicts manually, or is there some trick we can play?

Copy link
Member

@wolfiex wolfiex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • The correctly referenced cell methods need to be reverted.
  • Missing cell-methods need to be added here and their ids consistent with what already exists. @taylor13?
  • Changes to the variable root must first have the respective variable-root file created. This again should link the id.
  • key removals are fine.
  • Noticed that e.g. variable root is not converted to lowercase (it is an ID). I should probably fix that, bit it might as well be done here as not to cause more conflicts.

@znichollscr @JamesAnstey @matthew-mizielinski
We just need to make sure that corresponding (controlled) linked entries are consistent and exist at this stage - otherwise we will spend months trying to figure out why certain things don't exist.

See examples below highlighting each point. These have multiple occurrences, so I only commented on each once.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was correct, and the resolved cell methods will give what you want https://github.com/WCRP-CMIP/Variable-Registry/blob/main/src-data/cell-method/amnla-tmnsn.json

Copy link
Collaborator Author

@znichollscr znichollscr Sep 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok I had misunderstood what you meant here (I realise you now mean these should be ids, not the actual cell methods). I will revert

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please create an id for the cell method and register it in https://github.com/WCRP-CMIP/Variable-Registry/tree/main/src-data/cell-method
as was done by the other methods from the DR. This id should then be linked here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If creating or changing the variable root, we need to have a corresponding file. This cannot be identical to an existing variable root.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General point makes sense.

Let's see, in this case it's totally clear that the variable root which is being extracted from the DR is just completely wrong, so maybe it's less of an issue than it first appears

@znichollscr
Copy link
Collaborator Author

We just need to make sure that corresponding (controlled) linked entries are consistent and exist at this stage - otherwise we will spend months trying to figure out why certain things don't exist.

Do you have a validation script for this? (I'm not doing this validation by hand, so if you don't have a script, I'll write one)

The correctly referenced cell methods need to be reverted

In the CVs TT meeting, I was asked to make a version of the variable registry that matched DR v1.2.2 exactly. Either we need e.g. @JamesAnstey to say, "That's a waste of time, don't worry about it" or we have to accept these regressions, then undo them again once we've released a version that matches DR v1.2.2 exactly.

Missing cell-methods need to be added here and their ids consistent with what already exists. @taylor13?

Ok will automate that (pending decision about exact match to DR v.1.2.2 or not).

Changes to the variable root must first have the respective variable-root file created. This again should link the id.

As above

Noticed that e.g. variable root is not converted to lowercase (it is an ID). I should probably fix that, bit it might as well be done here as not to cause more conflicts.

Ah ok will update

@znichollscr
Copy link
Collaborator Author

@wolfiex as an FYI, my understanding is that we have one week (roughly) to make a version that matches DR v1.2.2 i.e. the deadline for https://github.com/WCRP-CMIP/Variable-Registry/milestone/1 should be September 18 or so, October 31 is way too late

@wolfiex
Copy link
Member

wolfiex commented Sep 12, 2025

@znichollscr We can change the milestone.

The "view" will match v1.2.2 exactly. The background data does not need to as long as it produces an identical view for the end user.

I can whip up a validation script once I finish restoring only office. Writing a centralised one has been on the back burner for a while, or do you need something sooner than that?

The simplest validation would be stripping the .json and checking the file exists in the repo.

@znichollscr
Copy link
Collaborator Author

The "view" will match v1.2.2 exactly. The background data does not need to as long as it produces an identical view for the end user.

What do you understand by the 'view' ? Is there already an agreed API that the DR will use (that isn't just the raw JSON)?

Writing a centralised one has been on the back burner for a while, or do you need something sooner than that?

Let me think about timelines and get back to you. It may be that it is better to do this elsewhere to spread the load.

@wolfiex wolfiex mentioned this pull request Sep 12, 2025
@wolfiex
Copy link
Member

wolfiex commented Sep 12, 2025

The DR will used the resolved JSON-LD, not the raw json files. Otherwise we are literally just copying the airtable dump, and applying a few fixes.

@znichollscr
Copy link
Collaborator Author

The DR will used the resolved JSON-LD

Is that meant to be this https://github.com/WCRP-CMIP/Variable-Registry/blob/main/Variable-Registry_variable.json ? If yes, there are still keys in there that haven't been resolved e.g. "model-realm": "universal:realm/aerosol"

Otherwise we are literally just copying the airtable dump, and applying a few fixes

Right now, yes, but in future this will be the source of truth i.e. there will be no AirTable to dump

@znichollscr
Copy link
Collaborator Author

I thought about this a bit more. I think I'm back on the same page.

So, my understanding:

  1. we need to create a view that matches DR 1.2.2
  2. I need to revert back to using IDs for cell methods, all fine
  3. then we go through, resolving various other issues in follow up MRs with clear changelog entries so it is obvious how variables have changed from DR 1.2.2
  4. along the way, we can reconsider and refine our data models
  5. as a last step, we regenerate the branded variables for all entries (easier to do all at once at the end than along the way I think). As part of this step, we add a CI check that the branded variables are created correctly so we can't end up with consistencies in main anymore. As part of this step, we might also consider just moving the branded variable mapper into this repo (the benefit of having it separate from the variable registry is not so clear to me now that I've seen how this is all meant to work)

@JamesAnstey
Copy link
Collaborator

@znichollscr, apologies for my slow reply to this thread.

  1. we need to create a view that matches DR 1.2.2

I'm not sure exactly what "view" means here. The DR software needs to be able to draw metadata of variables from the VR. I started implementing that using cmipld following the example notebook shared by @wolfiex. But DR variables sharing the same branded variable don't necessarily have consistent metadata - I wrote a script to check this, and it looks like your https://github.com/WCRP-CMIP/Variable-Registry/blob/dr-1-2-2/DR-1-2-2-internal-consistency-issues.md is showing the same thing.

So the question is then, what to do about it? Since these inconsistencies are there in DR v1.2.2, we can't have a VR that matches DR v1.2.2 exactly, but for the next DR version we can if we correct the inconsistencies. Given the timeline, my feeling is we should focus right now only on the critical ones defining a variable, which I think would be cell_methods, dimensions, units (and standard_name, but I think standard names were all consistent). And then the DR software could (at least for now) import just those critical ones when it sources variable-defining information from the VR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: No status
Status: No status

Development

Successfully merging this pull request may close these issues.

4 participants