Implement IANA normalizer baked data provider #251
This removes the iana-datagen idea (at least for now, who knows) and adds a new internal tool, bakeddata, for generating baked data. It implements a baseline proof of concept and integrates it into temporal_rs.
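To make the term concrete: "baked data" here means data generated ahead of time and compiled into the crate as Rust source. The sketch below is a hypothetical illustration of that generation step, not the bakeddata tool's actual output; `bake_identifiers` and `IANA_IDENTIFIERS` are invented names. It sorts identifiers by their ASCII-lowercase form so that a consumer can binary-search case-insensitively.

```rust
/// Hypothetical sketch: render a list of IANA identifiers as Rust source
/// for a `static` slice that could be committed and compiled into a crate.
fn bake_identifiers(ids: &[&str]) -> String {
    let mut sorted: Vec<&str> = ids.to_vec();
    // Stable sort by lowercase form, so lookups can lowercase their key.
    sorted.sort_by_key(|id| id.to_ascii_lowercase());
    // Drop entries that differ only by case, keeping the first occurrence.
    sorted.dedup_by_key(|id| id.to_ascii_lowercase());

    let mut out = String::from("pub static IANA_IDENTIFIERS: &[&str] = &[\n");
    for id in sorted {
        // `{:?}` on a &str yields a quoted, escaped Rust string literal.
        out.push_str(&format!("    {:?},\n", id));
    }
    out.push_str("];\n");
    out
}

fn main() {
    let src = bake_identifiers(&["Europe/London", "America/New_York", "UTC"]);
    print!("{src}");
}
```

In this kind of setup, committing the generated file means downstream builds need neither the generator nor the raw tzdata.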
Branch updated from 76b4c82 to 8c87948.
Do we really need to commit the whole tzdata to the repo? I would expect our datagen tool to fetch that from its repo instead.
I was thinking more of adding that as a feature after some iterations. At the very least, I would prefer to have the baseline files so that this can be built without going over the network. Plus, it's good reference data. Although, I will admit, we can probably trim down what is currently there.
Even then, the tool can just point to a certain directory and ask the user to download the data manually, without fetching it from the network. The problem with vendoring all of the data is that we could forget to delete it once we no longer need it, or forget to exclude it from the
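A minimal sketch of the local-directory approach, assuming the user has already downloaded and unpacked the tzdata source files (for example `northamerica`, `europe`, `backward`) into a directory passed on the command line. It only collects `Zone` and `Link` names from the zic input format (assuming the full keywords used in the distributed files), and `parse_identifiers`/`collect_identifiers` are hypothetical helpers, not code from this PR.

```rust
use std::collections::BTreeSet;
use std::path::Path;
use std::{env, fs, io};

/// Collect identifiers from zic-format source text.
/// `Zone NAME ...` declares a zone; `Link TARGET LINK-NAME` declares an alias.
fn parse_identifiers(text: &str, out: &mut BTreeSet<String>) {
    for line in text.lines() {
        // Everything after `#` is a comment.
        let line = line.split('#').next().unwrap_or("");
        let mut fields = line.split_whitespace();
        match fields.next() {
            Some("Zone") => {
                if let Some(name) = fields.next() {
                    out.insert(name.to_string());
                }
            }
            Some("Link") => {
                // Skip the target; the second field is the alias name.
                if let Some(name) = fields.nth(1) {
                    out.insert(name.to_string());
                }
            }
            // Continuation lines, rules, and blank lines are ignored.
            _ => {}
        }
    }
}

/// Read every regular file in `dir` and gather identifiers from it.
fn collect_identifiers(dir: &Path) -> io::Result<BTreeSet<String>> {
    let mut ids = BTreeSet::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_file() {
            // Files that are not valid UTF-8 are skipped rather than aborting.
            if let Ok(text) = fs::read_to_string(&path) {
                parse_identifiers(&text, &mut ids);
            }
        }
    }
    Ok(ids)
}

fn main() -> io::Result<()> {
    // No network access: the data directory is supplied by the user.
    match env::args().nth(1) {
        Some(dir) => {
            let ids = collect_identifiers(Path::new(&dir))?;
            println!("found {} identifiers", ids.len());
        }
        None => eprintln!("usage: <tool> <path-to-tzdata-dir>"),
    }
    Ok(())
}
```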
I don't think it would be excluded on publish in that scenario... I can make the change, but we will lose the tests in the provider crate as a result. Maybe those can be added back as we build out.
Wait, why do we lose tests if we don't bundle the data? In my head, the process of implementing a provider would be:
Is this roughly what the PR does? Or am I misunderstanding something?
No, that's exactly what the PR does. I was able to preserve the test by using the singleton to test the generated file. Beforehand, since
That's what I'm not getting. If the data pipeline is clearly split into two steps, I don't understand why we would need to "build the data" again if we already have the data built by the baked tool. Can't we just use the already-built data for the tests?
Currently |
We can. Testing against the baked singleton is the current approach. I'm mostly being nitpicky that we are testing against the singleton and not explicitly against the struct itself.
Ahhhh, got it.
If we don't want to depend on the data itself for tests, we can do the same as ICU4X and create a struct that "consumes" the data instead of using the data directly. That way, tests can use custom test data instead of the baked data.
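A rough sketch of the ICU4X-style shape described above, assuming a sorted identifier table as the data format. `IanaIdentifierData`, `IanaNormalizer`, and `BAKED_NORMALIZER` are hypothetical names, and the baked table is a three-entry stand-in rather than generated output. The normalizer borrows whatever data it is given, so the unit test can pass hand-written data instead of the baked tables.

```rust
/// Hypothetical data struct: identifiers sorted by ASCII-lowercase form.
pub struct IanaIdentifierData<'a> {
    pub sorted_ids: &'a [&'a str],
}

/// Normalizer that borrows its data instead of reading a global directly.
pub struct IanaNormalizer<'a> {
    data: &'a IanaIdentifierData<'a>,
}

impl<'a> IanaNormalizer<'a> {
    pub const fn new(data: &'a IanaIdentifierData<'a>) -> Self {
        Self { data }
    }

    /// Return the canonically cased identifier, matching case-insensitively.
    pub fn normalize(&self, id: &str) -> Option<&'a str> {
        let key = id.to_ascii_lowercase();
        self.data
            .sorted_ids
            .binary_search_by(|probe| probe.to_ascii_lowercase().cmp(&key))
            .ok()
            .map(|i| self.data.sorted_ids[i])
    }
}

// Stand-in for generated baked data (hypothetical, not real datagen output).
static BAKED: IanaIdentifierData<'static> = IanaIdentifierData {
    sorted_ids: &["America/New_York", "Europe/London", "UTC"],
};

// The baked singleton is just one instance of the generic struct.
pub static BAKED_NORMALIZER: IanaNormalizer<'static> = IanaNormalizer::new(&BAKED);

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn normalizes_with_custom_test_data() {
        // Hand-written data: this test does not depend on the baked tables.
        let data = IanaIdentifierData { sorted_ids: &["Etc/Test", "Test/Zone"] };
        let normalizer = IanaNormalizer::new(&data);
        assert_eq!(normalizer.normalize("test/zone"), Some("Test/Zone"));
        assert_eq!(normalizer.normalize("Nowhere/Zone"), None);
    }
}

fn main() {
    println!("{:?}", BAKED_NORMALIZER.normalize("america/new_york"));
}
```

With this split, tests of the struct's logic and tests of the generated data can live separately.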
Yeah, that's probably a good approach that can be iterated on in follow-ups.
Just realized that the debug companion file was missing due to the gitignore.
Actually, this doesn't fully close #232, but it is a step closer. I'm still fine with merging this as is and fixing it in subsequent follow-ups. EDIT: Actually, supporting #232 is definitely going to require more PRs and a bit of scope creep, as we will probably have to implement our own zoneinfo parser. So moving ahead with this in the short term is most likely the best option (although I'm going to look at how ICU4X supports the zone.tab file).
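For reference, zone.tab ships with tzdata as a tab-separated file whose non-comment lines hold an ISO 3166 country code, coordinates, a zone identifier, and optional comments; lines starting with `#` are comments. The sketch below reads that layout into a hypothetical `ZoneTabEntry` type; it is not how ICU4X or temporal_rs handle the file.

```rust
/// One row of zone.tab (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
struct ZoneTabEntry<'a> {
    country_code: &'a str,
    coordinates: &'a str,
    identifier: &'a str,
    comments: Option<&'a str>,
}

/// Parse zone.tab text, skipping comment and blank lines.
/// Rows with fewer than three columns are dropped.
fn parse_zone_tab(text: &str) -> Vec<ZoneTabEntry<'_>> {
    text.lines()
        .filter(|line| !line.starts_with('#') && !line.trim().is_empty())
        .filter_map(|line| {
            let mut cols = line.split('\t');
            // Struct fields are evaluated in the order written here.
            Some(ZoneTabEntry {
                country_code: cols.next()?,
                coordinates: cols.next()?,
                identifier: cols.next()?,
                comments: cols.next(),
            })
        })
        .collect()
}

fn main() {
    let sample = "# comment\n\
                  US\t+404251-0740023\tAmerica/New_York\tEastern (most areas)\n\
                  GB\t+513030-0000731\tEurope/London\n";
    for e in parse_zone_tab(sample) {
        println!(
            "{} {} {} {}",
            e.country_code,
            e.coordinates,
            e.identifier,
            e.comments.unwrap_or("")
        );
    }
}
```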
Already in the works 😉 |
This PR begins the initial work of implementing baked data providers.
There will probably be a lot more work to do on this topic overall (especially for TZif support). This is meant as a minimal proof of concept that adds support for IANA identifier normalization and adds that support to FsTzdbProvider.
List of changes:
- temporal_provider crate for sourcing zoneinfo data and bakeddata struct definitions.
- Add SINGLETON_IANA_NORMALIZER to FsTzdbProvider.
General points of consideration: