Add more collator data and filtering to testdata; add transliterator attributes domain#7679
robertbastian wants to merge 4 commits into unicode-org:main
)
})
.with_marker_attributes_filter("numbering_system", |attrs| {
    matches!(attrs.as_str(), "arab" | "beng" | "cakm" | "latn" | "thai")
TIL we filter this for testdata.
observation: we do not filter this for bakeddata, which is the correct behavior
We have filters because they reduce the number of JSON files.
We currently generate the numbering systems for all locales in the data. Because we only have select locales in the testdata, this behaves differently between new_testing and new. Should we actually filter the numbering systems by locales somehow? The code is:
icu4x/provider/source/src/decimal/mod.rs
Lines 102 to 120 in 8b31bf1
components/decimal/src/provider.rs
Outdated
      [char; 10],
      #[cfg(feature = "datagen")]
-     attributes_domain = "numbering_system"
+     attributes_domain = "numbering-system"
Question: why did you change the casing? And in make_testdata.rs, you use _. I wasn't sure whether we should use _ or -, so I started using _ on my newly added attribute domains since that seems to be what we are using elsewhere.
make_testdata was an oversight.
We use - in most ICU4X identifiers; I don't think we use _ anywhere. The hyphen is easier to type and more pleasing to read.
It's not just here, it is other places, too:
https://github.com/search?q=repo%3Aunicode-org%2Ficu4x%20attributes_domain&type=code
I don't want us to become inconsistent. I would rather make a separate PR to change all the instances at the same time, rather than switching just this one. (Please don't change all the others in this PR)
Currently, some of the testdata is not generated only because some files are not downloaded to the repo source data. However, this is not what clients can/should do; all of this should be controllable through options on SourceDataProvider, in particular marker attribute filters.

Running make_testdata.rs with SourceDataProvider::new() instead of SourceDataProvider::new_testing() should yield the same results. new_testing is convenient for avoiding the network, but should not behave differently.
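
To illustrate the invariant described above, here is a minimal, self-contained sketch. It does not use the ICU4X API: `keep` and `generate` are hypothetical helpers, and the two source lists are made-up stand-ins for "everything available upstream" and "what is checked into the repo". Only the five allowed numbering systems come from the filter in this PR.

```rust
use std::collections::BTreeSet;

// Same predicate as the "numbering_system" filter in make_testdata.rs,
// written against a plain &str instead of marker attributes (assumption).
fn keep(attr: &str) -> bool {
    matches!(attr, "arab" | "beng" | "cakm" | "latn" | "thai")
}

// Hypothetical stand-in for data generation: returns the attributes that
// would be exported from whatever the source makes available.
fn generate<'a>(available: &[&'a str]) -> BTreeSet<&'a str> {
    available.iter().copied().filter(|a| keep(a)).collect()
}

fn main() {
    // Illustrative source contents, not the real repo or CLDR listings.
    let full_source = ["arab", "beng", "cakm", "deva", "hanidec", "latn", "thai"];
    let testing_source = ["arab", "beng", "cakm", "latn", "thai"];

    // With an explicit filter, the output no longer depends on which
    // files happen to be present in the source.
    assert_eq!(generate(&full_source), generate(&testing_source));
    println!("{:?}", generate(&full_source));
}
```

Without the explicit filter, the two sources in this sketch would produce different outputs, which is the kind of divergence between new and new_testing that this PR aims to remove.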