Transliterator datagen should allow for slicing individual baked data transliterators

Related: https://github.com/unicode-org/icu4x/issues/3966


Currently, all transliterators live under a single data key, distinguished only as different `und-t-blah` locales. This makes them hard to slice: users basically have to run datagen manually to get any slicing at all.

For blob data I'm not too worried about this. It would still be nice to have ways to slice it (https://github.com/unicode-org/icu4x/issues/3966), but I'm okay with people doing some manual slicing there, because automatic slicing would potentially have to parse the transliterators themselves[^1].

But for baked data, this is not great.


I think we can structure transliterator baked data somewhat differently. Datagen could produce the following:


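```rust
// One const per transliterator, so each can be referenced (and dropped) individually.
const DATA_TRANSLITERATOR_LATIN_HAN = ...;
const DATA_TRANSLITERATOR_LATIN_GREEK = ...;

// The existing all-in-one lookup, now built from the individual consts.
const DATA_TRANSLITERATOR_RULES_V1: icu_provider_baked::zerotrie::Data<icu::experimental::transliterate::provider::TransliteratorRulesV1> = {
    const TRIE: _ = ...;
    const VALUES: _ = [DATA_TRANSLITERATOR_LATIN_GREEK, DATA_TRANSLITERATOR_LATIN_HAN, ...];
    ...
};

// Per-transliterator constructors that only touch their own const.
pub mod ctors {
    pub fn new_transliterator_latin_han() -> Transliterator {
        // No trailing semicolon: the constructed value is the return value.
        Transliterator::new_internal(DATA_TRANSLITERATOR_LATIN_HAN, ...)
    }
}
```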


Ideally, `::new_internal()` comes with a solution to https://github.com/unicode-org/icu4x/issues/3966, where you can pass in something like `Transliterator::new_internal(DATA_TRANSLITERATOR_LATIN_HAN, TransliteratorDeps { casemapper: Some(CaseMapper::new()), normalizer: ..., ... })`.


The calling crate can then write `pub use ctors::*` somewhere to re-export these constructors.
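To make the shape of this concrete, here is a minimal, self-contained sketch of the proposal. Everything in it is a hypothetical stand-in rather than real ICU4X API: `RuleData`, the placeholder `id`/`rules` strings, the empty `CaseMapper` and `Normalizer` structs, the fields of `TransliteratorDeps`, and the signature of `new_internal` are all assumptions made for illustration. The point is only that each constructor references exactly one data const, so a binary that calls only `new_transliterator_latin_han` has no reason to keep the Latin-Greek data.

```rust
// Hypothetical stand-ins only; none of these are the real ICU4X types.

/// Stand-in for one transliterator's baked rule data.
pub struct RuleData {
    pub id: &'static str,
    pub rules: &'static str,
}

/// Stand-ins for components a transliterator might depend on.
pub struct CaseMapper;
pub struct Normalizer;

impl CaseMapper {
    pub const fn new() -> Self {
        CaseMapper
    }
}

impl Normalizer {
    pub const fn new() -> Self {
        Normalizer
    }
}

/// Optional dependencies supplied by the caller (assumed shape).
#[derive(Default)]
pub struct TransliteratorDeps {
    pub casemapper: Option<CaseMapper>,
    pub normalizer: Option<Normalizer>,
}

pub struct Transliterator {
    data: &'static RuleData,
    deps: TransliteratorDeps,
}

impl Transliterator {
    /// Assumed signature: one data const plus caller-provided dependencies.
    pub fn new_internal(data: &'static RuleData, deps: TransliteratorDeps) -> Self {
        Transliterator { data, deps }
    }

    pub fn id(&self) -> &'static str {
        self.data.id
    }

    pub fn has_casemapper(&self) -> bool {
        self.deps.casemapper.is_some()
    }

    pub fn has_normalizer(&self) -> bool {
        self.deps.normalizer.is_some()
    }
}

// One const per transliterator; the ids and rule strings are placeholders.
const DATA_TRANSLITERATOR_LATIN_HAN: &RuleData = &RuleData {
    id: "latin-han",
    rules: "<rules elided>",
};
const DATA_TRANSLITERATOR_LATIN_GREEK: &RuleData = &RuleData {
    id: "latin-greek",
    rules: "<rules elided>",
};

/// Each constructor references only its own data const.
pub mod ctors {
    use super::*;

    pub fn new_transliterator_latin_han(deps: TransliteratorDeps) -> Transliterator {
        Transliterator::new_internal(DATA_TRANSLITERATOR_LATIN_HAN, deps)
    }

    pub fn new_transliterator_latin_greek(deps: TransliteratorDeps) -> Transliterator {
        Transliterator::new_internal(DATA_TRANSLITERATOR_LATIN_GREEK, deps)
    }
}
```

A caller would then write something like `ctors::new_transliterator_latin_han(TransliteratorDeps { casemapper: Some(CaseMapper::new()), ..Default::default() })`, passing only the dependencies that transliterator actually needs.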

[^1]: Maybe we can have a `transliterator!()` macro that embeds the transliterator string into the binary so that keyextract can pick it up and read it.
