Description
Depends on the runtime parsing discussed in #3849.
Transliterators in ICU4C/J can be loaded not only by a single ID, but also by chaining several transliterators (including filters) together. Example: [a-z] ; [a] Remove ; Latin-Greek/BGN. These "chains" are equivalent to the transform rule source obtained by applying chain.split(";").map(|elt| format!(":: {elt} ;")).collect::<String>() (e.g. :: [a-z] ; :: [a] Remove ; :: Latin-Greek/BGN ;), so the same data struct can be reused, with only the overhead of a few empty VZVs.
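For illustration, the chain-to-rule-source conversion could be sketched as a small standalone function. The name `chain_to_rule_source` is hypothetical and not part of ICU4X. Unlike the one-liner above, this sketch trims each element and joins the results with a space, so the output matches the example string exactly. It also assumes that `;` never appears inside a filter set, which a real parser would have to handle.

```rust
/// Hypothetical helper (not an ICU4X API): converts a transliterator chain
/// such as "[a-z] ; [a] Remove ; Latin-Greek/BGN" into the equivalent
/// transform rule source made up of `::` rules.
///
/// Assumption: `;` only ever separates chain elements. A `;` inside a
/// filter set (e.g. "[;]") would be split incorrectly by this sketch.
fn chain_to_rule_source(chain: &str) -> String {
    chain
        .split(';')
        // Drop the whitespace surrounding each element.
        .map(str::trim)
        // Skip empty elements, e.g. those produced by a trailing ';'.
        .filter(|elt| !elt.is_empty())
        // Turn each element into a single `:: <element> ;` rule.
        .map(|elt| format!(":: {elt} ;"))
        .collect::<Vec<_>>()
        .join(" ")
}
```

Calling `chain_to_rule_source("[a-z] ; [a] Remove ; Latin-Greek/BGN")` yields `:: [a-z] ; :: [a] Remove ; :: Latin-Greek/BGN ;`, which could then be handed to whatever runtime rule parser ends up being available.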
This is primarily a convenience feature for runtime construction: users do not have to write a dummy source file containing the mapping explained above. Because these chains use legacy IDs while ICU4X data uses BCP-47 IDs, the whole issue of mapping legacy IDs to BCP-47 IDs applies (#3891). I suggest supporting chains of BCP-47 IDs instead of chains of legacy IDs. Support for this is also on the roadmap for ICU: https://unicode-org.atlassian.net/browse/ICU-22474