Chained Transliterator ID parsing #3991

Depends on the runtime parsing discussed in https://github.com/unicode-org/icu4x/issues/3849.

In ICU4C/J, transliterators can be loaded not only by a single ID but also by chaining several other transliterators (including filters) together, e.g. `[a-z] ; [a] Remove ; Latin-Greek/BGN`. Such a chain is equivalent to the transform rule source obtained by wrapping each `;`-separated element in a `::` rule, i.e. `chain.split(';').map(|elt| format!(":: {} ;", elt.trim())).collect::<Vec<_>>().join(" ")`, which turns the example into `:: [a-z] ; :: [a] Remove ; :: Latin-Greek/BGN ;`. This means the same data struct can be reused, with the only overhead being a few empty VarZeroVecs (VZVs).
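The chain-to-rules mapping above can be sketched as a standalone Rust function. `chain_to_rule_source` is a hypothetical name for illustration, not an ICU4X API. It trims whitespace around each element and, as an assumption of this sketch, drops empty elements such as the one produced by a trailing `;`. It only builds the rule source text and does not validate the filters or IDs it wraps.

```rust
/// Converts a chained transliterator ID such as `[a-z] ; [a] Remove ; Latin-Greek/BGN`
/// into equivalent transform rule source, emitting one `:: elt ;` rule per element.
/// Hypothetical helper for illustration only; not part of ICU4X.
fn chain_to_rule_source(chain: &str) -> String {
    chain
        .split(';')
        // Normalize whitespace around each chain element.
        .map(str::trim)
        // Assumption of this sketch: ignore empty elements (e.g. from a trailing `;`).
        .filter(|elt| !elt.is_empty())
        // Wrap each element (filter or transliterator ID) in a `::` rule.
        .map(|elt| format!(":: {elt} ;"))
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    let chain = "[a-z] ; [a] Remove ; Latin-Greek/BGN";
    let rules = chain_to_rule_source(chain);
    // Matches the rule source given in the issue text.
    assert_eq!(rules, ":: [a-z] ; :: [a] Remove ; :: Latin-Greek/BGN ;");
    println!("{rules}");
}
```

Because the output is ordinary rule source, it could be fed to whatever runtime rule parser ends up being implemented, which is what makes the reuse of the existing data struct possible.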

This is primarily a convenience feature for runtime construction: it spares users from writing a dummy source file containing the mapping described above. Because these chains use legacy IDs, while ICU4X data uses BCP-47 IDs, the whole issue of mapping legacy IDs to BCP-47 IDs applies (https://github.com/unicode-org/icu4x/issues/3891). Instead of supporting chains of legacy IDs, I suggest supporting chains of BCP-47 IDs. Support for this is also on the roadmap for ICU: https://unicode-org.atlassian.net/browse/ICU-22474
