Description
In sorting, there are two layers of data: the root collation and, optionally, a language-specific tailoring overlay.
In search, there are logically three layers of data: the root for sorting, a search root overlaid on that, and then, optionally, a language-specific tailoring.
However, the implementation only admits two layers, so for each language that's supposed to reuse its sort tailoring for searching, we end up generating a search tailoring that contains a merge of a copy of the search root and a copy of the sort tailoring for the language. This is obviously bad for data size, since the search root gets duplicated into every such tailoring.
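To make the duplication concrete, here is a toy sketch of the merge described above. The dictionaries, keys, and values are all made up for illustration (they are not real CLDR or ICU data), and `build_search_tailoring` is a hypothetical helper, not an actual build step.

```python
# Toy model: each layer maps a symbolic character name to an opaque value.
# All names and numbers are hypothetical stand-ins for real collation data.
ROOT = {"a": 1, "a-umlaut": 2, "alef": 3, "alef-hamza": 4}

# Search-only overrides that every search collator is supposed to see.
SEARCH_ROOT = {"alef-hamza": 3}

# Per-language sort tailorings that search is supposed to reuse.
SORT_TAILORINGS = {
    "fi": {"a-umlaut": 10},
    "sv": {"a-umlaut": 11},
}


def build_search_tailoring(sort_tailoring):
    """Merge a copy of the search root with a copy of a sort tailoring."""
    merged = dict(SEARCH_ROOT)     # copy of the search root
    merged.update(sort_tailoring)  # language rules win on conflict
    return merged


def lookup(key, tailoring):
    """Two-layer lookup: the tailoring first, then the root."""
    if key in tailoring:
        return tailoring[key]
    return ROOT[key]


search_tailorings = {
    lang: build_search_tailoring(rules)
    for lang, rules in SORT_TAILORINGS.items()
}

# Every generated search tailoring carries its own copy of the search root.
duplicated_entries = len(SEARCH_ROOT) * len(SORT_TAILORINGS)
```

In this toy model the search root has a single entry, so the overhead is tiny; the point is only that the overhead scales with the number of languages that reuse their sort tailoring.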
An obvious solution would be to allow three layers: root, search root, and search tailoring. However, this would make search perform worse, since the common case would fall back twice.
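The trade-off can be sketched by counting how many layers miss before a lookup succeeds. This is only a toy model with made-up data and a hypothetical `lookup_layers` helper; it says nothing about the actual cost of a fallback in a real collator, only about how many there are.

```python
def lookup_layers(key, layers):
    """Search layers from most to least specific.

    Returns (value, fallbacks), where fallbacks is the number of layers
    that missed before the key was found.
    """
    for fallbacks, layer in enumerate(layers):
        if key in layer:
            return layer[key], fallbacks
    raise KeyError(key)


# Hypothetical data, not real collation elements.
root = {"a": 1, "alef": 3, "alef-hamza": 4}
search_root = {"alef-hamza": 3}
search_tailoring = {"a-umlaut": 10}

# Three layers: the search root is stored once and shared.
three_layers = [search_tailoring, search_root, root]

# Two layers: the search root is merged into the tailoring, as is done today.
merged = {**search_root, **search_tailoring}
two_layers = [merged, root]

# The common case (a character found only in the root) misses twice with
# three layers and once with two.
common_three = lookup_layers("a", three_layers)
common_two = lookup_layers("a", two_layers)
```

Both layouts return the same values; they differ only in the number of misses on the way to the root and in how many copies of the search root exist.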
(An alternative that I'm considering for Firefox in the context of ICU4C for the time being is to omit the search root when a search tailoring exists and to use the corresponding sort tailoring as-is. That is, for the Latin-script languages that have special rules about which diacritics not to ignore in diacritic-insensitive search, one would lose the fuzziness for the Arabic and Thai scripts. The same goes for modern Hangul, but I don't understand the use case for the modern Hangul bits in the search root.)