|
| 1 | +# Multiple Language Support |
| 2 | + |
| 3 | +[Discussion](https://github.com/PixarAnimationStudios/OpenUSD-proposals/pull/55) |
| 4 | + |
| 5 | +## Summary |
| 6 | + |
| 7 | +We propose additions to USD that let authors specify the human language and locale of content, giving |
| 8 | +rendered text, speech synthesis, assistive technologies, and other applications the context they need to localize it. |
| 9 | + |
| 10 | +We propose use of [BCP-47](https://www.w3.org/International/core/langtags/rfc3066bis.html) specifiers according |
| 11 | +to the [Unicode CLDR](https://cldr.unicode.org) specification, using underscores as the delimiter. |
| 12 | + |
| 13 | +We propose specifying the language as [metadata](https://openusd.org/release/glossary.html#usdglossary-metadata), |
| 14 | +or as an [attribute](https://openusd.org/release/glossary.html#attribute) |
| 15 | +on [prims](https://openusd.org/release/glossary.html#usdglossary-prim), and additionally as a purpose on individual attributes. |
| 16 | + |
| 17 | +``` |
| 18 | +def "Foo" ( |
| 19 | + prepend apiSchemas = ["LocaleAPI"] |
| 20 | +) { |
| 21 | + uniform string locale:langue = "en_US" |
| 22 | + string text = "There's a snake in my boot" |
| 23 | + string text:fr_CA = "Il y a un serpent dans ma botte" |
| 24 | + string text:hi = "मेरे जूते में एक सांप है" |
| 25 | +} |
| 26 | +``` |
| 27 | + |
| 28 | +## Problem Statement |
| 29 | + |
| 30 | +Today, most 3D formats assume a single unspecified language across the represented content. |
| 31 | + |
| 32 | +A few changes and upcoming changes to USD increase the need to specify language: |
| 33 | + |
| 34 | +1. With Unicode support in USD, it is more attractive to people in a wider range of locales. |
| 35 | +2. Upcoming text support feels like a natural area for representing content in different locales. |
| 36 | +3. USD is now used as part of interactive content (games, spatial computing), where |
| 37 | + localization for user playback and assistive technologies may be useful. |
| 38 | + |
| 39 | +Because USD has no way to specify a language, it is unclear to tools and users how content should be interpreted |
| 40 | +when used by language-aware technologies. |
| 41 | + |
| 42 | +## Glossary of Terms |
| 43 | + |
| 44 | +- **[BCP-47](https://www.w3.org/International/core/langtags/rfc3066bis.html)** : An IETF specification |
| 45 | + for language tags that is commonly used by web standards and assistive technologies. |
| 46 | + You may have seen these tags on websites that have sections marked `en-CA` or `fr` in the URL. |
| 47 | +- **[Unicode CLDR](https://cldr.unicode.org)** : The Unicode Common Locale Data Repository, which defines locale identifiers derived from BCP-47. |
| 48 | +- **Language** : The primary language. Can be subdivided further. Lowercase is recommended. |
| 49 | + e.g. `en` for English and `fr` for French. |
| 50 | +- **Scripts** : An optional subdivision of Language for representation of a language in different written form. |
| 51 | + Title case is recommended. For example, `az-Cyrl` for Azerbaijani in the Cyrillic script |
| 52 | +- **Region/Territories** : An optional subdivision of Language for different regions that may share the same core |
| 53 | + language. Uppercase is recommended. For example, `en_CA` for Canadian English |
| 54 | + |
| 55 | +## Relevant Links |
| 56 | + |
| 57 | +* [W3C: Choosing Language Tags](https://www.w3.org/International/questions/qa-choosing-language-tags) |
| 58 | +* [Unicode CLDR: Picking the Right Language Identifier](https://cldr.unicode.org/index/cldr-spec/picking-the-right-language-code) |
| 59 | +* [W3C: Language Tags and Local Identifiers for the World Wide Web](https://www.w3.org/TR/ltli/) |
| 60 | +* [Unicode: Language Tag Equivalences](https://cldr.unicode.org/index/cldr-spec/language-tag-equivalences) |
| 61 | +* [Common list of Locales](https://gist.github.com/typpo/b2b828a35e683b9bf8db91b5404f1bd1) |
| 62 | +* [Apple: Choosing localization regions and Scripts](https://developer.apple.com/documentation/xcode/choosing-localization-regions-and-scripts) |
| 63 | + |
| 64 | +## Details |
| 65 | + |
| 66 | +### What would use it? |
| 67 | + |
| 68 | +This addition to USD is designed to be generic over several other schema types that might benefit from it. |
| 69 | + |
| 70 | +The primary use case is the |
| 71 | +current [Text proposal from Autodesk](https://github.com/PixarAnimationStudios/OpenUSD-proposals/tree/main/proposals/text), |
| 72 | +where text is a natural pairing for language specification. |
| 73 | + |
| 74 | +Hypothetically, in the future, we could see it also being useful for other use cases like: |
| 75 | + |
| 76 | +- User facing assistive metadata |
| 77 | +- Texture |
| 78 | +- Geometry |
| 79 | + |
| 80 | +We do not expect these to support languages right away, but we believe this is a good future looking feature that |
| 81 | +would allow for the use of USD in specific multi-language pipelines. |
| 82 | + |
| 83 | +For this proposal, we do not require that other schemas explicitly adopt language support. Instead, we suggest that |
| 84 | +runtimes adopt it over time, in conjunction with the other schemas they support. |
| 85 | + |
| 86 | +### Language Encoding |
| 87 | + |
| 88 | +To maximize compatibility with other systems, we recommend using `BCP-47` derived locales. For use as a `purpose`, |
| 89 | +this would require the use of `_` as a delimiter, as opposed to the standard `-`, because `-` is not a valid character in USD attribute names. |
| 90 | + |
| 91 | +e.g. Instead of `en-CA`, we use `en_CA` |
| 92 | + |
| 93 | +This brings it closer to the derived `Unicode Common Locale Data Repository (CLDR)` standard. This is commonly used |
| 94 | +by many operating systems, programming languages and corporations. If you are on a POSIX system, this also has |
| 95 | +significant overlap with the POSIX locale standards ([ISO/IEC 15897](https://www.iso.org/standard/50707.html)). |
| 96 | + |
| 97 | +An example list of languages is provided in the relevant links section above. |
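
As an illustration of the encoding above, the following is a minimal sketch of converting a hyphenated BCP-47 tag into the underscore-delimited form, applying the casing recommended in the glossary. `NormalizeLanguageToken` is a hypothetical helper invented for this example and is not part of OpenUSD. It assumes well-formed input and does no validation against the language subtag registry.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not an OpenUSD API): rewrites a tag such as "en-ca"
// as "en_CA". Casing follows the glossary: lowercase language, title case
// script (4 letters), uppercase region (2 letters). Subtags that follow a
// single-character extension marker (e.g. "u") are left lowercase.
std::string NormalizeLanguageToken(const std::string& tag) {
    // Split on either delimiter so already-converted tags are accepted too.
    std::vector<std::string> parts;
    std::string current;
    for (char c : tag) {
        if (c == '-' || c == '_') {
            parts.push_back(current);
            current.clear();
        } else {
            current.push_back(
                static_cast<char>(std::tolower(static_cast<unsigned char>(c))));
        }
    }
    parts.push_back(current);

    std::string result;
    bool inExtension = false;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        std::string part = parts[i];
        if (part.size() == 1) {
            inExtension = true;  // e.g. the "u" in "de_DE_u_co_phonebk"
        } else if (i > 0 && !inExtension && part.size() == 4) {
            // Script subtag: title case, e.g. "Cyrl".
            part[0] = static_cast<char>(
                std::toupper(static_cast<unsigned char>(part[0])));
        } else if (i > 0 && !inExtension && part.size() == 2) {
            // Region subtag: uppercase, e.g. "CA".
            for (char& c : part) {
                c = static_cast<char>(
                    std::toupper(static_cast<unsigned char>(c)));
            }
        }
        if (i > 0) result += '_';
        result += part;
    }
    return result;
}
```

With this sketch, `NormalizeLanguageToken("en-ca")` yields `en_CA`, which is usable as the trailing token of an attribute name.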
| 98 | + |
| 99 | +### Unspecified Language Fallback |
| 100 | + |
| 101 | +In the event that a language is not specified, it is recommended to specify a fallback behaviour. |
| 102 | + |
| 103 | +Our recommendation is: |
| 104 | + |
| 105 | +1. If your attribute or prim is missing a language, check the parent hierarchy for an inherited value |
| 106 | +2. If no language is specified, and if your runtime can infer a language, it is free to do so but does not have to. |
| 107 | +3. If you cannot or choose not to infer a language, assume the user's current locale. |
| 108 | + |
| 109 | +This matches the behaviour of common assistive technologies like screen readers. |
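
The three steps above can be sketched as follows. This is only an illustration under stated assumptions: `PrimSketch` is a hypothetical stand-in for a prim with an optional authored language, and `ResolveLanguage` is an invented name. A real implementation would query the stage rather than a hand-built struct.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for a prim; not an OpenUSD type.
struct PrimSketch {
    const PrimSketch* parent = nullptr;    // null for the root
    std::optional<std::string> language;   // authored language, if any
};

// Step 1: walk up the hierarchy and return the nearest authored language.
std::optional<std::string> FindInheritedLanguage(const PrimSketch& prim) {
    for (const PrimSketch* p = &prim; p != nullptr; p = p->parent) {
        if (p->language) return p->language;
    }
    return std::nullopt;
}

// Steps 1 to 3: inherited value, then an optional runtime inference,
// then the user's current locale.
std::string ResolveLanguage(const PrimSketch& prim,
                            const std::optional<std::string>& inferred,
                            const std::string& userLocale) {
    if (auto inherited = FindInheritedLanguage(prim)) return *inherited;
    if (inferred) return *inferred;  // the runtime may skip inference
    return userLocale;
}
```

Passing `std::nullopt` for `inferred` models a runtime that cannot or chooses not to infer a language.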
| 110 | + |
| 111 | +### Default Metadata |
| 112 | + |
| 113 | +Most content will use one primary language, which tends to be the language of the content creator's region. |
| 114 | +To facilitate this, we encourage but do not require content authors to specify a language. |
| 115 | + |
| 116 | +As is the current convention, layer metadata is used for stage level hints. However, the |
| 117 | +[Revise Use of Layer Metadata proposal](https://github.com/PixarAnimationStudios/OpenUSD-proposals/pull/45) |
| 118 | +suggests moving this to an applied API Schema. |
| 119 | + |
| 120 | +If we assume the current convention of layer metadata, we recommend the following field: |
| 121 | + |
| 122 | +``` |
| 123 | +#usda 1.0 |
| 124 | +( |
| 125 | + language = "en_CA" |
| 126 | +) |
| 127 | +``` |
| 128 | + |
| 129 | +However, per the new proposal this should move to an API schema, and we'd propose the following |
| 130 | + |
| 131 | +``` |
| 132 | +def "Foo" ( |
| 133 | + prepend apiSchemas = ["LocaleAPI"] |
| 134 | +) { |
| 135 | + uniform string locale:langue = "en_US" |
| 136 | +} |
| 137 | +``` |
| 138 | + |
| 139 | +In both scenarios, the language is inherited as the default value for every prim and attribute below it. |
| 140 | + |
| 141 | +### Attribute Purposes |
| 142 | + |
| 143 | +We take inspiration from web and application development conventions, where it is common to provide resources |
| 144 | +for multiple languages in a single context. |
| 145 | + |
| 146 | +For this, we recommend that the language be specified as a purpose on the attribute, rather than limiting each |
| 147 | +attribute to a single language. |
| 148 | + |
| 149 | +We recommend that this be the last token in the attribute's namespaces, so that names progress towards the most specific. |
| 150 | + |
| 151 | +``` |
| 152 | +def "foo" { |
| 153 | + string text = "Colours are awesome" |
| 154 | + string text:en_US = "Colors are awesome, but the letter U is not" |
| 155 | + string text:fr = "La couleur est géniale" |
| 156 | +} |
| 157 | +``` |
| 158 | + |
| 159 | +One advantage of this system is that you can keep your translations in different layer files and reference them in. |
| 160 | + |
| 161 | +We recommend that at least one version of the attribute exclude the language token, so that it can |
| 162 | +fall back to the inherited language and also serve as the fallback when a user asks for a language that has no |
| 163 | +matching attribute. |
| 164 | + |
| 165 | +#### Token Delimiter |
| 166 | + |
| 167 | +The proposal currently implicitly uses the last token as the language. |
| 168 | +It might however be preferable to be explicit about this by also prefixing `lang_` or `lang:` to the last token. |
| 169 | + |
| 170 | +This could look like one of the below |
| 171 | + |
| 172 | +``` |
| 173 | +string text:lang_en_us |
| 174 | +string text:lang:en_us |
| 175 | +``` |
| 176 | + |
| 177 | +This perhaps makes things longer, but does make it easier to discern that a token represents a language in a wider |
| 178 | +range of use cases. |
| 179 | + |
| 180 | +#### Why not Variants? |
| 181 | + |
| 182 | +Variants are also a possible solution, however we believe that this becomes difficult for systems to work with as |
| 183 | +variants are effectively unbounded. |
| 184 | + |
| 185 | +We also find in some of our use cases that we'd want variants for the core data itself, and authoring language |
| 186 | +variants for each of these variants would quickly multiply the variant count and complexity. |
| 187 | + |
| 188 | +Purposes on attributes feel like the best match to existing paradigms in web and app development, and easiest for |
| 189 | +systems to work with. |
| 190 | + |
| 191 | +### API Suggestions |
| 192 | + |
| 193 | +We do not recommend that OpenUSD itself include all possible language tags. However, it would be beneficial for USD |
| 194 | +to provide an API to look up the languages specified in the file. |
| 195 | + |
| 196 | +This could look like the following, which returns a map from language to attribute: |
| 197 | + |
| 198 | +``` |
| 199 | +std::map<TfToken, UsdAttribute> UsdLocaleAPI::GetLanguagePurposes(const UsdPrim& prim, TfToken attributeName) {...} |
| 200 | +``` |
| 201 | + |
| 202 | +Using the example above, a call to `GetLanguagePurposes(foo, "text")` would give |
| 203 | + |
| 204 | +- (`<Fallback or Unknown>`, foo:text) |
| 205 | +- (en_US, foo:text:en_US) |
| 206 | +- (fr, foo:text:fr) |
| 207 | + |
| 208 | +In this case, the `<Fallback or Unknown>` represents that it should follow the logic |
| 209 | +in the `Unspecified Language Fallback` section. |
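
To make the lookup concrete, here is a minimal sketch of the grouping logic over plain attribute names. It is not the proposed `UsdLocaleAPI` implementation: `GetLanguagePurposesSketch` is a hypothetical name, strings stand in for `TfToken` and `UsdAttribute`, and an empty key is used as an assumed representation of `<Fallback or Unknown>`. It also assumes the implicit convention that any single token following the base name is a language.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: maps each language token to the attribute name that
// carries it. The empty key marks the attribute with no language token.
std::map<std::string, std::string> GetLanguagePurposesSketch(
    const std::vector<std::string>& attributeNames,
    const std::string& baseName) {
    std::map<std::string, std::string> result;
    const std::string prefix = baseName + ":";
    for (const std::string& name : attributeNames) {
        if (name == baseName) {
            result[""] = name;  // fallback / unknown language
        } else if (name.compare(0, prefix.size(), prefix) == 0) {
            const std::string suffix = name.substr(prefix.size());
            // Assumption: the language is exactly one trailing token.
            if (!suffix.empty() && suffix.find(':') == std::string::npos) {
                result[suffix] = name;
            }
        }
    }
    return result;
}
```

One limitation of the implicit convention shows up here: the sketch cannot tell a language token from any other trailing namespace, which is part of the motivation for the explicit `lang` prefix discussed under Token Delimiter.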
| 210 | + |
| 211 | +I would suggest another function like |
| 212 | + |
| 213 | +``` |
| 214 | +TfToken UsdLocaleAPI::ComputeFallbackLanguage(const UsdPrim& prim) |
| 215 | +``` |
| 216 | + |
| 217 | +That would return either the inherited value or a sentinel `Unknown` value when no language is specified. |
| 218 | +Perhaps USD could have some convenience function to do user locale lookup, but I do not think that needs to be a |
| 219 | +requirement. |
| 220 | + |
| 221 | +#### Language Selection Recommendations |
| 222 | + |
| 223 | +When the requested or desired language is not represented in the set of languages within the file, there are some recommendations |
| 224 | +on how a runtime can pick a fallback option. |
| 225 | + |
| 226 | +Following the recommendations of CLDR and BCP-47, we suggest: |
| 227 | + |
| 228 | +1. If a language is available within the set of languages, pick that attribute. e.g. `en_US` matches `text:en_US` |
| 229 | +2. If a language isn't available, check for a more specific version of that language. |
| 230 | + e.g. `de_DE` matches `text:de_DE_u_co_phonebk` |
| 231 | +3. If a more specific language isn't available, then pick a less specific purpose. |
| 232 | + e.g. `en_US` matches `text:en` |
| 233 | +4. If a less specific version isn't available, take the version without any language specified. |
| 234 | + e.g. `en_US` matches `text` |
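
The four selection steps above can be sketched as a lookup over a map from language token to attribute name. `SelectLanguageAttribute` and `IsSameOrMoreSpecific` are hypothetical names for this illustration, and the empty key is an assumed representation of the attribute with no language token. When several more specific candidates exist, this sketch simply takes the first in sorted order, which the proposal does not prescribe.

```cpp
#include <map>
#include <optional>
#include <string>

// True when `candidate` equals `base` or extends it with further
// underscore-delimited subtags, e.g. "de_DE_u_co_phonebk" for "de_DE".
bool IsSameOrMoreSpecific(const std::string& candidate,
                          const std::string& base) {
    if (candidate.size() < base.size()) return false;
    if (candidate.compare(0, base.size(), base) != 0) return false;
    return candidate.size() == base.size() || candidate[base.size()] == '_';
}

// Hypothetical sketch of the selection order; not an OpenUSD API.
std::optional<std::string> SelectLanguageAttribute(
    const std::map<std::string, std::string>& available,
    const std::string& requested) {
    // 1. Exact match.
    auto exact = available.find(requested);
    if (exact != available.end()) return exact->second;

    // 2. A more specific version of the requested language.
    for (const auto& [lang, attr] : available) {
        if (IsSameOrMoreSpecific(lang, requested)) return attr;
    }

    // 3. A less specific version: drop trailing subtags one at a time.
    std::string shorter = requested;
    for (auto pos = shorter.rfind('_'); pos != std::string::npos;
         pos = shorter.rfind('_')) {
        shorter.erase(pos);
        auto it = available.find(shorter);
        if (it != available.end()) return it->second;
    }

    // 4. The attribute without any language token, if present.
    auto fallback = available.find("");
    if (fallback != available.end()) return fallback->second;
    return std::nullopt;
}
```

Returning `std::nullopt` when nothing matches leaves the final decision to the caller, for example skipping the prim or reporting a missing translation.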
| 235 | + |
| 236 | +## Risks |
| 237 | + |
| 238 | +I do not see significant risk with this proposal. There is a potential for significantly more attributes, |
| 239 | +but the number of attributes this would apply to is fairly limited. |
| 240 | + |
| 241 | +One potential issue is that you may want to swap out geometry or assigned textures by locale too. |
| 242 | +e.g. an English texture vs a French texture. This proposal would allow for that, but the risk is |
| 243 | +that support may be very renderer dependent. |
| 244 | + |
| 245 | +## Excluded Topics |
| 246 | + |
| 247 | +This proposal specifically does not address other locale-based data such as currencies, units, and time zones. |
| 248 | +At this time, we are not sure if those other locale based metadata have a strong use case within USD. |
| 249 | + |
| 250 | +However, we suggest naming it something like `UsdLocaleAPI` such that it allows for future additions to those |
| 251 | +types of metadata. |
| 252 | + |
| 253 | + |
| 254 | + |