Commit 733a877

Put in content for survey of multilingual names and addresses.
1 parent 34445cc commit 733a877

1 file changed

Lines changed: 169 additions & 5 deletions

22-053/sections/05-current-practices.adoc

@@ -5,7 +5,8 @@
55
Most usages of this POI Conceptual Model standard will need to add extra _attributes_ (named properties) in a *POI_Payload* object.
66
There are a number of attributes that are likely to come up often. Among the most common are:
77

8-
* Internationalized names and addresses
8+
* Multilingual names
9+
* Addresses, including multilingual addresses
910
* Categories
1011
* Business Hours
1112
* Telephone Number
@@ -20,17 +21,20 @@ Before discussing the individual attributes, here are some of the references con
2021
Indoor Mapping Data Format (IMDF)::
2122
https://docs.ogc.org/cs/20-094/[IMDF] is an OGC Community Standard, originally developed by Apple, for indoor maps.
2223
It can be used, for example, to map airports, malls, and train stations.
23-
The concept of an https://docs.ogc.org/cs/20-094/Occupant/[Occupant] is very close to that of a POI representing a business,
24-
and as a result, the modeling of various Occupant properties is directly relevant to this survey.
24+
It is a JSON format, based on GeoJSON.
25+
The concept of an https://docs.ogc.org/cs/20-094/Occupant/[Occupant] is very close to that of a POI representing a business.
2526

2627
OpenStreetMap::
2728
https://wiki.openstreetmap.org/wiki/Main_Page[OpenStreetMap] is a community-built map of the world.
2829
Some of its https://wiki.openstreetmap.org/wiki/Map_features[Primary Features] could be called POIs,
2930
and the https://wiki.openstreetmap.org/wiki/Tags[tags] of such features are similar to our attributes.
31+
The OpenStreetMap data model is a https://github.com/openstreetmap/openstreetmap-website/blob/master/db/structure.sql[database schema].
32+
Mapped things are called _elements_, and _tags_ provide the data for each element.
3033

3134
Overture Maps::
3235
https://docs.overturemaps.org/schema/[Overture Maps] is developed by a foundation as a map built on open data.
3336
It has a schema for https://docs.overturemaps.org/schema/reference/places/place/[places] that are essentially POIs.
37+
Overture uses OGC's feature model, and defines its data model using JSON Schema.
3438

3539
CityGML::
3640
https://www.ogc.org/standard/citygml/[CityGML] is an OGC standard for 3D city models.
@@ -45,13 +49,173 @@ Hotels are a subset of POIs but are otherwise very similar.
4549
Schema.org::
4650
https://schema.org/[Schema.org] is a set of recommended schemas for modeling various things on the web.
4751
It specifies markup for various https://schema.org/Property[Properties], some of which are relevant to POIs.
52+
A primary use is for putting _microdata_ into web pages to give information to search engines.
4853

4954
XML Schema::
5055
https://www.w3.org/TR/xmlschema11-2/[XML Schema Definition Language] models a number of primitive data types,
5156
some of which (language, dates and times) are relevant to this survey.
5257

53-
=== Internationalized Names and Addresses ===
54-
58+
RFC5646::
59+
https://tools.ietf.org/html/rfc5646[RFC 5646], _Tags for Identifying Languages_, is an Internet Best Current Practice document (part of BCP 47)
that specifies tags for identifying natural languages.
61+
62+
=== Multilingual Names ===
63+
64+
POIs can have their names expressed differently in different natural languages:
65+
for example, "la tour Eiffel" in French is "Eiffel Tower" in English and "Eiffelturm" in German.
66+
67+
*IMDF* https://docs.ogc.org/cs/20-094/Occupant/index.html[Occupants] have a _name_
68+
which has type https://docs.ogc.org/cs/20-094/Reference/index.html#labels[_LABELS_].
69+
LABELS is a JSON object used to express a string label in one or more languages.
70+
The JSON object has member names that are languages, with the corresponding
71+
member values being the label in that language.
72+
For example:
73+
74+
```json
75+
"name": {
76+
"en-US": "Center Pavillion",
77+
"en-GB": "Centre Pavillion"
78+
}
79+
```
80+
IMDF says that each language member name should be a LANGUAGE_TAG, which is
81+
defined in their https://docs.ogc.org/cs/20-094/Reference[reference section]
82+
as an https://tools.ietf.org/html/rfc5646[RFC 5646] compliant language tag and sub-tag, script, and region subtag
83+
registered in the
84+
https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry[IANA Language Subtag Registry].
85+
IMDF requires that a language tag not appear more than once in a LABELS object.
86+
An IMDF archive comes with a https://docs.ogc.org/cs/20-094/Manifest[Manifest] containing metadata about the described venue.
87+
Among the metadata is a _language_, whose value is the _default language_ tag for the venue.
88+
There is a requirement that all LABELS must contain an entry for the default language.
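
The two IMDF requirements above (no duplicated language tags, and an entry for the default language) can be checked mechanically.
The following is a minimal sketch, not part of IMDF itself: the function name `parse_labels` and the sample labels are hypothetical,
and the check relies on the `object_pairs_hook` parameter of Python's standard `json` module to see duplicate member names before they are collapsed.

```python
import json


def parse_labels(text, default_language):
    """Parse a LABELS JSON object (hypothetical helper, not an IMDF API).

    Raises ValueError if a language tag is duplicated or if there is
    no entry for the venue's default language.
    """
    def reject_duplicates(pairs):
        # json passes every (member name, value) pair, in document order.
        seen = {}
        for tag, label in pairs:
            if tag in seen:
                raise ValueError(f"duplicate language tag: {tag}")
            seen[tag] = label
        return seen

    labels = json.loads(text, object_pairs_hook=reject_duplicates)
    if default_language not in labels:
        raise ValueError(f"no label for default language: {default_language}")
    return labels


# Sample data; the default language would come from the Manifest.
labels = parse_labels('{"en-US": "Center Court", "en-GB": "Centre Court"}', "en-US")
```

A plain `json.loads` call would silently keep only the last of two duplicated members, which is why the sketch inspects the raw pairs instead.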
89+
90+
In *OpenStreetMap*, elements are given names with a _name=_ tag, which is described https://wiki.openstreetmap.org/wiki/Names#Localization[here].
91+
Additionally, there is a long article on https://wiki.openstreetmap.org/wiki/Multilingual_names[Multilingual names].
92+
An element can have several name tags, each giving the name in a different language.
93+
The bare _name=_ tag gives the default language name, used locally.
94+
Names in other languages use the form _name:code=_, where _code_ is
95+
a language's https://www.loc.gov/standards/iso639-2/php/code_list.php[ISO 639-1 alpha-2 code (in the second column)],
96+
or https://www.loc.gov/standards/iso639-2/php/code_list.php[ISO 639-2/T (alpha-3)] code.
97+
It is recommended that the local name be repeated with an explicit language code,
98+
so that an implementation doesn't have to guess the local language.
99+
For example:
100+
101+
```
102+
name=la tour Eiffel
103+
name:fr=la tour Eiffel
104+
name:en=Eiffel Tower
105+
name:de=Eiffelturm
106+
```
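
A consumer of such tags might look up a name as follows. This is only a sketch under the assumption that the tags of an element are available
as a Python dictionary; the function name `localized_name` is hypothetical.

```python
def localized_name(tags, language):
    """Return the name for `language`, falling back to the bare name tag."""
    return tags.get(f"name:{language}", tags.get("name"))


# The tags from the example above, as a dictionary.
tags = {
    "name": "la tour Eiffel",
    "name:fr": "la tour Eiffel",
    "name:en": "Eiffel Tower",
    "name:de": "Eiffelturm",
}

english = localized_name(tags, "en")   # uses name:en
japanese = localized_name(tags, "ja")  # no name:ja, falls back to name
```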
107+
108+
In *Overture Maps*, names are objects with a _primary_ member (a string), and a _common_ member
109+
which is an object that itself contains members whose names are
110+
https://en.wikipedia.org/wiki/IETF_language_tag[IETF-BCP47] language tags
111+
and whose values are strings.
112+
For example:
113+
114+
```json
115+
"names": {
116+
"primary" : "Statue of Liberty",
117+
"common" : {
118+
"fr" : "Statue de la Liberté",
119+
"it" : "Statua della Libertà"
120+
}
121+
}
122+
```
123+
124+
The primary name is expected to be the name in the local language, and the common names
125+
give the name in other languages.
126+
The IETF-BCP47 language codes are expected to be in the
127+
https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry[IANA language subtag registry].
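
As an illustration of how a _names_ object of this shape might be consumed, here is a minimal sketch.
The function name `overture_name` is hypothetical, and the fallback strategy (dropping trailing subtags, so that "fr-CA" falls back to "fr",
and finally using the primary name) is an assumption of this sketch rather than something Overture prescribes.
It is a simplified form of the lookup idea in RFC 4647 and matches tags case-sensitively.

```python
def overture_name(names, language):
    """Pick a name for a BCP 47 `language` tag from an Overture-style names object."""
    common = names.get("common") or {}
    tag = language
    while tag:
        if tag in common:
            return common[tag]
        # Drop the last subtag: "fr-CA" -> "fr" -> "".
        tag = tag.rpartition("-")[0]
    return names.get("primary")


names = {
    "primary": "Statue of Liberty",
    "common": {
        "fr": "Statue de la Liberté",
        "it": "Statua della Libertà",
    },
}

canadian_french = overture_name(names, "fr-CA")  # falls back to "fr"
german = overture_name(names, "de")              # falls back to primary
```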
128+
129+
Overture Maps allows an additional object in the _names_ element: a _rules_ object.
130+
It allows for expressing such variants as "short", "alternate", or "official", and includes
131+
an explicit _language_ member and _value_ member.
132+
133+
*Google Hotels* specifies the language at the file level only:
134+
that is, the entire collection of POIs is expected to have names in a single language,
135+
and that language is given by a _language_ XML element in the https://www.gstatic.com/localfeed/local_feed.xsd[schema].
136+
The language is expected to be an http://www.w3.org/WAI/ER/IG/ert/iso639.htm#2letter[ISO 639 lowercase 2-letter language code].
137+
138+
*CityGML* and *Schema.org* appear not to have addressed the issue of multilingual names.
139+
140+
=== Addresses, including Multilingual Addresses ===
141+
142+
There are many ways of expressing addresses of POIs.
143+
And, like POI names, addresses have country, locality, and street names that are different in different languages:
144+
e.g., Spain in English is España in Spanish.
145+
146+
In *IMDF*, an https://docs.ogc.org/cs/20-094/Address/index.html[Address] is a Feature object
147+
containing a number of properties:
148+
149+
* _address_: formatted postal address, excluding suite/unit identifier, e.g. "123 E. Main Street"
150+
* _unit_: if present, a qualifying official or proprietary unit/suite designation, e.g. "2A"
151+
* _locality_: the official locality (e.g. city, town) component of the postal address
152+
* _province_: if present, Province (e.g. state, territory) component of the postal address, using
153+
https://www.iso.org/standard/72483.html[ISO 3166-2]
154+
* _country_ : country component of the postal address, using
155+
https://www.iso.org/iso-3166-country-codes.html[ISO 3166]
156+
* _postal_code_ : mail sorting code associated with the postal address
157+
* _postal_code_ext_ : mail sorting code extension associated with the postal code
158+
* _postal_code_vanity_ : mail sorting vanity code associated with the postal code
159+
160+
There is nothing said about expressing the _address_ or _locality_ in different languages,
so presumably the local language is expected for those.
164+
By using ISO standards for _province_ and _country_, those can be translated into other languages
165+
when converting the codes to full names.
166+
167+
In *OpenStreetMap*, addresses are assigned to elements by giving them values for various _addr:xxx=_ tags,
168+
as described in https://wiki.openstreetmap.org/wiki/Addresses[this article].
169+
The tags are similar to those used by IMDF, but more comprehensive and more structured.
170+
Consult https://wiki.openstreetmap.org/wiki/Map_features#Addresses[here] for the full list.
171+
There is an attempt to fully structure addresses, rather than leaving the street etc. as an unstructured string,
172+
though there is a fallback _addr:full=_ tag for when structuring just doesn't work.
173+
For example:
174+
175+
```
176+
addr:housenumber=1000
177+
addr:street=5th Avenue
178+
addr:city=New York
179+
addr:state=NY
180+
addr:country=US
181+
```
182+
183+
For values that can be multilingual, the tags can have a language code added to them after a colon,
184+
just as they were in the _name:code=_ tags of the previous part of this section.
185+
For example:
186+
187+
```
188+
addr:city:en=Munich
189+
addr:city:de=München
190+
```
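
To show how such tags could be read, the following sketch groups _addr:_ tags by component and language.
It assumes the tags of an element are available as a Python dictionary and that a language code, when present, is the third colon-separated part of the key;
the function name `group_address_tags` and the sample tags are hypothetical.

```python
def group_address_tags(tags):
    """Group addr:* tags as {component: {language or None: value}}."""
    grouped = {}
    for key, value in tags.items():
        parts = key.split(":")
        # Keep only addr:<component> and addr:<component>:<language>.
        if parts[0] != "addr" or len(parts) not in (2, 3):
            continue
        language = parts[2] if len(parts) == 3 else None
        grouped.setdefault(parts[1], {})[language] = value
    return grouped


tags = {
    "name": "Marienplatz",
    "addr:city": "München",
    "addr:city:en": "Munich",
    "addr:city:de": "München",
    "addr:country": "DE",
}

grouped = group_address_tags(tags)
```

The `None` key holds the value of the bare tag, which plays the same role as the bare _name=_ tag does for names.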
191+
192+
In *Overture Maps*, the https://docs.overturemaps.org/schema/reference/addresses/address/[address schema]
193+
has country, postcode, street, number, and unit, and then a number of "address levels" to capture
194+
all the various levels of administrative areas that might be present, in an ordered but unlabeled way.
195+
An example is:
196+
197+
```json
198+
"properties": {
199+
"theme": "addresses",
200+
"type": "address",
201+
"version": 0,
202+
"country": "US",
203+
"address_levels": [
204+
{
205+
"value": "MA"
206+
},
207+
{
208+
"value": "NEWTON CENTRE"
209+
}
210+
],
211+
"postcode": "02459",
212+
"street": "COMMONWEALTH AVE",
213+
"number": "1000"
214+
}
215+
```
216+
217+
They note that they loosely followed the ideas of https://openaddresses.io/[OpenAddresses].
218+
It appears that they do not explicitly address the issue of multilingual address components.
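
Because the address levels are ordered but unlabeled, a consumer has to decide how to present them.
The sketch below turns the _properties_ object from the example above into a single line.
The function name `format_address` is hypothetical, and the output layout (house number before street, address levels listed from most specific to most general)
is an assumption that suits this US example, not a rule taken from the Overture schema.

```python
def format_address(properties):
    """Format Overture-style address properties as one comma-separated line."""
    levels = [
        level["value"]
        for level in properties.get("address_levels", [])
        if level.get("value")
    ]
    street_line = " ".join(
        part for part in (properties.get("number"), properties.get("street")) if part
    )
    # Assumption: levels are stored from most general to most specific.
    parts = [
        street_line,
        *reversed(levels),
        properties.get("postcode"),
        properties.get("country"),
    ]
    return ", ".join(part for part in parts if part)


properties = {
    "country": "US",
    "address_levels": [{"value": "MA"}, {"value": "NEWTON CENTRE"}],
    "postcode": "02459",
    "street": "COMMONWEALTH AVE",
    "number": "1000",
}

line = format_address(properties)
```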
55219

56220
=== Categories ===
57221
