We have no standardized API to get or set the unit of a numeric field, even though the unit may be crucial to processing the field correctly. Users' workarounds may be to add a suffix like "_m" or "_celsius" to the field name, use the field comment to add the unit as free text, add an extra field "foo_unit" whose value is the unit name, or give up.
That said, very few formats have built-in ways of communicating that information. I can only think of:
- netCDF: through the units attribute of a variable (https://cfconventions.org/Data/cf-conventions/cf-conventions-1.13/cf-conventions.html#units)
- IHO S-101: the XML feature catalog contains <S100FC:uom> elements for simple attributes that have a unit
- some shapefiles (coming from Esri software) may be accompanied by a .shp.xml side-car file containing FGDC (Federal Geographic Data Committee) metadata. For example https://pubs.usgs.gov/sim/2840/DATA/bathy/bathy.shp.xml with the <attrunit> element:
<attr>
<attrlabl>CONTOUR</attrlabl>
<attrdef>Depth of bathymetric contour. Values are in meters above sea level, so all values are negative. Double of width 13, but values are integer. Another attribute "INTERVAL" contains the same data stored as an integer.</attrdef>
<attrdomv>
<rdom>
<rdommin>-270</rdommin>
<rdommax>-5</rdommax>
<attrunit>meters</attrunit>
<attrmres>.000</attrmres>
</rdom>
</attrdomv>
<attrmfrq>5 m</attrmfrq>
<attrtype Sync="TRUE">Number</attrtype>
<attwidth Sync="TRUE">9</attwidth>
<atnumdec Sync="TRUE">3</atnumdec>
<attrdefs>USGS</attrdefs>
</attr>
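As an illustration of how a reader could pick up that unit, here is a minimal Python sketch that extracts <attrunit> from a trimmed copy of the element above. The function name fgdc_field_units is hypothetical and is not part of GDAL; the element path attrdomv/rdom/attrunit is simply the one visible in this example file, and other FGDC files may be laid out differently.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <attr> element quoted above.
FGDC_SNIPPET = """
<attr>
  <attrlabl>CONTOUR</attrlabl>
  <attrdomv>
    <rdom>
      <rdommin>-270</rdommin>
      <rdommax>-5</rdommax>
      <attrunit>meters</attrunit>
    </rdom>
  </attrdomv>
</attr>
"""


def fgdc_field_units(root):
    """Map attribute label -> unit for every <attr> that carries an <attrunit>.

    Hypothetical helper for illustration only.
    """
    units = {}
    # iter() also yields the root element itself when its tag is <attr>.
    for attr in root.iter("attr"):
        label = attr.findtext("attrlabl")
        unit = attr.findtext("attrdomv/rdom/attrunit")
        if label and unit:
            units[label.strip()] = unit.strip()
    return units


units = fgdc_field_units(ET.fromstring(FGDC_SNIPPET))
print(units)
```

The unit comes back as the free-text string stored in the file ("meters" here), so any normalization would have to happen on top of this.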
That said, from a quick look at the online Esri documentation, it doesn't seem that their GUI makes it easy to define a unit for a field.
It would seem desirable to me that the GDAL API help lift that restriction, so that GIS applications (like QGIS) could offer such a capability to their end users. We'd particularly need to address the GeoPackage and PostgreSQL cases.
For GeoPackage, while extending gpkg_ prefixed system tables with a new column to store the unit name could be done in theory, it is known to cause issues for some implementations that use automatic mapping of table structure to code. This leaves us with 2 candidates: the gpkg_metadata/gpkg_metadata_reference tables used by the metadata extension, or the gpkg_data_columns table used by the schema extension. Both are implemented by GDAL. The metadata extension is mapped to GDAL dataset and layer metadata. The schema extension is used to store field alias and comment, as well as field domains.
For metadata, that could look like:
INSERT INTO gpkg_metadata (id, md_scope, md_standard_uri, mime_type, metadata)
VALUES (1, 'attributeType', 'http://gdal.org', 'text/xml',
'<FieldDefinition><UnitName>meter</UnitName></FieldDefinition>');
INSERT INTO gpkg_metadata_reference (reference_scope, table_name, column_name, row_id_value, timestamp, md_file_id, md_parent_id)
VALUES ('column','my_table_name','my_column_name',NULL,strftime('%Y-%m-%dT%H:%M:%fZ','now'),1,NULL);
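To make the reading side of the metadata option concrete, here is a rough Python sketch using the standard sqlite3 module. It is only an illustration under several assumptions: the two tables are created with simplified definitions (no constraints or triggers), the reference row uses the 'column' scope with a NULL row_id_value as the GeoPackage specification defines for column-level references, and the helper name read_unit is hypothetical. The <FieldDefinition>/<UnitName> payload is the one proposed above and is not an existing GDAL convention.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# Simplified versions of the metadata extension tables (constraints omitted).
conn.executescript("""
CREATE TABLE gpkg_metadata (
  id INTEGER PRIMARY KEY,
  md_scope TEXT NOT NULL DEFAULT 'dataset',
  md_standard_uri TEXT NOT NULL,
  mime_type TEXT NOT NULL DEFAULT 'text/xml',
  metadata TEXT NOT NULL DEFAULT ''
);
CREATE TABLE gpkg_metadata_reference (
  reference_scope TEXT NOT NULL,
  table_name TEXT,
  column_name TEXT,
  row_id_value INTEGER,
  timestamp DATETIME NOT NULL,
  md_file_id INTEGER NOT NULL,
  md_parent_id INTEGER
);
INSERT INTO gpkg_metadata (id, md_scope, md_standard_uri, mime_type, metadata)
VALUES (1, 'attributeType', 'http://gdal.org', 'text/xml',
        '<FieldDefinition><UnitName>meter</UnitName></FieldDefinition>');
INSERT INTO gpkg_metadata_reference
  (reference_scope, table_name, column_name, row_id_value, timestamp,
   md_file_id, md_parent_id)
VALUES ('column', 'my_table_name', 'my_column_name', NULL,
        strftime('%Y-%m-%dT%H:%M:%fZ', 'now'), 1, NULL);
""")


def read_unit(conn, table, column):
    """Return the unit recorded for table.column, or None (hypothetical helper)."""
    rows = conn.execute(
        "SELECT m.metadata FROM gpkg_metadata m "
        "JOIN gpkg_metadata_reference r ON r.md_file_id = m.id "
        "WHERE r.table_name = ? AND r.column_name = ? "
        "AND m.mime_type = 'text/xml'",
        (table, column),
    )
    for (xml_text,) in rows:
        try:
            root = ET.fromstring(xml_text)
        except ET.ParseError:
            continue  # unrelated or malformed metadata: ignore it
        if root.tag == "FieldDefinition":
            unit = root.findtext("UnitName")
            if unit:
                return unit
    return None


print(read_unit(conn, "my_table_name", "my_column_name"))
```

Metadata records attached to the same column that are not a <FieldDefinition> document are skipped, so other uses of the metadata extension would be left alone.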
For schema, we could hijack the description field by appending a " | Unit={value}" suffix (or just "Unit={value}" if there is no user-set comment/description). On reading, the GDAL implementation would recognize the suffix, call SetUnitName({value}), and strip the suffix before passing the rest of the content to SetComment().
Both ways are forward & backward compatible. The schema way is simpler to implement.
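The description round trip could be sketched as follows in Python. This is only an illustration of the proposed convention: encode_description and decode_description are hypothetical names, and the choice to split on the last " | Unit=" occurrence (in case the user comment itself contains that text) is an assumption the proposal does not specify.

```python
UNIT_PREFIX = "Unit="
UNIT_MARKER = " | " + UNIT_PREFIX


def encode_description(comment, unit):
    """Build the stored description from a user comment and a unit name."""
    comment = comment or ""
    if not unit:
        return comment
    if comment:
        return comment + UNIT_MARKER + unit
    return UNIT_PREFIX + unit


def decode_description(description):
    """Split a stored description into (comment, unit); unit may be None."""
    if not description:
        return ("", None)
    # Assumption: the unit suffix is always last, so look for the last marker.
    idx = description.rfind(UNIT_MARKER)
    if idx >= 0:
        unit = description[idx + len(UNIT_MARKER):]
        return (description[:idx], unit or None)
    if description.startswith(UNIT_PREFIX):
        # No user comment was set: the whole description is the unit.
        return ("", description[len(UNIT_PREFIX):] or None)
    return (description, None)


stored = encode_description("Water depth", "m")
print(stored)
print(decode_description(stored))
```

A reader that does not know the convention simply sees the whole string as the comment, which is what makes the approach backward compatible.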
For PostgreSQL, we could reuse the same idea as the above hijack. We already have code that translates between PostgreSQL COMMENT ON COLUMN and OGRFieldDefn::SetComment().
The same applies to GML and FlatGeobuf, which both support comments.
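For the PostgreSQL case, the statement a writer would emit might look like the output of this small Python sketch. The helper names are hypothetical, the quoting is hand-rolled for illustration (it assumes standard_conforming_strings is on, the PostgreSQL default), and real code would rely on the database driver's own quoting facilities.

```python
def quote_ident(name):
    """Quote an SQL identifier by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(text):
    """Quote a string literal; assumes standard_conforming_strings = on."""
    return "'" + text.replace("'", "''") + "'"


def comment_on_column_sql(table, column, comment, unit):
    """Build COMMENT ON COLUMN with the proposed ' | Unit=' suffix."""
    description = comment or ""
    if unit:
        if description:
            description = description + " | Unit=" + unit
        else:
            description = "Unit=" + unit
    return "COMMENT ON COLUMN {}.{} IS {};".format(
        quote_ident(table), quote_ident(column), quote_literal(description)
    )


print(comment_on_column_sql("rivers", "depth", "Max depth", "m"))
```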
Which values should go in SetUnitName()? We could encourage the use of strings recognized by UDUnits, as done in netCDF CF, but in no way enforce it.
Happy to hear the thoughts of readers of this ticket on whether adding such a capability makes sense and whether the above rather hacky solutions are acceptable.
I was also wondering about a more general OGRFieldDefn::SetMetadata()/GetMetadata() capability, as we already have on datasets, raster bands and layers. But I'm not sure what users would need to put in that beyond the existing information (name, type, default value, alias, comment, field domain, nullability, uniqueness) and unit.