Extending OGRFieldDefn with SetUnitName()/GetUnitName() ? #14577

@rouault

We have no standardized API to get or set the unit of a numeric field, which may be crucial to processing it correctly. Users' workarounds include adding a suffix like "_m" or "_celsius" to the field name, using the field comment to add the unit as free text, adding an extra field "foo_unit" whose value is the unit name, or giving up.

That said, very few formats have a built-in way of communicating that information. I can only think of:

  • netCDF: through the units attribute of a variable (https://cfconventions.org/Data/cf-conventions/cf-conventions-1.13/cf-conventions.html#units)

  • IHO S-101: the XML feature catalog contains <S100FC:uom> elements for simple attributes that have a unit

  • some shapefiles (coming from Esri software) may be accompanied by a .shp.xml side-car file containing FGDC (Federal Geographic Data Committee) metadata. For example https://pubs.usgs.gov/sim/2840/DATA/bathy/bathy.shp.xml with the <attrunit> element:

                          <attr>
                                  <attrlabl>CONTOUR</attrlabl>
                                  <attrdef>Depth of bathymetric contour. Values are in meters above sea level, so all values are negative. Double of width 13, but values are integer. Another attribute "INTERVAL" contains the same data stored as an integer.</attrdef>
                                  <attrdomv>
                                          <rdom>
                                                  <rdommin>-270</rdommin>
                                                  <rdommax>-5</rdommax>
                                                  <attrunit>meters</attrunit>
                                                  <attrmres>.000</attrmres>
                                          </rdom>
                                  </attrdomv>
                                  <attrmfrq>5 m</attrmfrq>
                                  <attrtype Sync="TRUE">Number</attrtype>
                                  <attwidth Sync="TRUE">9</attwidth>
                                  <atnumdec Sync="TRUE">3</atnumdec>
                                  <attrdefs>USGS</attrdefs>
                          </attr>

    That said, judging from a quick look at the online Esri documentation, their GUI doesn't seem to make it easy to define a unit for a field.
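
As an illustration of how a reader could pick up the FGDC unit, here is a minimal Python sketch that maps attribute labels to their <attrunit> value. It is not existing GDAL behaviour: the function name fgdc_field_units is hypothetical, the embedded XML is a trimmed copy of the <attr> element quoted above, and the sketch assumes the unit always sits under attrdomv/rdom as in that sample.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <attr> element quoted above.
FGDC_SNIPPET = """
<attr>
  <attrlabl>CONTOUR</attrlabl>
  <attrdomv>
    <rdom>
      <rdommin>-270</rdommin>
      <rdommax>-5</rdommax>
      <attrunit>meters</attrunit>
    </rdom>
  </attrdomv>
</attr>
"""


def fgdc_field_units(root):
    """Map attribute label -> unit for every <attr> that declares an <attrunit>.

    Hypothetical helper: assumes the unit is found under attrdomv/rdom.
    """
    units = {}
    # iter() also yields `root` itself when it is an <attr> element.
    for attr in root.iter("attr"):
        label = attr.findtext("attrlabl")
        unit = attr.findtext("attrdomv/rdom/attrunit")
        if label and unit:
            units[label.strip()] = unit.strip()
    return units


units = fgdc_field_units(ET.fromstring(FGDC_SNIPPET))
print(units)
```

With a real .shp.xml file, ET.parse(path).getroot() could be passed instead of the embedded snippet; attributes without an <attrunit> are simply left out of the result.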

It would seem desirable to me that the GDAL API help lift that restriction, so that GIS applications (like QGIS) could offer such a capability to their end users. We'd particularly need to address the GeoPackage and PostgreSQL cases.

For GeoPackage, while extending gpkg_ prefixed system tables with a new column to store the unit name could be done in theory, it is known to cause issues for some implementations that use automatic mapping of table structure to code. This leaves us with 2 candidates: the gpkg_metadata/gpkg_metadata_reference tables used by the metadata extension, or the gpkg_data_columns table used by the schema extension. Both are implemented by GDAL. The metadata extension is mapped to GDAL dataset and layer metadata. The schema extension is used to store field alias and comment, as well as field domains.

For metadata, that could look like:

INSERT INTO gpkg_metadata (id, md_scope, md_standard_uri, mime_type, metadata)
                   VALUES (1, 'attributeType', 'http://gdal.org', 'text/xml',
                           '<FieldDefinition><UnitName>meter</UnitName></FieldDefinition>');
INSERT INTO gpkg_metadata_reference (reference_scope, table_name, column_name, row_id_value, timestamp, md_file_id, md_parent_id)
    VALUES ('column','my_table_name','my_column_name',NULL,strftime('%Y-%m-%dT%H:%M:%fZ','now'),1,NULL);
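
Reading the unit back from such records could look like the following Python sketch, which uses only the standard sqlite3 and xml.etree modules. It is an illustration under assumptions, not what GDAL does: the function name unit_from_metadata is hypothetical, the two CREATE TABLE statements are simplified stand-ins that omit the constraints of the real GeoPackage tables, and the <FieldDefinition>/<UnitName> payload is the one proposed above.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# Simplified stand-ins for the metadata-extension tables (assumption:
# constraints and defaults from the GeoPackage specification are omitted).
conn.executescript("""
CREATE TABLE gpkg_metadata (
    id INTEGER PRIMARY KEY, md_scope TEXT, md_standard_uri TEXT,
    mime_type TEXT, metadata TEXT);
CREATE TABLE gpkg_metadata_reference (
    reference_scope TEXT, table_name TEXT, column_name TEXT,
    row_id_value INTEGER, timestamp TEXT, md_file_id INTEGER,
    md_parent_id INTEGER);
INSERT INTO gpkg_metadata VALUES
    (1, 'attributeType', 'http://gdal.org', 'text/xml',
     '<FieldDefinition><UnitName>meter</UnitName></FieldDefinition>');
-- Column-level reference: scope 'column' with a NULL row_id_value.
INSERT INTO gpkg_metadata_reference VALUES
    ('column', 'my_table_name', 'my_column_name', NULL,
     strftime('%Y-%m-%dT%H:%M:%fZ', 'now'), 1, NULL);
""")


def unit_from_metadata(conn, table, column):
    """Return the <UnitName> text attached to table.column, or None."""
    row = conn.execute(
        """SELECT m.metadata
           FROM gpkg_metadata m
           JOIN gpkg_metadata_reference r ON r.md_file_id = m.id
           WHERE r.table_name = ? AND r.column_name = ?
             AND m.md_standard_uri = 'http://gdal.org'
             AND m.mime_type = 'text/xml'""",
        (table, column),
    ).fetchone()
    if row is None:
        return None
    # findtext() returns None when the XML has no <UnitName> child.
    return ET.fromstring(row[0]).findtext("UnitName")


print(unit_from_metadata(conn, "my_table_name", "my_column_name"))
```

The sketch only inspects the first matching record; a real reader would have to decide what to do when several metadata records reference the same column.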

For schema, we could hijack the description field by appending a " | Unit={value}" suffix (or just "Unit={value}" if there is no user-set comment/description). On reading, the GDAL implementation would recognize the suffix, call SetUnitName({value}), and strip it before passing the rest of the content to SetComment().

Both ways are forward & backward compatible. The schema way is simpler to implement.
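
To make the description-suffix convention concrete, here is a small Python sketch of the write and read sides. The function names encode_description and decode_description are hypothetical, and the exact matching rules (a literal " | Unit=" separator, searched from the right) are an assumption about how such a convention could be parsed, not a description of any existing GDAL code.

```python
UNIT_MARKER = "Unit="
SEPARATOR = " | "


def encode_description(comment, unit):
    """Build the stored description from a user comment and a unit name."""
    if not unit:
        return comment or ""
    suffix = UNIT_MARKER + unit
    return comment + SEPARATOR + suffix if comment else suffix


def decode_description(description):
    """Split a stored description into (comment, unit); unit is None if absent."""
    description = description or ""
    # Case with no user comment: the whole description is "Unit={value}".
    if description.startswith(UNIT_MARKER):
        return "", description[len(UNIT_MARKER):]
    # Otherwise look for the last " | Unit=" occurrence.
    comment, sep, unit = description.rpartition(SEPARATOR + UNIT_MARKER)
    if sep:
        return comment, unit
    return description, None


stored = encode_description("Depth of contour", "meter")
print(stored)
print(decode_description(stored))
```

On reading, the first element of the tuple would go to SetComment() and the second to the proposed SetUnitName(). One limitation of this sketch: a user comment that itself starts with "Unit=" or contains " | Unit=" would be interpreted as carrying a unit.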

For PostgreSQL, we could actually reuse the same idea as the hijack above. We already have code that translates between PostgreSQL COMMENT ON COLUMN and OGRFieldDefn::SetComment().

The same would apply to GML and FlatGeobuf, which both support comments.
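
For the PostgreSQL case, the statement a writer would end up issuing could be composed as in the Python sketch below. The helper names are hypothetical, the table and column names are the placeholders used earlier in this ticket, and the literal quoting assumes the server runs with standard_conforming_strings enabled.

```python
def quote_ident(name):
    """Quote a PostgreSQL identifier, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(text):
    """Quote a string literal, doubling embedded single quotes.

    Assumption: standard_conforming_strings is on, so backslashes are literal.
    """
    return "'" + text.replace("'", "''") + "'"


def comment_on_column_sql(table, column, comment, unit):
    """Compose COMMENT ON COLUMN with the ' | Unit={value}' suffix appended."""
    description = f"{comment} | Unit={unit}" if comment else f"Unit={unit}"
    return (
        f"COMMENT ON COLUMN {quote_ident(table)}.{quote_ident(column)} "
        f"IS {quote_literal(description)}"
    )


print(comment_on_column_sql("my_table_name", "my_column_name",
                            "Depth of contour", "meter"))
```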

Which values to put in SetUnitName()? We could encourage the use of strings recognized by UDUnits, as done in netCDF CF, but in no way enforce it.

Happy to hear the thoughts of readers of this ticket on whether adding such a capability makes sense and whether the above rather hacky solutions are acceptable.

I was also wondering about a more general OGRFieldDefn::SetMetadata()/GetMetadata() capability, as we already have on datasets, raster bands and layers. But I'm not sure what users would need to put in there beyond the existing information (name, type, default value, alias, comment, field domain, nullability, uniqueness) and unit.
