@juanrmn juanrmn commented Jun 9, 2025

Issue

Fixes #

Proposed Changes

Adds a new per-band metadata field, valuelabels, holding a dict of value-label pairs. For example:

{
  "band_1": {
    "colorinterp": "palette",
    "nodata": 0,
    "valuelabels": {
      "1": "Corn",
      "5": "Soybeans",
      "21": "Barley",
      "121": "Developed/Open Space"
    },
    "colortable": {
      "1": [255, 211, 0, 255],
      "5": [37, 111, 0, 255],
      "21": [226, 0, 124, 255],
      "121": [153, 153, 153, 255]
    }
  }
}
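
To illustrate how a consumer might use this field, here is a minimal sketch that looks up the label for a pixel value. The helper name label_for and the "Unknown" fallback are assumptions made for this example, not part of raster-loader; only the metadata shape comes from the example above.

```python
import json

# Metadata in the shape shown above (colortable omitted for brevity).
metadata_json = """
{
  "band_1": {
    "colorinterp": "palette",
    "nodata": 0,
    "valuelabels": {
      "1": "Corn",
      "5": "Soybeans",
      "21": "Barley",
      "121": "Developed/Open Space"
    }
  }
}
"""


def label_for(metadata: dict, band: str, pixel_value: int, default: str = "Unknown") -> str:
    """Hypothetical helper: return the label for a pixel value, or `default`."""
    labels = metadata.get(band, {}).get("valuelabels") or {}
    # JSON object keys are always strings, so convert the pixel value first.
    return labels.get(str(pixel_value), default)


metadata = json.loads(metadata_json)
print(label_for(metadata, "band_1", 5))   # Soybeans
print(label_for(metadata, "band_1", 99))  # Unknown
```

Because the keys are strings, a lookup with the raw integer pixel value would miss; the conversion in label_for avoids that.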

This dictionary can be obtained from two sources:

1. Via command line parameter

Pass the dictionary with the --band-valuelabels option:

$ carto bigquery upload --help
[...]
 --band-valuelabels TEXT         Custom data for valuelabels in JSON format,
                                  or 'None' to use the RAT if present. i.e:
                                  '{<value_1>: <label_1>, <value_2>:
                                  <label_2>, ...}'. Could repeat --band-
                                  valuelabels to specify multiple bands data.
                                  They will be considered in the order they
                                  appear in the file. Note that you can set
                                  any value to 'None' to use the RAT for the
                                  corresponding band. Also see --rat-
                                  valuelabels-mode parameter.
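
The option semantics described in the help text can be sketched roughly as follows. This is not the code in this PR: parse_band_valuelabels is a hypothetical helper, and it assumes that a literal 'None' for a band means "fall back to the RAT" and that the options map to bands in the order given.

```python
import json
from typing import Optional


def parse_band_valuelabels(raw_options: list[str]) -> list[Optional[dict[str, str]]]:
    """Hypothetical helper: one entry per band; None means "use the RAT"."""
    parsed: list[Optional[dict[str, str]]] = []
    for band_number, raw in enumerate(raw_options, start=1):
        if raw.strip() == "None":
            # Assumption: 'None' defers to the Raster Attribute Table for this band.
            parsed.append(None)
            continue
        data = json.loads(raw)
        if not isinstance(data, dict):
            raise ValueError(f"band {band_number}: expected a JSON object of value-label pairs")
        # Normalize keys and labels to strings, matching the metadata example.
        parsed.append({str(value): str(label) for value, label in data.items()})
    return parsed


# Two repeated --band-valuelabels options: custom labels for band 1, RAT for band 2.
print(parse_band_valuelabels(['{"11": "Open Water", "12": "Perennial Ice/Snow"}', "None"]))
```

Invalid JSON raises json.JSONDecodeError, which a CLI would normally turn into a usage error.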

For example (note the line "Using the provided valuelabels for band 1" in the output):

carto bigquery upload --file_path Annual_NLCD_LndCov_2023_CU_C1V0-dbf/Annual_NLCD_LndCov_2023_CU_C1V0.tif --project cartodb-on-gcp-backend-team --dataset juanra --table Annual_NLCD_LndCov_2023_CU_C1V0_Custom --overwrite --token ya29.XXX --band-valuelabels '{"90": "Woody Wetlands", "52": "Shrub/Scrub", "12": "Perennial Ice/Snow", "81": "Pasture/Hay", "11": "Open Water", "43": "Mixed Forest", "71": "Grassland/Herbaceous", "42": "Evergreen Forest", "21": "Developed, Open Space", "23": "Developed, Medium Intensity", "22": "Developed, Low Intensity", "24": "Developed High Intensity", "41": "Deciduous Forest", "82": "Cultivated Crops", "31": "Barren Land (Rock/Sand/Clay)"}'

Preparing to upload raster file to BigQuery...
File Path: Annual_NLCD_LndCov_2023_CU_C1V0-dbf/Annual_NLCD_LndCov_2023_CU_C1V0.tif
File Size: 0.04869270324707031 MB
Number of bands: 1
Band types: ('uint8',)
Band sizes (MB): [1.5]
Source Band: (1,)
Band Name: [None]
Number of Blocks: 6
Block Dims: (512, 512)
Project: cartodb-on-gcp-backend-team
Dataset: juanra
Table: Annual_NLCD_LndCov_2023_CU_C1V0_Custom
Number of Records Per BigQuery Append: 10000
Compress: False
Uploading Raster to BigQuery
Loading raster file to BigQuery...
Sampling raster...
Computing approximate stats...
Computing quantiles...
Computing most common values...
Using the provided valuelabels for band 1
Writing 6 blocks and 3 overview tiles to BigQuery...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:07<00:00,  1.16it/s]
Number of empty blocks:  0
Writing metadata to BigQuery...
Updating labels...
Done.
Raster file uploaded to Google BigQuery

2. Raster Attribute Table (RAT) from the tiff file

For a file-based raster dataset, the raster attribute table must be at the same directory level as the raster, using the same name as the raster but with either the .vat.dbf or .aux.xml format (and extension). For example, for raster SanDiego.tif, the raster attribute table must be SanDiego.tif.vat.dbf or SanDiego.tif.aux.xml in the same folder.
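
The naming rule above can be checked with a short sketch. The function name find_rat_sidecar and the order in which the two extensions are tried are assumptions for illustration; the PR itself may locate the RAT differently (for example, through GDAL).

```python
from pathlib import Path
from typing import Optional

# Sidecar extensions named above; trying .vat.dbf first is an assumption.
RAT_SUFFIXES = (".vat.dbf", ".aux.xml")


def find_rat_sidecar(raster_path: str) -> Optional[Path]:
    """Hypothetical helper: return the RAT file next to the raster, if any."""
    raster = Path(raster_path)
    for suffix in RAT_SUFFIXES:
        # SanDiego.tif -> SanDiego.tif.vat.dbf (the suffix is appended, not swapped).
        candidate = raster.with_name(raster.name + suffix)
        if candidate.is_file():
            return candidate
    return None
```

For SanDiego.tif this looks for SanDiego.tif.vat.dbf and then SanDiego.tif.aux.xml in the same folder, and returns None when neither exists.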

$ carto bigquery upload --help
[...]
  --rat-valuelabels-mode [auto|interactive]
                                  The Raster Attribute Table (RAT) will be
                                  used for valuelabels if it's present. If
                                  'auto' (default), two columns will be chosen
                                  automatically for Values and Labels based on
                                  their content. If 'interactive', the user
                                  will be prompted to select the columns.
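
The help text does not say how 'auto' chooses the columns "based on their content". One plausible heuristic, shown purely as an assumption, is to take the first all-integer column as Values and the first all-text column as Labels. The function name and the in-memory column layout below are hypothetical.

```python
def pick_value_and_label_columns(columns: dict[str, list]) -> tuple[str, str]:
    """Hypothetical heuristic: first integer column -> values, first text column -> labels."""

    def all_ints(values: list) -> bool:
        # bool is a subclass of int, so exclude it explicitly.
        return all(isinstance(v, int) and not isinstance(v, bool) for v in values)

    def all_text(values: list) -> bool:
        return all(isinstance(v, str) for v in values)

    value_col = next((name for name, vals in columns.items() if vals and all_ints(vals)), None)
    label_col = next((name for name, vals in columns.items() if vals and all_text(vals)), None)
    if value_col is None or label_col is None:
        raise ValueError("could not find both a value column and a label column")
    return value_col, label_col


# A tiny RAT with the column names from the example output (only two rows shown).
rat = {
    "Value": [11, 21],
    "Class": ["Open Water", "Developed, Open Space"],
    "Red": [70, 222],
    "Green": [107, 197],
    "Blue": [159, 197],
    "Alpha": [255, 255],
}
value_col, label_col = pick_value_and_label_columns(rat)
valuelabels = dict(zip(map(str, rat[value_col]), rat[label_col]))
print(value_col, label_col)  # Value Class
print(valuelabels)           # {'11': 'Open Water', '21': 'Developed, Open Space'}
```

A heuristic like this depends on column order when several integer columns exist (Value, Red, Green, ...), which is one reason an interactive override is useful.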

For example, given the files:

Annual_NLCD_LndCov_2023_CU_C1V0-dbf/Annual_NLCD_LndCov_2023_CU_C1V0.tif
Annual_NLCD_LndCov_2023_CU_C1V0-dbf/Annual_NLCD_LndCov_2023_CU_C1V0.tif.vat.dbf

Running the upload without --band-valuelabels reads the labels from the RAT:

carto bigquery upload --file_path Annual_NLCD_LndCov_2023_CU_C1V0-dbf/Annual_NLCD_LndCov_2023_CU_C1V0.tif --project cartodb-on-gcp-backend-team --dataset juanra --table Annual_NLCD_LndCov_2023_CU_C1V0_RAT --overwrite --token ya29.XXX

Preparing to upload raster file to BigQuery...
File Path: Annual_NLCD_LndCov_2023_CU_C1V0-dbf/Annual_NLCD_LndCov_2023_CU_C1V0.tif
File Size: 0.04869270324707031 MB
Number of bands: 1
Band types: ('uint8',)
Band sizes (MB): [1.5]
Source Band: (1,)
Band Name: [None]
Number of Blocks: 6
Block Dims: (512, 512)
Project: cartodb-on-gcp-backend-team
Dataset: juanra
Table: Annual_NLCD_LndCov_2023_CU_C1V0_RAT
Number of Records Per BigQuery Append: 10000
Compress: False
Uploading Raster to BigQuery
Loading raster file to BigQuery...
Sampling raster...
Computing approximate stats...
Computing quantiles...
Computing most common values...
Computing value labels for band 1
Available columns in Raster Attribute Table (RAT) for band 1:
	0: Value
	1: Class
	2: Red
	3: Green
	4: Blue
	5: Alpha
Selected RAT column for Values: [0: Value]
Selected RAT column for Labels: [1: Class]
Writing 6 blocks and 3 overview tiles to BigQuery...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:07<00:00,  1.18it/s]
Number of empty blocks:  0
Writing metadata to BigQuery...
Updating labels...
Done.
Raster file uploaded to Google BigQuery

In interactive mode, the user is prompted for the column indexes:

carto bigquery upload --file_path Annual_NLCD_LndCov_2023_CU_C1V0-dbf/Annual_NLCD_LndCov_2023_CU_C1V0.tif --project cartodb-on-gcp-backend-team --dataset juanra --table Annual_NLCD_LndCov_2023_CU_C1V0_RAT --overwrite --token ya29.XXX --rat-valuelabels-mode interactive

Preparing to upload raster file to BigQuery...
File Path: Annual_NLCD_LndCov_2023_CU_C1V0-dbf/Annual_NLCD_LndCov_2023_CU_C1V0.tif
File Size: 0.04869270324707031 MB
Number of bands: 1
Band types: ('uint8',)
Band sizes (MB): [1.5]
Source Band: (1,)
Band Name: [None]
Number of Blocks: 6
Block Dims: (512, 512)
Project: cartodb-on-gcp-backend-team
Dataset: juanra
Table: Annual_NLCD_LndCov_2023_CU_C1V0_RAT
Number of Records Per BigQuery Append: 10000
Compress: False
Uploading Raster to BigQuery
Loading raster file to BigQuery...
Sampling raster...
Computing approximate stats...
Computing quantiles...
Computing most common values...
Computing value labels for band 1
Available columns in Raster Attribute Table (RAT) for band 1:
	0: Value
	1: Class
	2: Red
	3: Green
	4: Blue
	5: Alpha
Introduce the column index for Values for band 1 [0]: 0
Introduce the column index for Labels for band 1 [1]: 1
Selected RAT column for Values: [0: Value]
Selected RAT column for Labels: [1: Class]
Writing 6 blocks and 3 overview tiles to BigQuery...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:06<00:00,  1.36it/s]
Number of empty blocks:  0
Writing metadata to BigQuery...
Updating labels...
Done.
Raster file uploaded to Google BigQuery
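
The prompts in the interactive run above (a question with a default index in brackets) could be implemented along these lines. This is a sketch, not the PR's code: ask_column_index is a hypothetical name, and the injectable read callable exists only so the example runs without a terminal.

```python
from typing import Callable


def ask_column_index(
    prompt: str,
    column_count: int,
    default: int,
    read: Callable[[str], str] = input,
) -> int:
    """Hypothetical helper: ask for a column index until a valid one is given."""
    while True:
        answer = read(f"{prompt} [{default}]: ").strip()
        if answer == "":
            # Pressing Enter accepts the default shown in brackets.
            return default
        if answer.isdecimal() and int(answer) < column_count:
            return int(answer)
        print(f"Please enter a number between 0 and {column_count - 1}.")


# Scripted answers stand in for the keyboard: "9" is rejected, "1" is accepted.
answers = iter(["9", "1"])
index = ask_column_index(
    "Introduce the column index for Labels for band 1",
    column_count=6,
    default=1,
    read=lambda _prompt: next(answers),
)
print(index)  # 1
```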

Pull Request Checklist

  • I have tested the changes locally
  • I have added tests to cover my changes (if applicable)
  • I have updated the documentation (if applicable)

@juanrmn juanrmn marked this pull request as ready for review June 25, 2025 10:21
@juanrmn juanrmn requested review from cayetanobv and migurski June 25, 2025 10:22

@cayetanobv cayetanobv left a comment

A general note about dependencies: this feature adds GDAL as a dependency. This is a major change, because until now we used Rasterio with the precompiled GDAL bundled in its Python wheel. I understand the reason is that Rasterio doesn't yet include this feature, but there are several implications to keep in mind:

  • We need GDAL installed on the system to install and use raster-loader. This wasn't necessary before.
  • The library ends up using two different GDAL builds: the one bundled with Rasterio and the one installed on the system to manage RATs. Their versions can differ, which can cause problems or incompatibilities. One option is to skip the precompiled Rasterio wheel and build Rasterio against the system's GDAL, but this complicates the installation significantly.

If and when Rasterio releases this feature (possibly in its next version), I would consider moving RAT handling to Rasterio for consistency across the library: rasterio/rasterio#3252
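
As a rough way to surface the version mismatch described above, the two GDAL versions can be compared at runtime. This sketch assumes that matching major.minor versions is a reasonable compatibility check, which is not guaranteed; the function names are hypothetical, and both imports are guarded so the snippet runs even when a library is missing.

```python
from typing import Optional


def major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a version string such as '3.9.2'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def gdal_versions_match(wheel_version: str, system_version: str) -> bool:
    # Assumption: the same major.minor is "close enough"; stricter checks may be needed.
    return major_minor(wheel_version) == major_minor(system_version)


def detect_gdal_versions() -> tuple[Optional[str], Optional[str]]:
    """Return (GDAL bundled with Rasterio, GDAL seen by the osgeo bindings)."""
    try:
        import rasterio
        wheel_version = rasterio.__gdal_version__
    except ImportError:
        wheel_version = None
    try:
        from osgeo import gdal
        system_version = gdal.VersionInfo("RELEASE_NAME")
    except ImportError:
        system_version = None
    return wheel_version, system_version


wheel, system = detect_gdal_versions()
if wheel and system and not gdal_versions_match(wheel, system):
    print(f"Warning: Rasterio bundles GDAL {wheel}, but the osgeo bindings use GDAL {system}")
```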

@juanrmn juanrmn marked this pull request as draft July 8, 2025 10:32
juanrmn commented Jul 8, 2025

Closing this one in favor of #175, including valuelabels only from the command line option.

@juanrmn juanrmn closed this Jul 8, 2025
@juanrmn juanrmn deleted the feature/sc-488704/support-labeldescription-in-raster-loader branch July 8, 2025 11:27