
Does Globus default to partial string matching? #126

@svenrdz

Re: ESGF/esgf-download#98

esgpull uses the API's query parameter so that it can express both inclusion and exclusion.
We've noticed that with the bridge API, matching appears to be done on partial strings rather than exact values. This differs from the previous Solr behaviour.
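
For context on why the query parameter matters: a single query string can carry both the terms to include and the terms to exclude. The sketch below assumes Solr's standard query syntax (terms joined with AND, a leading "-" marking an excluded term). build_query is a hypothetical helper, not esgpull's actual code, and variable_id:tas is only an illustrative exclusion.

```python
def build_query(include: dict[str, list[str]], exclude: dict[str, list[str]]) -> str:
    """Join field:value terms into one Solr-style query string.

    Hypothetical helper: assumes Solr standard query syntax, where
    terms are combined with AND and a leading '-' excludes a term.
    """
    terms: list[str] = []
    for field, values in include.items():
        # Several values for the same field are ORed together.
        joined = " OR ".join(f"{field}:{v}" for v in values)
        terms.append(f"({joined})" if len(values) > 1 else joined)
    for field, values in exclude.items():
        # Each excluded value becomes its own prohibited term.
        for v in values:
            terms.append(f"-{field}:{v}")
    return " AND ".join(terms)


print(build_query({"experiment_id": ["amip-hist"]}, {"variable_id": ["tas"]}))
# experiment_id:amip-hist AND -variable_id:tas
```

Note that the value is passed unquoted here, which is what exposes the partial matching described in this issue.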

Solr example:

$ esgpull config api.index_node --default
[api]
index_node = "esgf-node.ipsl.upmc.fr"

$ uv run esgpull search experiment_id:amip-hist --hints experiment_id
[
  {
    "experiment_id": {
      "amip-hist": 2522
    }
  }
]

Using the bridge:

$ esgpull config api.index_node esgf-node.ornl.gov/esgf-1-5-bridge
[api]
index_node = "esgf-node.ornl.gov/esgf-1-5-bridge"

$ esgpull search experiment_id:amip-hist --hints experiment_id
[
  {
    "experiment_id": {
      "amip": 108205,
      "hist-nat": 94873,
      "hist-GHG": 89217,
      "hist-aer": 81048,
      "hist-sol": 67465,
      "hist-volc": 57438,
      "esm-hist": 56232,
      "hist-noLu": 26673,
      "hist-stratO3": 26218,
      "hist-piAer": 20015,
      "hist-totalO3": 19649,
      "hist-piNTCF": 19161,
      "amip-hist": 18933,
      "hist-1950HC": 14917,
      "hist-1950": 14312,
      "land-hist": 12495,
      "hist-CO2": 11102,
      "hist-bgc": 9405,
      "amip-p4K": 8467,
      "amip-4xCO2": 7464,
      "amip-future4K": 5713,
      "amip-piForcing": 5545,
      "amip-lfmip-pdLC": 5267,
      "amip-lfmip-rmLC": 5197,
      "amip-m4K": 4763,
      "hist-resIPO": 3269,
      "amip-p4K-lwoff": 3136,
      "hist-spAer-all": 3125,
      "amip-lwoff": 3071,
      "land-hist-cruNcep": 2424,
      "hist-resAMO": 2271,
      "hist-GHG-cmip5": 2250,
      "hist-nat-cmip5": 2250,
      "hist-aer-cmip5": 2130,
      "land-hist-princeton": 2034,
      "land-hist-altStartYear": 1286,
      "amip-a4SST-4xCO2": 1124,
      "land-hist-wfdei": 974,
      "amip-hld": 838,
      "amip-lfmip-pObs": 492,
      "amip-TIP": 360,
      "amip-TIP-nosh": 351,
      "land-hist-altLu1": 227,
      "land-hist-altLu2": 227,
      "hist-lu": 9
    }
  }
]
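
Every facet value above contains amip or hist as a hyphen-separated token, while historical (which contains hist only as a substring) is absent from the list. One model that appears consistent with this output is that the value is split on hyphens and a record matches if it shares any token with the query. This is an assumption inferred from the counts, not a description of how the bridge is implemented. The sketch below compares exact matching, that token model, and plain substring matching on a few IDs taken from the output plus historical:

```python
def exact_match(query: str, value: str) -> bool:
    # Previous Solr behaviour: the whole value must equal the query.
    return value == query


def token_match(query: str, value: str) -> bool:
    # Assumed bridge behaviour: split both strings on hyphens and
    # accept the value if it shares at least one token with the query.
    return bool(set(query.split("-")) & set(value.split("-")))


def substring_match(query: str, value: str) -> bool:
    # Alternative model for contrast: any query token appearing anywhere.
    return any(tok in value for tok in query.split("-"))


# IDs from the bridge output above, plus "historical" for contrast.
ids = ["amip-hist", "amip", "hist-nat", "land-hist-wfdei", "amip-p4K", "historical"]

print([i for i in ids if exact_match("amip-hist", i)])
# ['amip-hist']
print([i for i in ids if token_match("amip-hist", i)])
# ['amip-hist', 'amip', 'hist-nat', 'land-hist-wfdei', 'amip-p4K']
print([i for i in ids if substring_match("amip-hist", i)])
# ['amip-hist', 'amip', 'hist-nat', 'land-hist-wfdei', 'amip-p4K', 'historical']
```

Only the token model excludes historical, which is why it seems a closer fit to the bridge output than raw substring matching.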

I also noticed that wrapping the value in double quotes forces an exact match, so I will use this as a workaround in esgpull for now:

$ esgpull search experiment_id:'"amip-hist"' --hints experiment_id
[
  {
    "experiment_id": {
      "amip-hist": 18933
    }
  }
]
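
On the client side, this workaround amounts to wrapping each value in double quotes before it goes into the query string. A minimal sketch, assuming Solr phrase syntax where a backslash escapes a literal quote or backslash inside the phrase; quote_value and exact_term are hypothetical helpers, not esgpull's API. urllib.parse.urlencode shows how the quotes end up percent-encoded in the request parameters:

```python
from urllib.parse import urlencode


def quote_value(value: str) -> str:
    # Escape backslashes first, then embedded double quotes,
    # so the value stays a single quoted phrase.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def exact_term(field: str, value: str) -> str:
    # Hypothetical helper: field:"value" is the form that returned
    # only the exact experiment_id in the bridge output above.
    return f"{field}:{quote_value(value)}"


term = exact_term("experiment_id", "amip-hist")
print(term)                        # experiment_id:"amip-hist"
print(urlencode({"query": term}))  # query=experiment_id%3A%22amip-hist%22
```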

I don't know whether this is meant to be fixed on your side; if the behaviour is kept, it should probably be documented.
