As a user, I want to query for documents where a specific search field exists in the document #406

@jordanpadams

Description

Checked for duplicates

Yes - I've already checked

🧑‍🔬 User Persona(s)

Data User

💪 Motivation

...so that I can find products that contain (or do not contain) a specific field in the document.

📖 Additional Details

The exists operator allows users to query for documents based on the presence (or absence) of specific fields in the document.

⚠️ Syntax Update: The original implementation used postfix notation (e.g., field exists). Based on API Working Group feedback, the syntax has been updated to prefix notation (e.g., exists field) in #727 for more intuitive usage.


Original Implementation (Postfix - Deprecated)

Syntax:

  • Check if exact field exists: pds:Target/pds:name exists
  • Check if field does NOT exist: not (pds:Investigation/pds:stop_date_time exists)
  • Regex pattern matching: "pds:Target.*" exists - matches any field starting with pds:Target
  • Complex regex patterns: ".*Bounding_Coordinates.*" exists

Updated Syntax (Prefix - See #727)

Syntax:

  • Check if exact field exists: (exists pds:Target/pds:name)
  • Check if field does NOT exist: (not exists pds:Investigation/pds:stop_date_time)
  • Regex pattern matching: (exists "pds:Target.*") - matches any field starting with pds:Target
  • Complex regex patterns: (exists ".*Bounding_Coordinates.*")
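
The prefix forms listed above can be assembled programmatically. The following is a minimal sketch, not part of the registry API: the helper name exists_clause and its parameters are hypothetical, and it only formats strings in the documented shape without validating field names.

```python
def exists_clause(field: str, *, regex: bool = False, negate: bool = False) -> str:
    """Format a prefix-notation exists clause (hypothetical helper)."""
    # Quoted operands are treated as regex patterns; unquoted ones as exact field names.
    operand = f'"{field}"' if regex else field
    keyword = "not exists" if negate else "exists"
    return f"({keyword} {operand})"


print(exists_clause("pds:Target/pds:name"))
# (exists pds:Target/pds:name)
print(exists_clause("pds:Investigation/pds:stop_date_time", negate=True))
# (not exists pds:Investigation/pds:stop_date_time)
print(exists_clause("pds:Target.*", regex=True))
# (exists "pds:Target.*")
```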

Implementation Notes:

  • For exact field names (unquoted), the query checks for that specific field in the document
  • For quoted strings, the value is treated as a Java regex pattern and matched against all known field names from the OpenSearch mapping
  • When using regex patterns, all matching field names are retrieved and an existence check is created for each
  • Returns an error if the regex pattern matches no known fields
  • exists returns true only if the field is present AND has a non-null/non-empty value
  • For multi-valued fields, exists returns true if at least one value is present
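
The notes above can be illustrated with a small in-memory model. This is a sketch under stated assumptions rather than the actual implementation (which is Java and runs against OpenSearch): Python's re.fullmatch stands in for Java regex matching, the whole field name is assumed to have to match the pattern, and both function names are hypothetical.

```python
import re


def expand_field_pattern(pattern, known_fields):
    """Return the known field names matched by a quoted regex pattern."""
    compiled = re.compile(pattern)
    # Assumption: the pattern must match the entire field name.
    matches = [name for name in known_fields if compiled.fullmatch(name)]
    if not matches:
        # Mirrors the documented behavior: no matching field is an error.
        raise ValueError(f"pattern {pattern!r} matches no known fields")
    return matches


def field_exists(document, field):
    """True only if the field is present with a non-null, non-empty value."""
    value = document.get(field)
    if value is None:
        return False
    if isinstance(value, (list, tuple)):
        # Multi-valued field: at least one usable value is enough.
        return any(v not in (None, "") for v in value)
    return value != ""


known = [
    "pds:Target/pds:name",
    "pds:Target/pds:type",
    "pds:Investigation/pds:stop_date_time",
]
doc = {"pds:Target/pds:name": ["", "Mars"], "pds:Target/pds:type": ""}

fields = expand_field_pattern("pds:Target.*", known)
print(fields)
print([field_exists(doc, name) for name in fields])
```

Here the toy document satisfies the pattern query because one matched field (pds:Target/pds:name) carries a non-empty value, even though pds:Target/pds:type is empty.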

Use Case Examples (Updated Syntax):

  • Find all products that have bounding box coordinates: (exists cart:Bounding_Coordinates/cart:north_bounding_coordinate)
  • Find investigations that are still ongoing (no stop date): (not exists pds:Investigation/pds:stop_date_time)
  • Find products with any target field using regex: (exists "pds:Target.*")
  • Find products missing author information: (not exists pds:Citation_Information/pds:author_list)
  • Complex pattern matching: (exists ".*Coordinates.*") finds all fields containing "Coordinates"

Related: See #402 for related query operator issues.
Related: See #727 for the syntax update from postfix to prefix notation.

Acceptance Criteria

Given I am querying the registry API
When I use (exists field_name) with an exact field name in my query
Then the API returns only documents where the specified field exists and has a non-null/non-empty value

Given I am querying the registry API
When I use (not exists field_name) in my query
Then the API returns only documents where the specified field does not exist or is null/empty

Given I am querying for a multi-valued field
When I use (exists field_name) and at least one value is present
Then the API returns that document in the results

Given I want to check for fields matching a regex pattern
When I use (exists "regex_pattern") with a quoted string
Then the API:

  • Retrieves all field names from the OpenSearch mapping
  • Matches the regex against all known field names
  • Returns documents where any matched field exists with a non-null/non-empty value
  • Returns an error if the regex matches no known field names

Given I combine exists with other query operators
When I use queries like (exists pds:Target/pds:name) and pds:Target/pds:type eq "Planet"
Then the API correctly applies both conditions and returns matching documents
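
A combined query contains spaces, parentheses, colons, and quotes, so it needs URL encoding before being sent. The sketch below shows one way to do that with the Python standard library. The query parameter name q and the helper name build_query_string are assumptions for illustration, not taken from this issue.

```python
from urllib.parse import parse_qs, urlencode


def build_query_string(*clauses, joiner="and", param="q"):
    """Join clauses with a boolean operator and URL-encode the result."""
    query = f" {joiner} ".join(clauses)
    # urlencode percent-encodes reserved characters and replaces spaces with '+'.
    return urlencode({param: query})


encoded = build_query_string(
    "(exists pds:Target/pds:name)",
    'pds:Target/pds:type eq "Planet"',
)
print(encoded)
# Decoding recovers the original combined query text.
print(parse_qs(encoded)["q"][0])
```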

⚙️ Engineering Details

Implementation completed in PR #700 (postfix syntax):

  • Updated ANTLR4 lexer grammar (Search.g4) to support postfix EXISTS keyword with both FIELD and STRINGVAL tokens
  • Updated Antlr4SearchListener to handle existence checks:
    • Exact field name matching for unquoted field names
    • Java regex pattern matching for quoted strings
    • Retrieves mapping from OpenSearch via ProductsController.productPropertiesList() for regex matching
    • Generates OpenSearch ExistsQuery for each matched field
  • Proper handling of NOT for negation
  • Error handling when regex matches no fields
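
For orientation, the per-field existence checks described above might translate into an OpenSearch query body along these lines. This is a sketch in the OpenSearch query DSL expressed as a Python dictionary, not the code from PR #700: the function name is hypothetical, and it is an assumption that matched fields are combined with "any of" semantics and that negation wraps the whole group in must_not.

```python
import json


def exists_query_body(fields, negate=False):
    """Build a bool query that is true when any listed field exists."""
    any_field = {
        "bool": {
            # One exists query per matched field name.
            "should": [{"exists": {"field": name}} for name in fields],
            "minimum_should_match": 1,
        }
    }
    if negate:
        # Assumed negation: none of the matched fields may exist.
        return {"bool": {"must_not": [any_field]}}
    return any_field


body = exists_query_body(["pds:Target/pds:name", "pds:Target/pds:type"])
print(json.dumps({"query": body}, indent=2))
```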

Syntax Update: See #727 for the update from postfix to prefix notation based on API Working Group feedback.

Metadata

Status
🏁 Done
