As a data user, I want field discovery, search, and validation capabilities #157

Description

@jordanpadams

Checked for duplicates

Yes - I've already checked

🧑‍🔬 User Persona(s)

Data User, New PDS User, Data Engineer, LLM/AI Systems

💪 Motivation

...so that I can discover available fields, search for field names, and validate my queries without needing to memorize complex PDS4 field naming conventions or consult external documentation.

📖 Additional Details

Current Behavior:

  • ✅ fields() method allows selecting specific fields
  • ❌ No way to discover what fields are available
  • ❌ No validation of field names before query execution
  • ❌ Typos in field names result in silent failures or runtime errors
  • ❌ Must consult external PDS4 documentation

Proposed Solution:
Provide tools to discover, search, and validate field names:

# Discover available fields from sample results
available = products.has_target("Mars").list_fields(sample_size=10)
# Returns: ["lidvid", "pds:Identification_Area.pds:title", ...]

# Search for fields by keyword
products.search_fields("citation")
# Returns: ["pds:Citation_Information.pds:doi", ...]

# Search fields by category
products.search_fields(category="temporal")
# Returns: ["pds:Time_Coordinates.pds:start_date_time", ...]

# Validate field names with suggestions
products.fields(["pds:Title", "pds:start_date"]).validate()
# Suggests: "pds:Title" not found. Did you mean "pds:Identification_Area.pds:title"?

# Get field description
products.describe_field("pds:Citation_Information.pds:doi")
# Returns: {"description": "Digital Object Identifier...", "type": "string"}

# List common fields
products.common_fields(category="spatial")
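
As a rough illustration of the sampling idea behind list_fields(sample_size), the sketch below takes the union of field names seen in the first few results. It assumes each result is a flat mapping from field name to value; the standalone function signature and the sample records are hypothetical, not the library's actual API or real data.

```python
from itertools import islice
from typing import Any, Iterable, Mapping


def list_fields(results: Iterable[Mapping[str, Any]], sample_size: int = 10) -> list[str]:
    """Return the sorted union of field names seen in the first `sample_size` results."""
    seen: set[str] = set()
    # islice stops after sample_size records, so a lazy result iterator is not exhausted.
    for record in islice(results, sample_size):
        seen.update(record.keys())
    return sorted(seen)


# Hypothetical sample records; real records would come from the query results.
sample = [
    {"lidvid": "urn:nasa:pds:example::1.0", "pds:Identification_Area.pds:title": "Example A"},
    {"lidvid": "urn:nasa:pds:example::2.0", "pds:Citation_Information.pds:doi": "10.0000/example"},
]
fields = list_fields(sample, sample_size=10)
```

Because products can carry different fields, a small sample may miss fields that only appear on rarer product types; a larger sample_size trades query cost for coverage.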

Implementation Approach:

  1. list_fields(sample_size) - discover available fields via sampling
  2. search_fields(keyword, category) - keyword/category search with fuzzy matching
  3. validate() - validate field names with suggestions using RapidFuzz (existing dependency)
  4. common_fields(category) - return predefined common fields
  5. describe_field(name) - return field documentation
  6. Field categories: spatial, temporal, identification, citation, processing
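
The validation step (item 3) could look something like the sketch below. To stay self-contained it uses the standard library's difflib.get_close_matches as a stand-in for RapidFuzz, which the proposal names as the intended matcher. The KNOWN_FIELDS tuple, the leaf() helper, the validate_fields() name, and the 0.7 cutoff are all assumptions made for illustration.

```python
from difflib import get_close_matches

# Assumed catalogue of valid names, taken from the examples in this issue.
KNOWN_FIELDS = (
    "lidvid",
    "pds:Identification_Area.pds:title",
    "pds:Citation_Information.pds:doi",
    "pds:Time_Coordinates.pds:start_date_time",
)


def leaf(name: str) -> str:
    """Lower-cased last dotted segment, e.g. 'pds:title'."""
    return name.rsplit(".", 1)[-1].lower()


def validate_fields(requested, known=KNOWN_FIELDS, cutoff=0.7):
    """Map each unknown field name to a list of suggested known names, best match first."""
    by_leaf = {}
    for full_name in known:
        by_leaf.setdefault(leaf(full_name), []).append(full_name)

    problems = {}
    for name in requested:
        if name in known:
            continue  # exact match, nothing to report
        # Compare leaf segments so a short typo can still match a long qualified name.
        close = get_close_matches(leaf(name), list(by_leaf), n=3, cutoff=cutoff)
        problems[name] = [full for c in close for full in by_leaf[c]]
    return problems


problems = validate_fields(["pds:Title", "pds:start_date", "lidvid"])
for bad, suggestions in problems.items():
    hint = f' Did you mean "{suggestions[0]}"?' if suggestions else ""
    print(f'"{bad}" not found.{hint}')
```

Matching on the last dotted segment is one possible design choice: users tend to remember the attribute name (pds:title) rather than its enclosing class, so comparing full paths would score most typos too low to suggest anything.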

Benefits:

  • Reduce need for external PDS4 documentation
  • Catch typos early with helpful suggestions
  • Lower barrier to entry for new users
  • Enable LLM/AI systems to discover fields dynamically

Related:


For Internal Dev Team To Complete

Acceptance Criteria

Given I am a user who wants to find temporal-related fields
When I perform products.search_fields(category="temporal")
Then I expect a list of temporal field names, such as pds:Time_Coordinates.pds:start_date_time
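
One way this criterion might be turned into an automated check is sketched below. The CATEGORY_PREFIXES table, the KNOWN_FIELDS tuple, and the standalone search_fields() function are hypothetical stand-ins for whatever the implementation ends up providing, and the keyword search here is a plain case-insensitive substring match rather than the fuzzy matching proposed above.

```python
# Hypothetical category table: each category maps to PDS4 class prefixes.
CATEGORY_PREFIXES = {
    "temporal": ("pds:Time_Coordinates.",),
    "citation": ("pds:Citation_Information.",),
    "identification": ("pds:Identification_Area.",),
}

# Assumed catalogue of field names, taken from the examples in this issue.
KNOWN_FIELDS = (
    "lidvid",
    "pds:Identification_Area.pds:title",
    "pds:Citation_Information.pds:doi",
    "pds:Time_Coordinates.pds:start_date_time",
)


def search_fields(keyword=None, category=None, known=KNOWN_FIELDS):
    """Filter known field names by category prefix and/or keyword substring."""
    matches = list(known)
    if category is not None:
        prefixes = CATEGORY_PREFIXES[category]  # raises KeyError for unknown categories
        matches = [f for f in matches if f.startswith(prefixes)]
    if keyword is not None:
        matches = [f for f in matches if keyword.lower() in f.lower()]
    return matches


def test_search_fields_temporal():
    # Given a user who wants temporal fields, when searching by category...
    result = search_fields(category="temporal")
    # ...then the result includes the start-time field named in the criterion.
    assert "pds:Time_Coordinates.pds:start_date_time" in result
```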

⚙️ Engineering Details

To be filled by Engineering Node Team

🎉 I&T

To be filled by Engineering Node Team
