Overview

Artifacts from the report "Toward Open Earth Science as Fast and Accessible as Natural Language" (M. Ellis et al., 2025) [1], including:

Sample API key names and event types for natural language query translation (NER).
Prompts implemented as DSPy (2.5) [2] signatures as well as the initial templated prompts with a DSPy evaluation wrapper.
Inference-time scaling stepping stones as DSPy modules.
Evaluation metrics and code (rubric), and Hugging Face-hosted evaluation data.

By all means a starting point and by no means an ending point...

Installation and Usage

Prerequisites

Use of many model providers is supported via LiteLLM. See the complete list of supported providers here.

Ensure your model provider API keys and other required information are specified in the environment according to LiteLLM's specification. Examples are available for WatsonX, OpenAI, Google AI Studio, Azure and many others.

# Optionally, you can load necessary environment variables from a .env file at runtime. Example:
from dotenv import load_dotenv
load_dotenv(".env")

With that done, you can specify the model(s) you want to use as a simple string.

# Specify the model name e.g. for WatsonX
model_name = "watsonx/meta-llama/llama-3-1-8b-instruct"

# e.g. for OpenAI (WARN needs testing)
# model_name = "meta-llama/llama-3-1-8b-instruct"

Note, use of Llama models with or without this framework must respect Meta's Llama Acceptable Use Policy (available here, last checked Jan. 2025).

Installation

git clone https://github.com/NASA-IMPACT/EO-via-NLP.git
cd EO-via-NLP/
pip install -e .

Using Pre-Optimized Prompts

In a few lines, you can start using pre-optimized instructions and prompts.

from esa.online import OnlineTranslator

model_name: str = "watsonx/meta-llama/llama-3-1-8b-instruct"  # See Prerequisites
translator = OnlineTranslator(model_name)

translator.translate("On January 15, 2023, display the flooding events in Jakarta.")
# ('{"area": "Jakarta", "date": "2023-01-15", "event_type": "flood", "error": ""}', 'The user query is looking for flooding events in Jakarta on a specific date. The area is clearly Jakarta, a physical location. The event type is "floods" as per the query. The date is also explicitly mentioned as January 15, 2023, which is a single date.')

OnlineTranslator.translate returns a tuple (a, b), where (a) is the extracted query parameters as a JSON string and (b) is the generated rationale.
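Because the parameters come back as a JSON string, downstream code will usually want to decode them first. The sketch below does this for a sample tuple copied from the example output above (with the rationale shortened), so it runs without a model provider. The helper name `unpack_translation` is hypothetical, and treating a non-empty "error" field as a failure is an assumption about how that field is meant to be used.

```python
import json

# Sample return value copied from the example above; rationale shortened.
sample_result = (
    '{"area": "Jakarta", "date": "2023-01-15", "event_type": "flood", "error": ""}',
    "The user query is looking for flooding events in Jakarta on a specific date.",
)


def unpack_translation(result: tuple[str, str]) -> tuple[dict, str]:
    """Decode a (json_string, rationale) tuple into (dict, rationale)."""
    params_json, rationale = result
    params = json.loads(params_json)
    # Assumption: a non-empty "error" field means the translation failed.
    if params.get("error"):
        raise ValueError(f"Translation failed: {params['error']}")
    return params, rationale


params, rationale = unpack_translation(sample_result)
print(params["area"], params["date"], params["event_type"])
```

With a configured provider, the same helper could be applied directly to the value returned by translator.translate(...).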

Running Evaluation

Make sure LiteLLM provider-specific environment variables are set and loaded. Example for WatsonX environment variables stored in .env:

import os
from dotenv import load_dotenv
load_dotenv(".env")
for var in ['WATSONX_URL', 'WATSONX_APIKEY', 'WATSONX_PROJECT_ID']:
  assert os.getenv(var), f'Missing {var}, make sure it is set in .env or the environment'

Configure the model and the inference strategies ("programs").

import dspy 
from esa.modules import Map 

# For simplicity, we will use the same model to both verify and generate
model_name = "watsonx/openai/gpt-oss-120b"
map_model = dspy.LM(model_name)
map_program = Map(map_model)

# The verifier model is set using dspy.configure
verifier_model = dspy.LM(model_name)
dspy.configure(lm=verifier_model)

Configure and run the ground truth evaluation.

from esa.evaluation import GroundTruthEvaluation, Result, evaluate, save_results as save

gt_eval = GroundTruthEvaluation()

programs = [map_program] # Evaluate 1 program for simplicity; multiple programs can be evaluated together in general

# Example using the whole QA dataset and full rubric (gt_eval.all_metrics)
all_results: list[Result] = evaluate(programs, gt_eval, gt_eval.all_metrics, nthreads=8)

Results can be saved to a CSV file and reloaded as a pandas DataFrame:

from esa.evaluation import save_results
import pandas as pd

filename = "test_all.csv"
save_results(all_results, filename)

df = pd.read_csv(filename)

Maintenance and Contributions

The code will be maintained on-demand! Feel free to create an Issue/PR or email the authors [1].

Citation

If you use this work, please cite it using CITATION.cff (click the GitHub cite button) or the BibTeX entry below.

@article{ellis2025oes,
  title={Toward Open Earth Science as Fast and Accessible as Natural Language},
  author={Ellis, Marquita and Gurung, Iksha and Ramasubramanian, Muthukumaran and Ramachandran, Rahul},
  journal={arXiv preprint arXiv:2505.15690},
  year={2025},
  month={May},
  doi={10.48550/arXiv.2505.15690},
  url={https://arxiv.org/abs/2505.15690}
}

References

[1] Ellis, Marquita, et al. Toward Open Earth Science as Fast and Accessible as Natural Language. arXiv preprint arXiv:2505.15690, May 2025. https://arxiv.org/abs/2505.15690.

[2] DSPy.
