Artifacts from the report "Toward Open Earth Science as Fast and Accessible as Natural Language" (M. Ellis et al., 2025) [1], including:
- Sample API key names and event types for natural language query translation (NER).
- Prompts implemented as DSPy (2.5) [2] signatures as well as the initial templated prompts with a DSPy evaluation wrapper.
- Inference-time scaling stepping stones as DSPy modules.
- Evaluation metrics and code (rubric), and Hugging Face-hosted evaluation data.
By all means a starting point and by no means an ending point...
Use of many model providers is supported via LiteLLM. See the complete list of supported providers here.
Ensure your model provider API keys and any other required information are specified in the environment according to LiteLLM's specification. Examples are available for WatsonX, OpenAI, Google AI Studio, Azure and many others.
# Optionally, you can load necessary environment variables from a .env file at runtime. Example:
from dotenv import load_dotenv
load_dotenv(".env")
With that done, you can specify the model(s) you want to use as a simple string.
# Specify the model name e.g. for WatsonX
model_name = "watsonx/meta-llama/llama-3-1-8b-instruct"
# e.g. for OpenAI (WARN needs testing)
# model_name = "meta-llama/llama-3-1-8b-instruct"
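The model string above follows the LiteLLM-style `provider/model` convention, where the first path segment names the provider. A minimal sketch of how such a string breaks down is shown below; `split_provider` is a hypothetical helper written for illustration and is not part of this repository or of LiteLLM. It assumes the string actually carries a provider prefix, which may not hold for every model name.

```python
def split_provider(model_name: str) -> tuple[str, str]:
    """Split a 'provider/model' string at the first slash (illustrative helper)."""
    provider, sep, model = model_name.partition("/")
    if not sep:
        # No slash at all: treat the whole string as the model, with no provider
        return "", model_name
    return provider, model


provider, model = split_provider("watsonx/meta-llama/llama-3-1-8b-instruct")
print(provider)  # watsonx
print(model)     # meta-llama/llama-3-1-8b-instruct
```

Only the first slash is treated as the separator, so model identifiers that themselves contain slashes stay intact.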
Note, use of Llama models with or without this framework must respect Meta's Llama Acceptable Use Policy (available here, last checked Jan. 2025).
git clone https://github.com/NASA-IMPACT/EO-via-NLP.git
cd EO-via-NLP/
pip install -e .
In a few lines, you can start using pre-optimized instructions and prompts.
from esa.online import OnlineTranslator
model_name : str = "watsonx/meta-llama/llama-3-1-8b-instruct" # See Setup
translator = OnlineTranslator(model_name)
translator.translate("On January 15, 2023, display the flooding events in Jakarta.")
# ('{"area": "Jakarta", "date": "2023-01-15", "event_type": "flood", "error": ""}', 'The user query is looking for flooding events in Jakarta on a specific date. The area is clearly Jakarta, a physical location. The event type is "floods" as per the query. The date is also explicitly mentioned as January 15, 2023, which is a single date.')
OnlineTranslator.translate returns a tuple (a, b): (a) the extracted query parameters as a JSON string and (b) the generated rationale.
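Since the first element of the tuple is a JSON string rather than a dictionary, it usually needs to be parsed before the parameters can be used. The sketch below works on the sample output shown above (with the rationale shortened) so that it runs without a model call. `parse_translation` is a hypothetical helper, not part of the `esa` package, and treating a non-empty `"error"` field as a failed translation is an assumption based on the field name.

```python
import json


def parse_translation(result: tuple[str, str]) -> tuple[dict, str]:
    """Turn a (json_string, rationale) tuple into (parsed_dict, rationale)."""
    params_json, rationale = result
    params = json.loads(params_json)  # raises json.JSONDecodeError on malformed output
    return params, rationale


# Sample result copied from the example output above (rationale shortened)
sample = (
    '{"area": "Jakarta", "date": "2023-01-15", "event_type": "flood", "error": ""}',
    "The user query is looking for flooding events in Jakarta on a specific date.",
)

params, rationale = parse_translation(sample)
if params["error"]:
    # Assumption: a non-empty "error" field signals that translation failed
    print("Translation failed:", params["error"])
else:
    print(params["area"], params["date"], params["event_type"])
```

In real use, `sample` would be replaced by the return value of `translator.translate(...)`.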
Make sure LiteLLM provider-specific environment variables are set and loaded. Example for WatsonX environment variables stored in .env:
import os
from dotenv import load_dotenv
load_dotenv(".env")
for var in ['WATSONX_URL', 'WATSONX_APIKEY', 'WATSONX_PROJECT_ID']:
    assert os.getenv(var), f'Missing {var}, make sure it is set in .env or the environment'
Configure the model and the inference strategy or strategies ("programs").
import dspy
from esa.modules import Map
# For simplicity, we will use the same model to both verify and generate
model_name = "watsonx/openai/gpt-oss-120b"
map_model = dspy.LM(model_name)
map_program = Map(map_model)
# The verifier model is set using dspy.configure
verifier_model = dspy.LM(model_name)
dspy.configure(lm=verifier_model)
Configure and run the ground truth evaluation.
from esa.evaluation import GroundTruthEvaluation, Result, evaluate, save_results as save
gt_eval = GroundTruthEvaluation()
programs = [map_program] # Evaluate 1 program for simplicity; multiple programs can be evaluated together in general
# Example using the whole QA dataset and full rubric (gt_eval.all_metrics)
all_results : list[Result] = evaluate(programs, gt_eval, gt_eval.all_metrics, nthreads=8)
Results can be saved in a CSV file and reloaded as a pandas DataFrame:
from esa.evaluation import save_results
import pandas as pd
filename = "test_all.csv"
save_results(all_results, filename)
df = pd.read_csv(filename)
The code will be maintained on demand! Feel free to create an Issue/PR or email the authors [1].
If you use this work, please cite it using the CITATION.cff (click the GitHub cite button) or the BibTeX below.
@article{ellis2025oes,
title={Toward Open Earth Science as Fast and Accessible as Natural Language},
author={Ellis, Marquita and Gurung, Iksha and Ramasubramanian, Muthukumaran and Ramachandran, Rahul},
journal={arXiv preprint arXiv:2505.15690},
year={2025},
month={May},
doi={10.48550/arXiv.2505.15690},
url={https://arxiv.org/abs/2505.15690}
}
[1] Ellis, Marquita, et al. Toward Open Earth Science as Fast and Accessible as Natural Language. arXiv preprint arXiv:2505.15690, May 2025. https://arxiv.org/abs/2505.15690.
[2] DSPy.