Current status (end of datathon day 1): Layer 1 is complete. Layer 2 is in-progress, nearing completion. Layer 2 action items:
Layer 3 action items:
Layer 2 action items above are implemented by the second commit.
forsyth2 left a comment:
The third commit is an initial implementation of an agentic AI workflow. run_agent.py was largely generated with Claude, with fixes added by me. I will now proceed to actually install Ollama and test it out.
Notes from Claude on Ollama:
Usage Limits - Completely Unlimited! 🎉
Ollama is 100% free and open-source with NO limits:
✅ No token limits - Process as much as you want
✅ No API costs - Runs locally on your machine
✅ No internet required - After downloading models, works offline
✅ No rate limits - Run as many queries as you need
✅ Privacy - Your data never leaves your machine
Requirements:
Disk space: ~4.7GB for llama3.1:8b, ~40GB for llama3.1:70b
RAM: 8GB minimum for 8B model, 64GB for 70B model
GPU: Optional but highly recommended (10x+ faster with GPU)
Recommended for your use case:
Start with llama3.1:8b - fast, runs on most machines
If accuracy isn't perfect, try llama3.1:70b (needs more resources)
Or try llama3.2:3b if you have limited RAM
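For context on how run_agent.py can talk to a local model: a running Ollama server exposes an HTTP API on `localhost:11434`, and a request to its `/api/generate` endpoint can be assembled as JSON. A minimal sketch follows (the model name and prompt are placeholders, and this only builds the payload rather than sending it):

```python
import json

# Default endpoint of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_generate_payload(model: str, prompt: str) -> str:
    """Assemble the JSON body for Ollama's /api/generate endpoint.

    stream=False requests a single JSON response instead of a token stream.
    """
    return json.dumps({"model": model, "prompt": prompt, "stream": False})

# Hypothetical usage: ask the 8B model about a directory listing.
payload = build_generate_payload("llama3.1:8b", "Summarize this directory listing: ...")
print(payload)
```

The payload would then be POSTed to `OLLAMA_URL` with any HTTP client; keeping payload construction separate makes it easy to unit-test without a running server.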
```python
print("Error: ollama not found. Please install ollama first:", file=sys.stderr)
print("  https://ollama.ai", file=sys.stderr)
print(
    " Official instructions: https://ollama.com/download. For Linux, that shows `curl -fsSL https://ollama.com/install.sh | sh`. After installation, verify that Ollama is available by running `ollama --version`",
    file=sys.stderr,
)
```
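A self-contained version of that availability check might look like the following. This is a sketch, not the actual run_agent.py; the function name is mine:

```python
import shutil
import subprocess
import sys

def check_ollama_available() -> bool:
    """Return True if the `ollama` CLI is installed and responds to --version."""
    if shutil.which("ollama") is None:
        print("Error: ollama not found. Please install ollama first:", file=sys.stderr)
        print(
            "  Official instructions: https://ollama.com/download. For Linux, that shows "
            "`curl -fsSL https://ollama.com/install.sh | sh`. After installation, verify "
            "that Ollama is available by running `ollama --version`",
            file=sys.stderr,
        )
        return False
    # The binary is on PATH; confirm it actually runs.
    result = subprocess.run(["ollama", "--version"], capture_output=True, text=True)
    return result.returncode == 0

if __name__ == "__main__":
    print("ollama available:", check_ollama_available())
```

Checking `shutil.which` first gives a friendlier error than letting `subprocess.run` raise `FileNotFoundError`, which matters since Ollama is installed at the system level rather than through Conda.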
Following directions from @tomvothecoder in https://github.com/aims-group/llnl-datathon-2026/pull/1/files:
### 3. Option B: Local LLM via Ollama (Optional)
Ollama is used as a **local LLM service** for agentic workflows.
> **Important:** Ollama is **not managed by Conda**.
> It must be installed at the system level.
Install Ollama by following the official instructions:
[https://ollama.com/download](https://ollama.com/download)
This PR should NOT be merged. It exists to show a prototype developed for the January 2026 LLNL Datathon. See here for an earlier (unrelated) zppy "hackathon" project.
Goal: a user should be able to provide a simulation data directory path and get back a zppy cfg that, if not perfect, at least gives a very good starting point.
Motivating need: the zppy cfg is quite complex, with a very large number of parameters. It would therefore be useful to create custom starter cfg files for users to build on.
Architecture: This implementation consists of 3 layers, each building on top of the previous:

- Layer 1: `simulation_output_reviewer.py`
- Layer 2: `zppy_config_generator.py`
- Layer 3: more exploratory, and the core of the Datathon challenge.

Layers 1-2 will likely be cleaned up and merged into `zppy` as a distinct PR at a later date. I've gone with this approach because much of the zppy cfg can in fact be constructed deterministically, which lends itself more to a script than an AI agent. However, the more natural-language-oriented Layer 3 is a better fit for an AI agent, and the agent also gets a more informed starting point rather than needing to "learn" all the rules of our data structuring.
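The deterministic part can be illustrated with a tiny Layer-2-style generator that maps facts scanned from the data directory straight to cfg sections via `configparser`. This is a hypothetical sketch: the section and parameter names below are illustrative, not the real zppy schema, and `simulation_info` stands in for whatever a Layer-1-style reviewer would produce:

```python
import configparser
import io

def generate_starter_cfg(simulation_info: dict) -> str:
    """Deterministically build a starter cfg from facts scanned out of a data directory.

    `simulation_info` would come from a Layer-1-style reviewer; keys are illustrative.
    """
    cfg = configparser.ConfigParser()
    cfg["default"] = {
        "input": simulation_info["data_path"],
        "case": simulation_info["case_name"],
    }
    # Only emit sections for output types actually found on disk.
    if simulation_info.get("has_atm_monthly"):
        cfg["climo"] = {"frequency": "monthly"}
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()

# Hypothetical example input (path and case name are made up).
print(generate_starter_cfg({
    "data_path": "/path/to/simulation/output",
    "case_name": "example.case",
    "has_atm_monthly": True,
}))
```

The point of the sketch is the division of labor: everything above is rule-based and needs no LLM, so the agent in Layer 3 only has to fill in what can't be derived mechanically.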