This weekend I was interested in trying to convert our open source jobspec database from the formats it comes in to Flux, and in generating images that show the breakdown of a small subset of the data.
Run the generation for some number of jobspecs (defaults to 100):
```bash
export GEMINI_API_KEY=xxxxxxxxxxxxxxx

# Default Gemini Pro
python generate-jobspecs.py --limit 100 --outdir ./results/gemini

# Gemma
python generate-jobspecs.py --limit 100 --model 'models/gemma-3-27b-it' --outdir ./results/gemma
```
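For context, here is a minimal sketch of what the generation call might look like, assuming the google-generativeai Python client. The prompt text, the example-job.sh input file, and the default model name are all illustrative assumptions; the real details live in generate-jobspecs.py:

```python
# Sketch only: assumes the google-generativeai client. The actual prompt
# and output handling live in generate-jobspecs.py.
import os

import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])

# Model name here is an assumption; pass e.g. "models/gemma-3-27b-it" to swap.
model = genai.GenerativeModel("gemini-1.5-pro")

# example-job.sh is a hypothetical input batch script
prompt = (
    "Convert this batch script to a Flux batch script with '# flux:' "
    "directives, and estimate its complexity:\n\n"
    + open("example-job.sh").read()
)
response = model.generate_content(prompt)
print(response.text)
```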
This will generate the contents of results/. Then we validate. For validation, we basically do the following (a sketch of the loop follows the list):
- Use the Flux Directives Parser to read in the batch script
- If it doesn't read in due to an error, we remove the offending line and try again, keeping track of the deleted lines
- After we read it, we go through each directive and further validate
- We keep track of total correct, wrong, and deleted lines (deletions are added back to the total at the end)
- We plot the histogram of accuracy and also other summary metrics
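Here is a minimal sketch of that delete-and-retry loop. The parse_batch_script and validate_directive helpers are hypothetical stand-ins for the Flux directives parser and the second-level per-directive checks, and ParseError is an assumed exception shape:

```python
# Sketch of the delete-and-retry parsing loop described above.
# parse_batch_script and validate_directive are hypothetical stand-ins.

class ParseError(Exception):
    """Hypothetical parse failure carrying the offending line number."""
    def __init__(self, lineno, reason):
        super().__init__(reason)
        self.lineno = lineno


def validate_script(lines, parse_batch_script, validate_directive):
    deletions = []  # (line, reason) for every line we had to remove
    while True:
        try:
            directives = parse_batch_script("\n".join(lines))
            break
        except ParseError as exc:
            # Save the offending line and why it failed, delete it, retry
            deletions.append((lines[exc.lineno - 1], str(exc)))
            del lines[exc.lineno - 1]

    correct = wrong = 0
    for directive in directives:
        if validate_directive(directive):
            correct += 1
        else:
            wrong += 1

    # Deleted lines still count against accuracy (see the formula below)
    total = correct + wrong + len(deletions)
    accuracy = correct / total if total else 0.0
    return correct, wrong, deletions, accuracy
```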
If you are using Flux in a container:
```bash
docker run -it -v $PWD/:/data fluxrm/flux-sched:jammy
cd /data
sudo apt-get update && sudo apt-get install -y python3-pandas python3-seaborn
```
You'll also need shellmetrics installed:
```bash
cd /tmp
mkdir -p ./bin
curl -fsSL https://git.io/shellmetrics > ./bin/shellmetrics
chmod +x ./bin/shellmetrics
export PATH=$PWD/bin:$PATH
cd /data
python validate-jobspecs.py
```
Note that this sample is biased toward those that the Gemini API can parse.
Here is a quick glimpse, just a small number for testing.
Of the set we parsed, these are the applications. This was not a random sample (the same repo appears multiple times).
This compares the shellmetrics complexity to the LLM-derived one, which I asked the model to come up with based on a set of criteria (see the prompt in the generation script). My guess is that our result sample is biased toward scripts with lower complexity.
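As a sketch, the comparison could be plotted like this, assuming a dataframe with shellmetrics_complexity and llm_complexity columns (hypothetical names, as is the CSV path; the real schema comes from validate-jobspecs.py):

```python
# Sketch only: column names and path are assumptions, not the actual
# output schema of validate-jobspecs.py.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("results/complexity.csv")  # hypothetical path

ax = sns.scatterplot(data=df, x="shellmetrics_complexity", y="llm_complexity")
ax.set_xlabel("shellmetrics complexity")
ax.set_ylabel("LLM-derived complexity")
plt.savefig("complexity-comparison.png")
```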
That said, I bet we could separate the resource-request part of the script from the rest, have only the resources parsed for the conversion (with the rest of the script plopped in as-is), and then use the shellmetrics complexity.
I think this was a cool approach: I used the Flux Directive Parser to first try to load the script. If it didn't load, I saved the line and the reason why (see gemini-jobspec-to-flux.json), deleted the line, and tried again. Then I could do a second level of parsing, going through the lines and validating each one. At the end I get a total count of correct, wrong, plus deletions. The accuracy is:
accuracy = correct / (total + deletions)
The total can't include the deleted lines (they had to be removed for the script to parse at all), so we add the deletions back into the denominator.
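For example (illustrative numbers only): if 90 of 95 surviving lines validate as correct and 5 lines were deleted, accuracy = 90 / (95 + 5) = 0.90.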
That's it - I think I'd like to try the approach I mentioned above, and also try Gemma instead of Gemini.