Adding scripts for ATLAS HEPMC Open Data handling #259


Open · wants to merge 6 commits into base: master

Conversation

zlmarshall
Contributor

This is the first iteration of the scripts required for handling the new ATLAS event-generation open data in HEPMC format. The README explains all the different scripts and files included here (or should).

A few samples have already been transferred to CERN to establish the functionality of all the scripts. Everything seems to be OK so far.

The key outstanding item (needed before anything can actually go onto the Open Data Portal) is the list of record IDs and DOIs for all the various records that will be created. Otherwise this should be just about ready to go, at least to the QA portal for checking.

Zach Marshall added 6 commits April 21, 2025 13:36
Connected to ATLAS internal discussion in
https://its.cern.ch/jira/browse/CENTRPAGE-569

For now, I think this is doing the correct thing:
- Adding late-requested exotics datasets as an explicit list of
  datasets, so the various parsing scripts had to be updated accordingly
- Updating production sheet and metadata requests accordingly
- Sorting keywords in metadata (looks nicer)
- Updating sample rules to appropriately handle the new exotics samples
This js file then goes into the ATLAS open data website to document the
metadata. It is included here so that everything is in one place. Because
it is really just a copy of the csv with a header and footer, the
generated output is not included as well (just unnecessary extra files).
This sets the remaining infrastructure up. I believe it is sufficient
for the first release of a test record page. Once more production is
done, we can release more of the records.
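The commit above describes the js file as "really just a copy of the csv with a header and footer". A minimal sketch of that kind of wrapping might look like the following (the function and variable names are assumptions for illustration, not the actual script from this PR):

```python
# Hypothetical sketch: wrap metadata CSV text into a JS file by adding
# a header (a variable assignment) and a footer (the closing semicolon).
# Names and file layout are assumptions, not the PR's actual script.
import csv
import io
import json


def csv_to_js(csv_text: str, var_name: str = "atlasMetadata") -> str:
    """Convert CSV text into a JS snippet assigning a list of row objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    header = f"var {var_name} = "   # the "header" before the data
    footer = ";\n"                  # the "footer" closing the statement
    return header + json.dumps(rows, indent=2) + footer


example_csv = "dataset,doi\nmc_123,10.7483/EXAMPLE\n"
print(csv_to_js(example_csv))
```

The point of the sketch is only that the JS artifact carries no information beyond the CSV itself, which is why the commit skips committing the generated output.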
@zlmarshall
Contributor Author

Hi @tiborsimko ,

Thank you for the DOIs and record IDs in
cernopendata/opendata.cern.ch#3737

I've updated the script here to make use of them, and tried to build a little infrastructure so that we don't screw up the assignment or get things mixed up in the future. I've tested a bit, and things seem to be working so far.
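One simple form such infrastructure could take is a uniqueness check over the dataset-to-record assignments, so that no two datasets ever share a record ID or a DOI. A hedged sketch (the function name and the placeholder data are made up for illustration, not taken from this PR):

```python
# Hypothetical guard against mixed-up assignments: verify that each
# dataset maps to a unique record ID and a unique DOI. The record IDs
# and DOIs below are made-up placeholders, not real portal records.
def check_assignments(assignments: dict[str, tuple[int, str]]) -> None:
    """Raise ValueError if two datasets share a record ID or a DOI."""
    seen_ids: dict[int, str] = {}
    seen_dois: dict[str, str] = {}
    for dataset, (record_id, doi) in assignments.items():
        if record_id in seen_ids:
            raise ValueError(
                f"record ID {record_id} reused by {dataset} and {seen_ids[record_id]}"
            )
        if doi in seen_dois:
            raise ValueError(
                f"DOI {doi} reused by {dataset} and {seen_dois[doi]}"
            )
        seen_ids[record_id] = dataset
        seen_dois[doi] = dataset


check_assignments({
    "sample_A": (10001, "10.7483/OPENDATA.EXAMPLE.A"),
    "sample_B": (10002, "10.7483/OPENDATA.EXAMPLE.B"),
})
print("assignments are unique")
```

A check like this scales trivially from 2 records to 200, which matches the intent of setting up the infrastructure now.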

I think this means we're ready for a test deployment on the QA instance. This is going to feel like a LOT of infrastructure for just two records, but I am hoping that all this setup means we will be able to scale from 2 to 200 without any problems.

The nominal plan (I hope you agree!) would be to get this up on the QA instance, check things over, then to actually release it onto the normal portal. We'd then let phenomenologists look at it and see how to use it, and assuming everything is kosher we would then go ahead and release the next (larger) batch.

Thanks again,
Zach
