Adding scripts for ATLAS HEPMC Open Data handling #259


Open · wants to merge 6 commits into base: master

Conversation

zlmarshall
Contributor

This is the first iteration of the scripts required for handling the new ATLAS event-generation open data in HEPMC format. The README explains all the different scripts and files included here (or should).

A few samples have already been transferred to CERN to establish the functionality of all the scripts. Everything seems to be OK so far.

The key outstanding item (needed before anything can actually go onto the Open Data Portal) is the list of record IDs and DOIs for all the various records that will be created. Otherwise this should be just about ready to go, at least to the QA portal for checking.

Zach Marshall added 6 commits April 21, 2025 13:36
Connected to ATLAS internal discussion in
https://its.cern.ch/jira/browse/CENTRPAGE-569

For now, I think this is doing the correct thing:
- Adding late-requested exotics datasets as an explicit list of
  datasets, so the various parsing scripts had to be updated accordingly
- Updating production sheet and metadata requests accordingly
- Sorting keywords in metadata (looks nicer)
- Updating sample rules to appropriately handle the new exotics samples
This js file then goes into the ATLAS open data website to document the
metadata. It is included here so that everything is in one place. Because
it is really just a copy of the csv with a header and footer, the
generated output is not included as well (just unnecessary extra files).
This sets the remaining infrastructure up. I believe it is sufficient
for the first release of a test record page. Once more production is
done, we can release more of the records.
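The commit above describes the js file as "really just a copy of the csv with a header and footer". A minimal sketch of that kind of wrapping might look like the following (the function and variable names are assumptions for illustration, not the actual script from this PR):

```python
# Hypothetical sketch: wrap metadata CSV text into a JS file by adding
# a header (a variable assignment) and a footer (the closing semicolon).
# Names and file layout are assumptions, not the PR's actual script.
import csv
import io
import json


def csv_to_js(csv_text: str, var_name: str = "atlasMetadata") -> str:
    """Convert CSV text into a JS snippet assigning a list of row objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    header = f"var {var_name} = "   # the "header" before the data
    footer = ";\n"                  # the "footer" closing the statement
    return header + json.dumps(rows, indent=2) + footer


example_csv = "dataset,doi\nmc_123,10.7483/EXAMPLE\n"
print(csv_to_js(example_csv))
```

The point of the sketch is only that the JS artifact carries no information beyond the CSV itself, which is why the commit skips committing the generated output.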
@zlmarshall
Contributor Author

Hi @tiborsimko ,

Thank you for the DOIs and record IDs in
cernopendata/opendata.cern.ch#3737

I've updated the script here to make use of them, and tried to build a little infrastructure so that we don't screw up the assignment or get things mixed up in the future. I've tested a bit, and things seem to be working so far.
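One simple form such infrastructure could take is a uniqueness check over the dataset-to-record assignments, so that no two datasets ever share a record ID or a DOI. A hedged sketch (the function name and the placeholder data are made up for illustration, not taken from this PR):

```python
# Hypothetical guard against mixed-up assignments: verify that each
# dataset maps to a unique record ID and a unique DOI. The record IDs
# and DOIs below are made-up placeholders, not real portal records.
def check_assignments(assignments: dict[str, tuple[int, str]]) -> None:
    """Raise ValueError if two datasets share a record ID or a DOI."""
    seen_ids: dict[int, str] = {}
    seen_dois: dict[str, str] = {}
    for dataset, (record_id, doi) in assignments.items():
        if record_id in seen_ids:
            raise ValueError(
                f"record ID {record_id} reused by {dataset} and {seen_ids[record_id]}"
            )
        if doi in seen_dois:
            raise ValueError(
                f"DOI {doi} reused by {dataset} and {seen_dois[doi]}"
            )
        seen_ids[record_id] = dataset
        seen_dois[doi] = dataset


check_assignments({
    "sample_A": (10001, "10.7483/OPENDATA.EXAMPLE.A"),
    "sample_B": (10002, "10.7483/OPENDATA.EXAMPLE.B"),
})
print("assignments are unique")
```

A check like this scales trivially from 2 records to 200, which matches the intent of setting up the infrastructure now.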

I think this means we're ready for a test deployment on the QA instance. This is going to feel like a LOT of infrastructure for just two records, but I am hoping that all this setup means we will be able to scale from 2 to 200 without any problems.

The nominal plan (I hope you agree!) would be to get this up on the QA instance, check things over, then to actually release it onto the normal portal. We'd then let phenomenologists look at it and see how to use it, and assuming everything is kosher we would then go ahead and release the next (larger) batch.

Thanks again,
Zach
