ASB-29533: Adding option to save filename checker output in alternate format (csv, fits, excel, html) #32
| { name = "Mikulski Archive for Space Telescopes", email = "mast_contrib@stsci.edu" }, | ||
| ] | ||
| dependencies = [ | ||
| "astropy >= 7.2.0", |
astropy and pandas are big dependencies. Since, as far as I can tell, they are needed only for the specific write methods, is it worth making them optional dependencies? Or will that cause too much of a headache for users? (I don't have a strong opinion here, just posing the idea.)
Astropy will definitely be a strong requirement for the metadata checker, so I'm leaning towards leaving this here. We can revisit later if others feel strongly about it!
zclaytor
left a comment
This looks great to me, @astrojimig. I'm really glad to see other output formats added. I have added one comment about optional dependencies, but it doesn't block the merge. Thanks for doing this!
jinmiyoon
left a comment
@astrojimig Thanks for this awesome feature. I didn't have much time to review, so my review is fairly surface-level, though I'm sure you did a good job. I left a few inline comments. The only notable one is about the FITS format: I'm not sure how it would be used for inspecting the output files. To me, at least, it seems a bit cumbersome to check all the parameters and verdicts that way. If you have ideas for how to use the FITS output for inspection, could you share them in the README or tutorial? This request is optional.
| | `-e` or `--exclude` | File pattern to exclude from testing, for example '*.jpg' to test all files except the jpgs | None | | ||
| | `-n` or `--max_n` | Maximum number of files to check, for testing purposes. | None (all files) | | ||
| | `-db` or `--dbFile` | Name of Results database file | `results_<hlsp_name>.db` | | ||
| | `-f` or `--output_format` | Write output to alternate format. Currently supports "csv", "fits", "html" or "excel" | `db` | |
I'm not sure how useful the FITS format would be for this file. It's fine to have it as an option, but I'm curious how I could use it effectively for inspection.
The FITS file has two table extensions, which contain all the same information as the filename checker output. You can open it however you like: with Python, a VSCode extension, TOPCAT, etc.
For example, in Python you can see which files failed inspection with something like this:
>>> import astropy.io.fits as fits
>>> results = fits.open('results_mct-tutorial.fits') # Open File
>>> results.info() # Print Info
Filename: results_mct-tutorial.fits
No. Name Ver Type Cards Dimensions Format
0 PRIMARY 1 PrimaryHDU 4 ()
1 FILENAMES 1 BinTableHDU 17 7R x 4C [1A, 61A, 12A, K]
2 FIELDS 1 BinTableHDU 25 58R x 8C [61A, 12A, 12A, 4A, 4A, 4A, 12A, 12A]
>>> # Print filenames which failed the check
>>> failed_files = results[1].data['filename'][results[1].data['final_verdict']=='FAIL']
>>> print(failed_files)
['hlsp_mct-tutorial_jwst_nirspec_GALAXY1_multi_v1_spec.fits']
I agree it's not the most practical format for this, but I thought that it would be a good option to include for astronomers more comfortable with fits files than any other format. I hope that helps!
This MR adds some alternative options for saving the output of the filename checker. In addition to the `results.db` database file, the output can now be saved in "csv", "fits", "excel", or "html" format with the optional `--output_format=` flag.

You can test this out in the /TUTORIAL folder by running:
The new `write_to_alternate_format()` function is the main change behind this. It works by converting the database table to a pandas DataFrame and then writing the output to the specified format (all of which are built into pandas).

The excel and html versions of the output are color-coded so that "PASS" shows up as green, "FAIL" shows up as red, etc. Here's an example of what it looks like:
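
For readers who want to see the mechanics in code, the conversion described above can be sketched roughly as follows. This is not the code from this MR: `export_table`, `color_coded`, `verdict_css`, and the color names are hypothetical, and the sketch assumes the results file is a SQLite database. pandas itself has no FITS writer that I'm aware of, so the sketch routes that format through astropy's `Table.from_pandas`; how the MR handles it is not shown here. The styled excel and html paths additionally assume `openpyxl` and `jinja2` are installed.

```python
import sqlite3
from contextlib import closing
from pathlib import Path

import pandas as pd

# Hypothetical colors; the MR does not list the exact ones it uses.
VERDICT_COLORS = {"PASS": "lightgreen", "FAIL": "salmon"}


def verdict_css(value):
    """Return a CSS background rule for a verdict cell, or '' for anything else."""
    color = VERDICT_COLORS.get(str(value).upper())
    return f"background-color: {color}" if color else ""


def color_coded(df):
    """Wrap a DataFrame in a Styler that colors every PASS/FAIL cell."""
    return df.style.apply(lambda col: [verdict_css(v) for v in col], axis=0)


def export_table(db_path, table, fmt, out_path):
    """Load one table from a SQLite file and write it to out_path as fmt."""
    with closing(sqlite3.connect(db_path)) as conn:
        # The table name is interpolated directly, so only pass trusted names.
        df = pd.read_sql_query(f'SELECT * FROM "{table}"', conn)

    if fmt == "csv":
        df.to_csv(out_path, index=False)
    elif fmt == "html":
        color_coded(df).to_html(out_path)
    elif fmt == "excel":
        color_coded(df).to_excel(out_path, index=False)
    elif fmt == "fits":
        # Imported lazily so the other formats work without astropy installed.
        from astropy.table import Table

        Table.from_pandas(df).write(out_path, format="fits", overwrite=True)
    else:
        raise ValueError(f"Unsupported output format: {fmt!r}")
    return Path(out_path)
```

A call such as `export_table("results_mct-tutorial.db", "filenames", "csv", "results_mct-tutorial.csv")` would then write one table to CSV; the table name `filenames` is a guess based on the `FILENAMES` extension shown in the FITS example earlier in this thread.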
