Description
Emerges from #218 alongside #223.
We should revisit the manifest trimming script and make a decision to keep this off entirely or, more likely, make some updates to automate and improve this, potentially also automating our reporting process and any metrics of interest.
From Orion:
I had some ideas related to your suggestions:
- triage step to assess the curation status of each PMID (NOT just the presence / absence of PMID)
- Consider what type of checks could be included in this triage (and ideally automated), like # open-access vs # restricted, grant number = one of the grants on the CCKP, etc.
I definitely think we could approach this programmatically! The current manifest trimming script could be updated to include a Synapse `tableQuery` call. We could set the call up to use the appropriate table Synapse ID (maybe with an option to flip between the Portal and _UNION tables), based on the value of the `Component` column in the input manifest. If using a _UNION table, `GrantView Key` values for entries with a matching `Pubmed Id` should be consolidated into a single entry, with each set of grant numbers stored as a comma-separated list next to the single matching PMID. This matches the format of the grant numbers in manifests pre-upload (I think - is that right, @aditya-nath-sage?), which will let us compare the entries easily. We could also implement a function that addresses the three triage scenarios:
- Addressing situation 1 (fully curated entry exists in the selected table) - function replaces entries in the new manifest with entries in the selected table, if:
  - the `Pubmed Id` and `GrantView Key` values of an entry match a row in the selected table
  - the matching entry is marked as `Open Access` in selected table
- Addressing situation 2 (partially curated entry exists in the selected table) - entries in the new manifest will be retained (and maybe reported in a separate printout or CSV) if:
  - the `Pubmed Id` and `GrantView Key` values of an entry match a row in the selected table
  - the matching entry is marked as `Restricted Access` in selected table
- Addressing situation 3 (no entry exists in the database) - entries in the new manifest will be retained and reported in a separate CSV if:
  - the `Pubmed Id` value of an entry does not match a row in the selected table
  - the `GrantView Key` value(s) associated with the new `Pubmed Id` matches an entry in the Portal - Grants Merged table
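To make the three scenarios concrete, here is a rough sketch of what the triage function might look like. It is only an illustration, not the existing trimming script. It assumes the manifest and the selected table have already been loaded as lists of dicts (for example from a table query), that access status lives in a column I am calling `Accessibility` (a hypothetical name), and that "matches an entry in the Portal - Grants Merged table" means at least one grant number is known. The function names and the `unclassified` bucket for entries that fit none of the three scenarios are also my own assumptions.

```python
OPEN, RESTRICTED = "Open Access", "Restricted Access"


def normalize_grants(value):
    """Turn 'CA1, CA2' into a frozenset so order and spacing do not matter."""
    return frozenset(g.strip() for g in value.split(",") if g.strip())


def consolidate_union_rows(rows):
    """Collapse rows sharing a Pubmed Id into one row with comma-separated grants.

    Assumption: other columns are taken from the first row seen for each PMID.
    """
    merged = {}
    for row in rows:
        pmid = row["Pubmed Id"]
        if pmid not in merged:
            merged[pmid] = dict(row)
            merged[pmid]["GrantView Key"] = set()
        merged[pmid]["GrantView Key"] |= normalize_grants(row["GrantView Key"])
    for entry in merged.values():
        entry["GrantView Key"] = ", ".join(sorted(entry["GrantView Key"]))
    return list(merged.values())


def triage_manifest(manifest_rows, table_rows, known_grants):
    """Sort manifest entries into the three triage scenarios.

    Returns a dict with the output manifest plus one report list per bucket.
    """
    by_pmid = {r["Pubmed Id"]: r for r in consolidate_union_rows(table_rows)}
    known = set(known_grants)
    result = {"manifest": [], "partial": [], "new": [], "unclassified": []}
    for entry in manifest_rows:
        grants = normalize_grants(entry["GrantView Key"])
        match = by_pmid.get(entry["Pubmed Id"])
        same_grants = (
            match is not None
            and normalize_grants(match["GrantView Key"]) == grants
        )
        if same_grants and match["Accessibility"] == OPEN:
            # Situation 1: fully curated, so reuse the table's entry.
            result["manifest"].append(dict(match))
            continue
        # Every other entry is retained as submitted.
        result["manifest"].append(entry)
        if same_grants and match["Accessibility"] == RESTRICTED:
            # Situation 2: partially curated, retain and report.
            result["partial"].append(entry)
        elif match is None and grants & known:
            # Situation 3: new PMID with at least one known grant number.
            result["new"].append(entry)
        else:
            # Not covered by the three scenarios (assumption: flag for review).
            result["unclassified"].append(entry)
    return result
```

The `partial` and `new` lists could then be written out with `csv.DictWriter` to produce the separate reports mentioned above.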
There are definitely metrics type things we could collect and report, potentially through a separate script:
- @aclayton555 noted # open access vs # restricted above
- # of papers with more than one grant number
- # of papers from each consortium
- Names of journals and # of articles on the CCKP for each
- All the same stuff for datasets, tools
We can get some of this from table queries, but it could be interesting to just track everything and be able to look at the change over time?
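As a starting point for the separate metrics script, something like the sketch below could compute a dated snapshot that gets appended to a running log, which would let us look at change over time. The column names (`Accessibility`, `GrantView Key`, `Consortium`, `Journal`) and the function name are assumptions for illustration, and I am assuming multi-valued cells are comma-separated.

```python
from collections import Counter
from datetime import date


def split_values(cell):
    """Split a comma-separated cell into trimmed, non-empty values."""
    return [v.strip() for v in cell.split(",") if v.strip()]


def publication_metrics(rows, snapshot_date=None):
    """Summarize a list of publication rows (dicts) as one dated snapshot."""
    access = Counter(row["Accessibility"] for row in rows)
    consortia = Counter(
        name for row in rows for name in split_values(row["Consortium"])
    )
    journals = Counter(row["Journal"] for row in rows)
    multi_grant = sum(
        1 for row in rows if len(split_values(row["GrantView Key"])) > 1
    )
    return {
        "date": (snapshot_date or date.today()).isoformat(),
        "open_access": access["Open Access"],
        "restricted_access": access["Restricted Access"],
        "multi_grant_papers": multi_grant,
        "papers_per_consortium": dict(consortia),
        "articles_per_journal": dict(journals),
    }
```

The same function shape should carry over to datasets and tools by swapping in the relevant columns.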
Slightly separate: maybe we could check with Savitha to see if there are any other publication metrics that would be helpful for us to run monthly. For example, I think we can check citation counts and get lists of other publications that cite publications in the database. We could use this info to see what papers are being cited, including how often publications from MC2 consortia are referencing the same publications, other publications from MC2 consortia members, etc.