When the categorisation is run on the 2015 MC list, more than 600 datasets end up in the "Miscellaneous" category, which collects the datasets that have not been assigned to any existing category.
Write a new Python script to study these datasets.
- Input: the dataset list of the "Miscellaneous" category
- Grouping: take those dataset names (`title_lower` in the script) and group them by the first part of the string. As a first attempt, use the following for grouping:
  - the part of the name up to (and including) the first underscore
  - the part of the name up to (and including) the first "To"
- Output: Markdown output formatted like that of the original script, i.e.:
  - the group "name" (i.e. the part of the name as defined above) and the number of datasets in that group
  - the full listing for each group
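A minimal sketch of the grouping and Markdown output described above. Only `title_lower` comes from this issue; the function names (`prefix_through`, `group_by_prefix`, `to_markdown`), the table-plus-listing layout and the sample dataset names are hypothetical, since the original script's output format is not shown here. A name that does not contain the marker is placed in a group keyed by its full name. Note that if `title_lower` is already lower-cased, a literal "To" will never match, so the "To" grouping may need to run on the original-case names (searching for "to" instead would also match inside words such as "photon").

```python
from collections import defaultdict


def prefix_through(name: str, marker: str) -> str:
    """Return `name` up to and including the first occurrence of `marker`.

    If `marker` does not occur, the whole name is returned, so the dataset
    ends up in a group of its own (a choice made for this sketch).
    """
    idx = name.find(marker)
    if idx == -1:
        return name
    return name[: idx + len(marker)]


def group_by_prefix(names, marker):
    """Map each prefix to the sorted list of dataset names that share it."""
    groups = defaultdict(list)
    for name in names:
        groups[prefix_through(name, marker)].append(name)
    return {key: sorted(members) for key, members in groups.items()}


def to_markdown(groups, title):
    """Render a summary table (group, count), then the full listing per group."""
    # Largest groups first; ties are broken alphabetically by group name.
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    lines = [f"## {title}", "", "| Group | Datasets |", "| --- | --- |"]
    for key, members in ordered:
        lines.append(f"| `{key}` | {len(members)} |")
    for key, members in ordered:
        lines += ["", f"### `{key}` ({len(members)})", ""]
        lines += [f"- {name}" for name in members]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical dataset names, for illustration only.
    misc = ["ZprimeToTT_m1000", "ZprimeToTT_m2000", "WprimeToWZ_m500", "ttbar_nonallhad"]
    print(to_markdown(group_by_prefix(misc, "_"), 'Grouped up to the first "_"'))
    print()
    print(to_markdown(group_by_prefix(misc, "To"), 'Grouped up to the first "To"'))
```

Keeping the marker as a parameter means both groupings (and any later refinement, such as a different separator) share the same code path.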