CMS MC categorisation - check the "Miscellaneous" datasets #95

When running the categorisation for [the 2015 MC list](https://github.com/cernopendata/data-curation/blob/master/cms-YYYY-simulated-datasets/inputs/CMS-2015-mc-datasets.txt), more than 600 datasets end up in the "Miscellaneous" category, which collects the datasets that did not match any existing category.

Write a new Python script to study these datasets.

- Input: the dataset list of the "Miscellaneous" category

- Grouping: take the dataset names (`title_lower` in [the script](https://github.com/cernopendata/data-curation/blob/master/cms-YYYY-simulated-datasets/code/categorisation.py)) and group them by the first part of the string. As a first attempt, use the following for grouping:
  - the part of the name up to (and including) the first underscore in the name
  - the part of the name up to (and including) the first "To" in the name

- Output: Markdown, formatted similarly to the output of the original script, i.e.
  - the group "name" (i.e. the part of the name as defined above) and the number of datasets in that group
  - the full listing for each group
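
The grouping and output steps above could be sketched roughly as follows. This is only a starting point under several assumptions: the function names (`prefix_up_to`, `group_by_prefix`, `to_markdown`) are hypothetical, the two grouping rules are applied separately rather than combined, and the markdown layout is a guess and not a copy of what `categorisation.py` produces. Since `title_lower` is lowercased, a case-sensitive search for "To" would not match there, so the sketch assumes the original-case dataset names are used.

```python
from collections import defaultdict


def prefix_up_to(name, marker):
    """Return `name` up to and including the first `marker`.

    If `marker` does not occur, the whole name is returned, so such a
    dataset forms its own group.
    """
    idx = name.find(marker)
    return name if idx == -1 else name[: idx + len(marker)]


def group_by_prefix(names, marker):
    """Map each prefix to the sorted list of dataset names sharing it."""
    groups = defaultdict(list)
    for name in names:
        # Assumption: names may start with "/" as in full CMS dataset
        # paths; the slash is ignored when computing the group key.
        key = prefix_up_to(name.lstrip("/"), marker)
        groups[key].append(name)
    return {key: sorted(members) for key, members in groups.items()}


def to_markdown(groups, heading):
    """Render a count summary followed by the full listing per group."""
    # Largest groups first; ties are broken alphabetically by group name.
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    lines = [f"## {heading}", ""]
    for key, members in ordered:
        lines.append(f"- `{key}`: {len(members)}")
    for key, members in ordered:
        lines += ["", f"### `{key}` ({len(members)})", ""]
        lines += [f"- {member}" for member in members]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up placeholder names for illustration only; the real input
    # would be the dataset list of the "Miscellaneous" category.
    sample = [
        "DYJetsToLL_M-50",
        "DYJetsToLL_M-10to50",
        "WJetsToLNu_HT-100To200",
        "QCD_Pt-15to30",
    ]
    for marker in ("_", "To"):
        groups = group_by_prefix(sample, marker)
        print(to_markdown(groups, f"Grouped by first '{marker}'"))
```

Keeping the two rules as separate passes makes it easy to compare which one gives the more useful groups before deciding whether to combine them.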
