CMS MC categorisation - check the "Miscellaneous" datasets #95

Open
@katilp

When running the categorisation for the 2015 MC list, more than 600 datasets end up in the "Miscellaneous" category, which collects the datasets that have not been assigned to any existing category.

Write a new Python script to study these datasets.

  • Input: the dataset list of the "Miscellaneous" category

  • Grouping: take the dataset names (title_lower in the script) and group them by the leading part of the string. As a first attempt, use the following as the group key:

    • the part of the name up to (and including) the first underscore
    • the part of the name up to (and including) the first "To"
  • Output: Markdown, formatted similarly to the output of the original script, i.e.

    • the group "name" (i.e. the part of the name as defined above) and the number of datasets in that group
    • the full listing for each group
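
As a starting point, the grouping and output steps above could be sketched as follows. This is a minimal sketch under several assumptions, not the final script. The function names (`group_key`, `group_datasets`, `to_markdown`) and the example dataset names are hypothetical. The Markdown layout is a guess at what "similar to the original script" means. When a name contains both an underscore and "To", the sketch uses whichever ends first, and a name containing neither forms its own group. The match on "To" is case-sensitive, so if the names are already lowercased (as title_lower suggests), the marker would need to be "to" instead.

```python
from collections import defaultdict


def group_key(name, markers=("_", "To")):
    """Return the prefix of `name` up to and including the earliest marker.

    Assumption: if several markers occur, the one that ends first wins;
    if none occurs, the whole name is used as its own group key.
    """
    ends = [name.index(m) + len(m) for m in markers if m in name]
    return name[:min(ends)] if ends else name


def group_datasets(names, markers=("_", "To")):
    """Map each group key to the list of dataset names sharing that key."""
    groups = defaultdict(list)
    for name in names:
        groups[group_key(name, markers)].append(name)
    return dict(groups)


def to_markdown(groups):
    """Render groups as Markdown: key, dataset count, then the full listing."""
    lines = []
    # Largest groups first; ties are broken alphabetically by key.
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for key, members in ordered:
        lines.append(f"### `{key}` ({len(members)} datasets)")
        lines.append("")
        lines.extend(f"- {m}" for m in sorted(members))
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical placeholder names, not real entries from the 2015 MC list.
    example = ["AToBB_sample1", "AToCC_sample2", "Foo_bar", "NoMarker"]
    print(to_markdown(group_datasets(example)))
```

Because the markers are passed as a parameter, other prefixes can be tried without changing the grouping logic. One known quirk of this first attempt is that a name starting with "To" (for example "Top...") is cut immediately after those two characters.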
