Ensure publication_code, issue_code and item_code uniqueness #174

@griff-rees

A recent uniqueness check suggests there are 76 duplicated newspaper publication_code values (each shared with just 1 other record, so a count of 2).

  • 76 same publication_code records
  • 82520 same issue_code records
  • 3670454 same item_code records

These might be cases of multiple editions of an issue on the same day (following @kmcdono2 in #120), or actual duplicate records (meaning... just wrong). I think the majority of the publication_code cases are the latter (and thankfully quite a few have no related issues, and by extension no items):

>>> from django.db.models import QuerySet
>>> from newspaper.models import Newspaper, Issue, Item
>>> from lwmdb.utils import similar_records

>>> newspaper_same_codes: QuerySet = similar_records(Newspaper.objects.all(), check_fields=('publication_code',))
>>> issue_same_codes: QuerySet = similar_records(Issue.objects.all(), check_fields=('issue_code',))
>>> item_same_codes: QuerySet = similar_records(Item.objects.all(), check_fields=('item_code',))
>>> len(newspaper_same_codes)
76
>>> len(issue_same_codes)
81520
>>> len(item_same_codes)
3670454
>>> all(record for record in newspaper_same_codes if record['id__count'] == 2)
True
>>> all(record for record in issue_same_codes if record['id__count'] == 2)
True
>>> all(record for record in item_same_codes if record['id__count'] == 2)
True

Labels

bug (Something isn't working)