Contributor

@damien-git commented Dec 8, 2025

See VUFIND-1811.

This PR improves the format detection for video games and datasets.

  • When 336a = "computer program", rely on 008/26 instead of the 33X fields to detect the format
  • Detect computer and cartographic datasets using 33X fields with an rdacontent source

A reindex is needed to take advantage of the change for existing records.
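
As a rough illustration of the first bullet, the following Java sketch reads 008/26 for a record already known to have 336a = "computer program". It assumes the record uses the computer-files layout of the 008 field, where position 26 is the type of computer file and `g` means game. The class name, the method name, and the labels `VideoGame` and `Software` are hypothetical placeholders, not identifiers taken from this PR.

```java
// Hypothetical sketch: names and labels are illustrative, not taken from the PR.
final class ComputerProgramFormat {

    /**
     * Picks a format label for a record whose 336$a is "computer program",
     * using 008/26 rather than the 33X fields.
     */
    static String fromFixedField(String field008) {
        // 008/26 is the character at zero-based index 26 of the 008 control field.
        if (field008 == null || field008.length() <= 26) {
            return "Software"; // assumed fallback when 008/26 is unavailable
        }
        char typeOfComputerFile = Character.toLowerCase(field008.charAt(26));
        // In the MARC 21 computer-files 008, 'g' means "game".
        return typeOfComputerFile == 'g' ? "VideoGame" : "Software";
    }

    public static void main(String[] args) {
        // Build a blank 40-character 008 field with 'g' at position 26.
        char[] chars = new char[40];
        java.util.Arrays.fill(chars, ' ');
        chars[26] = 'g';
        System.out.println(fromFixedField(new String(chars)));
    }
}
```

The fallback for a short or missing 008 is a design assumption of this sketch; the PR may handle that case differently.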

@demiankatz added this to the 11.1 milestone Dec 9, 2025
Member

@demiankatz left a comment

Thanks, @damien-git, see below for one question (which it's probably safe to ignore if you disagree with any part of it...)

Member

@demiankatz left a comment

Here's my revised suggestion:

}
boolean computerOrCartographicDS = desc.equals("computer dataset") || desc.equals("cartographic dataset");
boolean crdOrCod = code.equals("crd") || code.equals("cod");
if (source.equals("rdacontent") && (computerOrCartographicDS || crdOrCod)) {
Member

I believe this should be:

Suggested change
- if (source.equals("rdacontent") && (computerOrCartographicDS || crdOrCod)) {
+ if (computerOrCartographicDS || (source.equals("rdacontent") && crdOrCod)) {

If the description matches, we don't care about the source... but if it's a code, we do need to take the source into account.
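
To make the difference between the two conditions concrete, here is a minimal Java sketch that evaluates both against a field whose description matches but whose source subfield is empty. The class and method names are hypothetical; only the string literals and the boolean structure come from the snippet under review.

```java
// Hypothetical sketch: the class and method names are illustrative only.
final class DatasetCondition {

    static boolean isDatasetDescription(String desc) {
        return "computer dataset".equals(desc) || "cartographic dataset".equals(desc);
    }

    static boolean isDatasetCode(String code) {
        return "crd".equals(code) || "cod".equals(code);
    }

    /** Condition as first proposed: the source gates descriptions and codes alike. */
    static boolean original(String desc, String code, String source) {
        return "rdacontent".equals(source)
            && (isDatasetDescription(desc) || isDatasetCode(code));
    }

    /** Suggested condition: only the code match depends on the source. */
    static boolean suggested(String desc, String code, String source) {
        return isDatasetDescription(desc)
            || ("rdacontent".equals(source) && isDatasetCode(code));
    }

    public static void main(String[] args) {
        // Description present, source subfield empty: the two conditions disagree.
        System.out.println(original("computer dataset", "", ""));  // prints "false"
        System.out.println(suggested("computer dataset", "", "")); // prints "true"
    }
}
```

When the source is `rdacontent`, or when only a code is present with some other source, both conditions return the same result; they differ only for a matching description with a missing or different source.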

Contributor Author

It makes sense, but then shouldn't we do the same thing for "two-dimensional moving image"?

Member

Good point, I misread the code the second time through -- which is why I was asking about checking "rdacontent" consistently/globally the first time.

I think we need to make a decision: either just bail out of the function if the source is not rdacontent, on the assumption that we don't really know how the field should be processed, or else restrict rdacontent checks to codes, on the assumption that plain-English descriptions mean the same thing regardless of the standard being used.

I don't have strong feelings on which is better -- the second is more common-sense, the first is more cautious.

Contributor Author

OK. I prefer the common-sense one (restrict rdacontent checks to codes). It will give better results if the source is missing for some reason. I will prepare an update.
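
The chosen rule (match descriptions regardless of source, match codes only when the source is rdacontent) could be factored into one helper and reused for every content type, including "two-dimensional moving image". The Java sketch below is only an illustration under that assumption: `ContentTypeMatcher` and `matches` are hypothetical names, and `tdi` is used as the RDA content type code for two-dimensional moving image.

```java
// Hypothetical sketch: the helper name and signature are illustrative only.
final class ContentTypeMatcher {

    /**
     * Returns true when the description matches (source ignored), or when the
     * code matches and the source is "rdacontent".
     */
    static boolean matches(String desc, String code, String source,
                           String wantedDesc, String wantedCode) {
        if (wantedDesc.equals(desc)) {
            return true; // plain-English description: trusted regardless of source
        }
        // Codes are only meaningful within the vocabulary that defines them.
        return "rdacontent".equals(source) && wantedCode.equals(code);
    }

    public static void main(String[] args) {
        String desc = "two-dimensional moving image";
        System.out.println(matches(desc, null, null, desc, "tdi"));          // prints "true"
        System.out.println(matches(null, "tdi", null, desc, "tdi"));         // prints "false"
        System.out.println(matches(null, "tdi", "rdacontent", desc, "tdi")); // prints "true"
    }
}
```

With this shape, a record that lacks the source subfield still matches on its description, which is the case raised above as the reason for preferring this option.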

Member

Thanks, this looks reasonable to my eye now -- but I'll leave this open for a while to allow further discussion, and I'll try to find time to do a full index of Villanova's MARC records to check for consequences (either good or bad).
