-
Notifications
You must be signed in to change notification settings - Fork 381
[VUFIND-1811] Improved format detection for video games and datasets #4946
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
demiankatz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @damien-git, see below for one question (which it's probably safe to ignore if you disagree with any part of it...)
demiankatz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's my revised suggestion:
| } | ||
| boolean computerOrCartographicDS = desc.equals("computer dataset") || desc.equals("cartographic dataset"); | ||
| boolean crdOrCod = code.equals("crd") || code.equals("cod"); | ||
| if (source.equals("rdacontent") && (computerOrCartographicDS || crdOrCod)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this should be:
| if (source.equals("rdacontent") && (computerOrCartographicDS || crdOrCod)) { | |
| if (computerOrCartographicDS || (source.equals("rdacontent") && crdOrCod)) { |
If the description matches, we don't care about the source... but if it's a code, we do need to take the source into account.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes sense, but then shouldn't we do the same thing for "two-dimensional moving image" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I misread the code on the second time through -- which is why I was asking about checking "rdacontent" consistently/globally the first time.
I think we need to make a decision: either just bail out of the function of it's not rdacontent on the assumption that we don't really know how it should be processed, or else restrict rdacontent checks to codes, on the assumption that plain-English descriptions mean the same thing regardless of the standard being used.
I don't have strong feelings on which is better -- one is more common-sense, the other is more cautious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I prefer the common-sense one (restrict rdacontent checks to codes). It will give better results if the source is missing for some reason. I will prepare an update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this looks reasonable to my eye now -- but I'll leave this open for a while to allow further discussion, and I'll try to find time to do a full index of Villanova's MARC records to check for consequences (either good or bad).
See VUFIND-1811.
This PR improves the format detection for video games and datasets.
336a = "computer program", rely on008/26instead of the33Xfields to detect the format33Xfields withrdacontentsourceA reindex is needed to take advantage of the change for existing records.