Commit 82719c1

Merge pull request #185 from MITLibraries/tco-122-associate-suggested-resource-and-term
Refactor SuggestedResource to leverage new Fingerprint model
2 parents 581acd7 + 2d2cfbc commit 82719c1

40 files changed: +572 -410 lines changed

README.md

+5-1
Original file line numberDiff line numberDiff line change
@@ -47,9 +47,13 @@ config:
   webroot: .
 ```
 
+We use Lando here because its use in our WordPress environment. However, any static local webserver will work.
+
 If you need to regenerate these cassettes, the following procedure should be sufficient:
 
-1. Use the configuration above to ensure the needed files are visible at `http://static.lndo.site/filename.ext`.
+1. Use the configuration above to ensure the needed files are visible at `http://static.lndo.site/filename.ext` (i.e.,
+run `lando start` in `tacos/test/fixtures/files`). If you are using a server other than Lando, configure it such that
+`tacos/test/fixtures/files` is the root directory, then start the server.
 2. Delete any existing cassette files which need to be regenerated.
 3. Run the test(s).
 4. Commit the resulting files along with your other work.

app/dashboards/detector/suggested_resource_dashboard.rb

-75
This file was deleted.

app/models/detector/suggested_resource.rb

+3-86
@@ -1,93 +1,10 @@
 # frozen_string_literal: true
 
-# == Schema Information
-#
-# Table name: detector_suggested_resources
-#
-#  id          :integer          not null, primary key
-#  title       :string
-#  url         :string
-#  phrase      :string
-#  fingerprint :string
-#  created_at  :datetime         not null
-#  updated_at  :datetime         not null
-#
-
 require 'stringex/core_ext'
 
 class Detector
-  # Detector::SuggestedResource stores custom hints that we want to send to the
-  # user in response to specific strings. For example, a search for "web of
-  # science" should be met with our custom login link to Web of Science via MIT.
-  class SuggestedResource < ApplicationRecord
-    before_save :update_fingerprint
-
-    def self.table_name_prefix
-      'detector_'
-    end
-
-    # This exists for the before_save lifecycle hook to call the calculate_fingerprint method, to ensure that these
-    # records always have a correctly-calculated fingerprint. It has no arguments and returns nothing.
-    def update_fingerprint
-      self.fingerprint = Detector::SuggestedResource.calculate_fingerprint(phrase)
-    end
-
-    # This implements the OpenRefine fingerprinting algorithm. See
-    # https://openrefine.org/docs/technical-reference/clustering-in-depth#fingerprint
-    #
-    # @param old_phrase [String] A text string which needs to have its fingerprint calculated. This could either be the
-    # "phrase" field on the SuggestedResource record, or an incoming search term received from a contributing system.
-    #
-    # @return [String] A string of all words in the input, downcased, normalized, and alphabetized.
-    def self.calculate_fingerprint(old_phrase)
-      modified_phrase = old_phrase
-      modified_phrase = modified_phrase.strip
-      modified_phrase = modified_phrase.downcase
-
-      # This removes all punctuation and symbol characters from the string.
-      modified_phrase = modified_phrase.gsub(/\p{P}|\p{S}/, '')
-
-      # Normalize to ASCII (e.g. gödel and godel are liable to be intended to
-      # find the same thing)
-      modified_phrase = modified_phrase.to_ascii
-
-      # Coercion to ASCII can introduce new symbols, so we remove those now.
-      modified_phrase = modified_phrase.gsub(/\p{P}|\p{S}/, '')
-
-      # Tokenize
-      tokens = modified_phrase.split
-
-      # Remove duplicates and sort
-      tokens = tokens.uniq
-      tokens = tokens.sort
-
-      # Rejoin tokens
-      tokens.join(' ')
-    end
-
-    # This replaces all current Detector::SuggestedResource records with a new set from an imported CSV.
-    #
-    # @note This method is called by the suggested_resource:reload rake task.
-    #
-    # @param input [CSV::Table] An imported CSV file containing all Suggested Resource records. The CSV file must have
-    # at least three headers, named "Title", "URL", and "Phrase". Please note: these values
-    # are case sensitive.
-    def self.bulk_replace(input)
-      raise ArgumentError.new, 'Tabular CSV is required' unless input.instance_of?(CSV::Table)
-
-      # Need to check what columns exist in input
-      required_headers = %w[Title URL Phrase]
-      missing_headers = required_headers - input.headers
-      raise ArgumentError.new, "Some CSV columns missing: #{missing_headers}" unless missing_headers.empty?
-
-      Detector::SuggestedResource.delete_all
-
-      input.each do |line|
-        record = Detector::SuggestedResource.new({ title: line['Title'], url: line['URL'], phrase: line['Phrase'] })
-        record.save
-      end
-    end
-
+  # Detector::SuggestedResource handles detections for SuggestedResource records.
+  class SuggestedResource
     # Identify any SuggestedResource record whose pre-calculated fingerprint matches the fingerprint of the incoming
     # phrase.
     #
@@ -98,7 +15,7 @@ def self.bulk_replace(input)
     #
     # @return [Detector::SuggestedResource] The record whose fingerprint matches that of the search term.
     def self.full_term_match(phrase)
-      SuggestedResource.where(fingerprint: calculate_fingerprint(phrase))
+      ::SuggestedResource.joins(:fingerprints).where(fingerprints: { value: Fingerprint.calculate(phrase) })
     end
 
     # Look up any matching Detector::SuggestedResource records, building on the full_term_match method. If a match is
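
The fingerprinting steps documented in the removed `calculate_fingerprint` method (strip, downcase, remove punctuation and symbols, normalize to ASCII, tokenize, de-duplicate, sort, rejoin) can be sketched in plain Ruby. This is only an illustration, not the project's `Fingerprint.calculate`: the name `fingerprint_sketch` is hypothetical, and it assumes that Unicode NFKD decomposition followed by dropping non-ASCII characters is an acceptable stand-in for the stringex `to_ascii` call.

```ruby
# frozen_string_literal: true

# Hypothetical stand-in for the fingerprint calculation; not the project's code.
# Assumption: NFKD normalization plus dropping non-ASCII characters roughly
# approximates the stringex `to_ascii` step used in the removed method.
def fingerprint_sketch(phrase)
  cleaned = phrase.strip.downcase
  # Remove punctuation and symbol characters.
  cleaned = cleaned.gsub(/\p{P}|\p{S}/, '')
  # Decompose accented letters, then drop anything outside the ASCII range.
  cleaned = cleaned.unicode_normalize(:nfkd).gsub(/[^\x00-\x7F]/, '')
  # Tokenize, de-duplicate, sort, and rejoin.
  cleaned.split.uniq.sort.join(' ')
end

puts fingerprint_sketch('  Web of Science! ')
puts fingerprint_sketch('science of WEB')
puts fingerprint_sketch('Gödel')
```

Because tokens are sorted and de-duplicated, phrases that differ only in word order, case, or punctuation produce the same value, which is what lets `full_term_match` compare an incoming phrase against stored fingerprints.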

app/models/metrics/algorithms.rb

-1
@@ -17,7 +17,6 @@
 #  suggested_resource_exact :integer
 #  lcsh                     :integer
 #  citation                 :integer
-#  barcode                  :integer
 #
 module Metrics
   # Algorithms aggregates statistics for matches for all SearchEvents

app/models/suggested_resource.rb

+53
@@ -0,0 +1,53 @@
+# frozen_string_literal: true
+
+# == Schema Information
+#
+# Table name: suggested_resources
+#
+#  id         :integer          not null, primary key
+#  title      :string
+#  url        :string
+#  created_at :datetime         not null
+#  updated_at :datetime         not null
+#
+# SuggestedResource stores custom hints that we want to send to the
+# user in response to specific strings. For example, a search for "web of
+# science" should be met with our custom login link to Web of Science via MIT.
+class SuggestedResource < ApplicationRecord
+  has_many :terms, dependent: :nullify
+  has_many :fingerprints, through: :terms, dependent: :nullify
+
+  # This replaces all current SuggestedResource records with a new set from an imported CSV.
+  #
+  # @note This method is called by the suggested_resource:reload rake task.
+  #
+  # @param input [CSV::Table] An imported CSV file containing all Suggested Resource records. The CSV file must have
+  # at least three headers, named "Title", "URL", and "Phrase". Please note: these values
+  # are case sensitive.
+  def self.bulk_replace(input)
+    raise ArgumentError.new, 'Tabular CSV is required' unless input.instance_of?(CSV::Table)
+
+    # Need to check what columns exist in input
+    required_headers = %w[title url phrase]
+    missing_headers = required_headers - input.headers
+    raise ArgumentError.new, "Some CSV columns missing: #{missing_headers}" unless missing_headers.empty?
+
+    SuggestedResource.destroy_all
+
+    input.each do |line|
+      term = Term.find_or_create_by(phrase: line['phrase'])
+
+      # check for existing SuggestedResource with the same title/url
+      dup_check = SuggestedResource.where(title: line['title'], url: line['url'])
+
+      # link to existing SuggestedResource if one exists
+      term.suggested_resource = if dup_check.count.positive?
+                                  dup_check.first
+                                # create a new SuggestedResource if it doesn't exist
+                                else
+                                  SuggestedResource.new({ title: line['title'], url: line['url'] })
+                                end
+      term.save
+    end
+  end
+end
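
The header check and the title/url de-duplication in `bulk_replace` can be illustrated without a database. The sketch below is a simplified model under stated assumptions: rows are plain hashes rather than `CSV::Row` objects, the sample titles and URLs are invented, and `missing_headers` and `group_phrases` are hypothetical helper names that do not exist in the application.

```ruby
# frozen_string_literal: true

# Hypothetical rows standing in for parsed CSV lines; the values are made up.
ROWS = [
  { 'title' => 'Web of Science', 'url' => 'https://example.org/wos', 'phrase' => 'web of science' },
  { 'title' => 'Web of Science', 'url' => 'https://example.org/wos', 'phrase' => 'wos' },
  { 'title' => 'Example Database', 'url' => 'https://example.org/db', 'phrase' => 'example db' }
].freeze

REQUIRED_HEADERS = %w[title url phrase].freeze

# Returns the required headers that are absent from the given header list.
def missing_headers(headers)
  REQUIRED_HEADERS - headers
end

# Collects phrases under one entry per distinct title/url pair, mirroring the
# "link to an existing resource if one exists" branch of bulk_replace.
def group_phrases(rows)
  rows.each_with_object({}) do |row, resources|
    key = [row['title'], row['url']]
    (resources[key] ||= []) << row['phrase']
  end
end

group_phrases(ROWS).each do |(title, url), phrases|
  puts "#{title} (#{url}): #{phrases.join(', ')}"
end
```

The point of the grouping is that several phrases can now share one resource, where the old table stored one phrase per record.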

app/models/term.rb

+20-6
@@ -7,21 +7,24 @@
 #
 # Table name: terms
 #
-#  id             :integer          not null, primary key
-#  phrase         :string
-#  created_at     :datetime         not null
-#  updated_at     :datetime         not null
-#  flag           :boolean
-#  fingerprint_id :integer
+#  id                    :integer          not null, primary key
+#  phrase                :string
+#  created_at            :datetime         not null
+#  updated_at            :datetime         not null
+#  flag                  :boolean
+#  fingerprint_id        :integer
+#  suggested_resource_id :integer
 #
 class Term < ApplicationRecord
   has_many :search_events, dependent: :destroy
   has_many :detections, dependent: :destroy
   has_many :categorizations, dependent: :destroy
   has_many :confirmations, dependent: :destroy
   belongs_to :fingerprint, optional: true
+  belongs_to :suggested_resource, optional: true
 
   before_save :register_fingerprint
+  before_destroy :check_suggested_resource
   after_destroy :check_fingerprint_count
 
   scope :categorized, -> { where.associated(:categorizations).distinct }
@@ -104,6 +107,17 @@ def check_fingerprint_count
     fingerprint.destroy if fingerprint&.terms&.count&.zero?
   end
 
+  # This is called before_destroy to avoid orphaning SuggestedResource records. Deleting terms should be an unlikely
+  # event, so this should come up rarely. If it does, it warrants the extra care to delete the record manually in the
+  # Rails console.
+  def check_suggested_resource
+    return unless suggested_resource
+
+    Rails.logger.error('Cannot delete term with associated suggested resource')
+    Sentry.capture_message('Cannot delete term with associated suggested resource')
+    throw :abort
+  end
+
   # This method looks up all current detections for the given term, and assembles their confidence scores in a format
   # usable by the calculate_categorizations method. It exists to transform data like:
   # [{3=>0.91}, {1=>0.1}] and [{3=>0.95}]
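
The `before_destroy` guard relies on `throw :abort` halting the callback chain. A plain-Ruby model of that control flow, using `catch`/`throw` directly, might look like the sketch below. `TermSketch` and its attributes are hypothetical and exist only for illustration; Rails' real callback machinery is considerably more involved, and the Sentry call is omitted here.

```ruby
# frozen_string_literal: true

# Hypothetical plain-Ruby model of a record with a destroy guard; TermSketch
# is illustrative only and is not part of the application.
class TermSketch
  attr_reader :phrase, :suggested_resource, :destroyed

  def initialize(phrase, suggested_resource: nil)
    @phrase = phrase
    @suggested_resource = suggested_resource
    @destroyed = false
  end

  # Runs the guard inside catch(:abort); the record is only marked destroyed
  # when the guard finishes without throwing.
  def destroy
    completed = catch(:abort) do
      check_suggested_resource
      true
    end
    @destroyed = true if completed
    @destroyed
  end

  private

  # Aborts the destroy when a suggested resource is still linked.
  def check_suggested_resource
    return unless suggested_resource

    warn 'Cannot delete term with associated suggested resource'
    throw :abort
  end
end

puts TermSketch.new('wos', suggested_resource: 'Web of Science').destroy
puts TermSketch.new('unlinked phrase').destroy
```

In this sketch a term with a linked resource refuses to be destroyed, while an unlinked term is removed normally, matching the intent described in the comment on `check_suggested_resource`.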

app/views/layouts/_site_nav.html.erb

-3
@@ -19,9 +19,6 @@
 <% if can? :view, :playground %>
   <%= link_to('Playground', '/playground', class: 'nav-item') %>
 <% end %>
-<% if can? :manage, :detector__suggested_resource %>
-  <%= link_to('Suggested Resources', admin_detector_suggested_resources_path, class: 'nav-item') %>
-<% end %>
 <% if can? :view, Categorization %>
   <%= link_to('Categorizations', admin_categorizations_path, class: 'nav-item') %>
 <% end %>

config/environments/development.rb

+4
@@ -76,6 +76,10 @@
   # Raise error when a before_action's only/except options reference missing actions.
   config.action_controller.raise_on_missing_callback_actions = true
 
+  # Local logging overrides
+  config.logger = Logger.new(STDOUT)
+  config.log_level = :debug
+
   # Apply autocorrection by RuboCop to files generated by `bin/rails generate`.
   # config.generators.apply_rubocop_autocorrect_after_generate!
 end

config/routes.rb

-5
@@ -5,11 +5,6 @@
   end
 
   namespace :admin do
-    # Lookup-style detector records
-    namespace :detector do
-      resources :suggested_resources
-    end
-
     # Knowledge graph models
     resources :detectors
     resources :detector_categories
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,10 @@
+class CreateSuggestedResources < ActiveRecord::Migration[7.1]
+  def change
+    create_table :suggested_resources do |t|
+      t.string :title
+      t.string :url
+
+      t.timestamps
+    end
+  end
+end
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,18 @@
+class DropDetectorSuggestedResources < ActiveRecord::Migration[7.1]
+  def up
+    drop_table :detector_suggested_resources
+  end
+
+  def down
+    create_table :detector_suggested_resources do |t|
+      t.string :title
+      t.string :url
+      t.string :phrase
+      t.string :fingerprint
+
+      t.timestamps
+    end
+    add_index :detector_suggested_resources, :phrase, unique: true
+    add_index :detector_suggested_resources, :fingerprint, unique: true
+  end
+end
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,6 @@
+class AddSuggestedResourceToTerms < ActiveRecord::Migration[7.1]
+  def change
+    add_reference :terms, :suggested_resource
+    add_foreign_key :terms, :suggested_resources, on_delete: :nullify
+  end
+end
