Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
24 commits
Select commit Hold shift + click to select a range
b8d83e8
perf: optimize event preloading for DOIs using EventsPreloader service
jrhoads Jan 13, 2026
d92a4dd
perf: preload DOI events in show action to optimize serialization
jrhoads Jan 13, 2026
3ef1f1a
test: add unit tests for PreloadedEventRelation and EventsPreloader
jrhoads Jan 13, 2026
d6f28ae
test: enhance N+1 testing and event preloading coverage in Doi model …
jrhoads Jan 13, 2026
e62e4e2
Appease Rubocop
jrhoads Jan 13, 2026
7568389
fix: filter preloaded events by DOI and update event tests
jrhoads Jan 14, 2026
036fd66
Appease Rubocop
jrhoads Jan 14, 2026
7243b85
refactor: delegate first, last, and count methods to events array
jrhoads Jan 14, 2026
10b163e
feat: add provider_id filtering to ReferenceRepository and update tests
jrhoads Jan 14, 2026
f509b75
refactor: update PreloadedEventRelation#inject to accept variable arg…
jrhoads Feb 26, 2026
5e253ae
test: refactor and expand test structure for EventsPreloader specs
jrhoads Feb 26, 2026
55b93dc
test: remove redundant code blocks and empty context in EventsPreload…
jrhoads Feb 26, 2026
9c6e1f6
refactor: move and simplify EventsPreloader logic in show action
jrhoads Mar 13, 2026
7cf9f31
fix: handle blank DOIs, order events by date, and prevent duplicate a…
jrhoads Mar 13, 2026
3b67b1e
refactor: extract filtered_preloaded_events helper to dry up event me…
jrhoads Mar 13, 2026
45eb43e
Appease rubocop
jrhoads Mar 13, 2026
8ceaecf
fix: ensure unique events in EventsPreloader after concatenation
jrhoads Mar 13, 2026
ab3dee0
test: update DOI preloading assertions to verify against uploaded bat…
jrhoads Mar 13, 2026
cf567e8
test: update PreloadedEventRelation#sort_by spec to use descending order
jrhoads Mar 13, 2026
a54ac9d
test: use match_array for reference and citation id comparisons
jrhoads Mar 16, 2026
6e5bdee
test: update DOI preloading tests to use EventsPreloader and verify q…
jrhoads Mar 16, 2026
ac5fc9c
test: include provider in preloaded relations and fix typo in spec
jrhoads Mar 16, 2026
671d5e7
perf: eager load providers for DOIs to prevent N+1 queries
jrhoads Mar 16, 2026
5d230c6
Appease Rubocop
jrhoads Mar 16, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions app/controllers/datacite_dois_controller.rb
Original file line number Diff line number Diff line change
Expand Up @@ -460,6 +460,9 @@ def show
publisher: params[:publisher],
include_other_registration_agencies: params[:include_other_registration_agencies],
}
# Preload events since we are going to show details
# This optimizes the serializer which accesses part_events, citation_events, etc.
EventsPreloader.new([doi]).preload!

render(
json: DataciteDoiSerializer.new(doi, options).serializable_hash.to_json,
Expand Down
97 changes: 97 additions & 0 deletions app/models/concerns/preloaded_event_relation.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,97 @@
# frozen_string_literal: true

# Wrapper class to make preloaded event arrays compatible with ActiveRecord::Relation API
# This allows existing code that calls methods like `pluck`, `map`, `select` to work
# with in-memory arrays without modification.
class PreloadedEventRelation
include Enumerable

def initialize(events)
@events = Array(events)
end

# Delegate Enumerable methods to the underlying array
def each(&block)
@events.each(&block)
end

# Delegate common Enumerable methods explicitly
def first
@events.first
end

def last
@events.last
end

def count
@events.count
end
Comment on lines +19 to +29
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find potential call sites that pass args/blocks to these methods.
# Expected: Any matches indicate current zero-arity methods are risky.
rg -nP '\b(view_events|download_events|reference_events|citation_events|part_events|part_of_events|version_events|version_of_events)\.(first|last)\s*\(\s*[^)]'
rg -nP '\b(view_events|download_events|reference_events|citation_events|part_events|part_of_events|version_events|version_of_events)\.count\s*\{'

Repository: datacite/lupo

Length of output: 39


🏁 Script executed:

# Read the full implementation file
cat -n app/models/concerns/preloaded_event_relation.rb

Repository: datacite/lupo

Length of output: 2619


🏁 Script executed:

# Check if `@events` is an Array or something else
rg -A 5 -B 5 '@events\s*=' app/models/concerns/preloaded_event_relation.rb

Repository: datacite/lupo

Length of output: 313


🏁 Script executed:

# Look for all method definitions to understand the pattern
rg 'def ' app/models/concerns/preloaded_event_relation.rb

Repository: datacite/lupo

Length of output: 377


🏁 Script executed:

# Check if this concern is included anywhere
rg 'PreloadedEventRelation|preloaded_event_relation' --type ruby

Repository: datacite/lupo

Length of output: 1875


🏁 Script executed:

# Check how PreloadedEventRelation is used in doi.rb
rg -A 10 -B 10 'PreloadedEventRelation' app/models/doi.rb

Repository: datacite/lupo

Length of output: 802


🏁 Script executed:

# Check the test file to see if args are tested
cat -n spec/models/preloaded_event_relation_spec.rb

Repository: datacite/lupo

Length of output: 5202


🏁 Script executed:

# Check if Array#first, Array#last, Array#count can accept args in Ruby
# Verify their actual signatures in the standard library
python3 << 'EOF'
# Check Ruby's Array method signatures
import subprocess
result = subprocess.run(['ruby', '-e', '''
puts "Array#first signature:"
puts Array.method(:first).arity rescue puts "N/A"

arr = [1, 2, 3, 4, 5]
puts "arr.first(2): #{arr.first(2).inspect}"
puts "arr.last(2): #{arr.last(2).inspect}"
puts "arr.count: #{arr.count}"
puts "arr.count { |x| x > 2 }: #{arr.count { |x| x > 2 }}"
'''], capture_output=True, text=True)
print(result.stdout)
if result.stderr:
    print("STDERR:", result.stderr)
EOF

Repository: datacite/lupo

Length of output: 165


Match first/last/count arity with Enumerable and Array behavior.

The current zero-arity signatures reject valid arguments and blocks that Array and Enumerable support. Ruby's Array methods accept optional parameters: first(n), last(n), and count { |item| ... }. As a drop-in replacement for ActiveRecord::Relation, this wrapper must support the same signatures to maintain compatibility.

🐛 Fix
-  def first
-    `@events.first`
+  def first(*args)
+    `@events.first`(*args)
   end

-  def last
-    `@events.last`
+  def last(*args)
+    `@events.last`(*args)
   end

-  def count
-    `@events.count`
+  def count(*args, &block)
+    `@events.count`(*args, &block)
   end
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def first
@events.first
end
def last
@events.last
end
def count
@events.count
end
def first(*args)
`@events.first`(*args)
end
def last(*args)
`@events.last`(*args)
end
def count(*args, &block)
`@events.count`(*args, &block)
end
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/models/concerns/preloaded_event_relation.rb` around lines 19 - 29, The
wrapper methods first, last, and count currently accept no arguments or blocks;
update them to mirror Array/Enumerable signatures by forwarding any arguments
and blocks to `@events`. Replace def first; `@events.first`; end with def
first(*args, &blk); `@events.first`(*args, &blk); end, do the same for last (def
last(*args, &blk); `@events.last`(*args, &blk); end), and implement count to
accept an optional item and/or block (def count(item = nil, &blk);
`@events.count`(item, &blk); end) so behavior matches Array/Enumerable and
ActiveRecord::Relation.


# Implement pluck to match ActiveRecord::Relation#pluck behavior
# Supports single or multiple column names
def pluck(*column_names)
if column_names.length == 1
column_name = column_names.first
@events.map { |event| event.public_send(column_name) }
else
@events.map { |event| column_names.map { |col| event.public_send(col) } }
end
end

# Delegate map to the underlying array
def map(&block)
@events.map(&block)
end

# Delegate select to the underlying array
def select(&block)
PreloadedEventRelation.new(@events.select(&block))
end

# Delegate other common Enumerable methods
def compact
PreloadedEventRelation.new(@events.compact)
end

def uniq
PreloadedEventRelation.new(@events.uniq)
end

def sort_by(&block)
PreloadedEventRelation.new(@events.sort_by(&block))
end

def group_by(&block)
@events.group_by(&block)
end

def inject(*args, &block)
@events.inject(*args, &block)
end

def length
@events.length
end

def empty?
@events.empty?
end

def present?
@events.present?
end

def blank?
@events.blank?
end

# Allow direct access to the underlying array
def to_a
@events
end

def to_ary
@events
end
end
13 changes: 4 additions & 9 deletions app/models/datacite_doi.rb
Original file line number Diff line number Diff line change
Expand Up @@ -144,19 +144,14 @@ def self.import_in_bulk(ids, options = {})

# get database records from array of database ids
selected_dois = DataciteDoi.where(id: ids, type: "DataciteDoi").includes(
:client,
{ client: :provider },
:media,
:view_events,
:download_events,
:citation_events,
:reference_events,
:part_events,
:part_of_events,
:version_events,
:version_of_events,
:metadata
)
selected_dois.find_in_batches(batch_size: batch_size) do |dois|
# Preload all events for this batch in a single query
EventsPreloader.new(dois).preload!

bulk_body = dois.map do |doi|
{
index: {
Expand Down
78 changes: 78 additions & 0 deletions app/models/doi.rb
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@

require "maremma"
require "benchmark"
require_relative "concerns/preloaded_event_relation"

class Doi < ApplicationRecord
self.ignored_columns += [:publisher]
Expand Down Expand Up @@ -76,6 +77,7 @@ class Doi < ApplicationRecord
alias_attribute :state, :aasm_state

attr_accessor :current_user
attr_accessor :preloaded_events

attribute :regenerate, :boolean, default: false
attribute :only_validate, :boolean, default: false
Expand All @@ -100,6 +102,73 @@ class Doi < ApplicationRecord
# has_many :source_events, class_name: "Event", primary_key: :doi, foreign_key: :source_doi, dependent: :destroy
# has_many :target_events, class_name: "Event", primary_key: :doi, foreign_key: :target_doi, dependent: :destroy

# Override association methods to use preloaded_events when available
# Check for !nil instead of present? to handle empty arrays (preloaded but no events)
# Also filter by source_doi/target_doi to match association behavior
def view_events
if !preloaded_events.nil?
filtered_preloaded_events(:target_relation_type_id, "views", :target_doi)
else
association(:view_events).scope
end
end

def download_events
if !preloaded_events.nil?
filtered_preloaded_events(:target_relation_type_id, "downloads", :target_doi)
else
association(:download_events).scope
end
end

def reference_events
if !preloaded_events.nil?
filtered_preloaded_events(:source_relation_type_id, "references", :source_doi)
else
association(:reference_events).scope
end
end

def citation_events
if !preloaded_events.nil?
filtered_preloaded_events(:target_relation_type_id, "citations", :target_doi)
else
association(:citation_events).scope
end
end

def part_events
if !preloaded_events.nil?
filtered_preloaded_events(:source_relation_type_id, "parts", :source_doi)
else
association(:part_events).scope
end
end

def part_of_events
if !preloaded_events.nil?
filtered_preloaded_events(:target_relation_type_id, "part_of", :target_doi)
else
association(:part_of_events).scope
end
end

def version_events
if !preloaded_events.nil?
filtered_preloaded_events(:source_relation_type_id, "versions", :source_doi)
else
association(:version_events).scope
end
end

def version_of_events
if !preloaded_events.nil?
filtered_preloaded_events(:target_relation_type_id, "version_of", :target_doi)
else
association(:version_of_events).scope
end
end

has_many :references, class_name: "Doi", through: :reference_events, source: :doi_for_target
has_many :citations, class_name: "Doi", through: :citation_events, source: :doi_for_source
has_many :parts, class_name: "Doi", through: :part_events, source: :doi_for_target
Expand Down Expand Up @@ -2781,6 +2850,15 @@ def handle_resource_type(types)
end

private
def filtered_preloaded_events(relation_type_key, relation_type_value, doi_key)
PreloadedEventRelation.new(
preloaded_events.select do |e|
e.public_send(relation_type_key) == relation_type_value &&
e.public_send(doi_key)&.upcase == doi.upcase
end
)
end

def update_publisher_from_hash
symbolized_publisher_hash = publisher_before_type_cast.symbolize_keys
if !symbolized_publisher_hash.values.all?(nil)
Expand Down
13 changes: 4 additions & 9 deletions app/models/other_doi.rb
Original file line number Diff line number Diff line change
Expand Up @@ -148,19 +148,14 @@ def self.import_in_bulk(ids, options = {})

# get database records from array of database ids
selected_dois = OtherDoi.where(id: ids, type: "OtherDoi").includes(
:client,
{ client: :provider },
:media,
:view_events,
:download_events,
:citation_events,
:reference_events,
:part_events,
:part_of_events,
:version_events,
:version_of_events,
:metadata
)
selected_dois.find_in_batches(batch_size: batch_size) do |dois|
# Preload all events for this batch in a single query
EventsPreloader.new(dois).preload!

bulk_body = dois.map do |doi|
{
index: {
Expand Down
5 changes: 5 additions & 0 deletions app/models/reference_repository.rb
Original file line number Diff line number Diff line change
Expand Up @@ -355,6 +355,11 @@ def filter(options)
"repository_type": options[:repository_type].split(",")
} }
end
if options[:provider_id].present?
retval << { term: {
"provider_id": { value: options[:provider_id], case_insensitive: true }
} }
end
if options[:is_open] == "true"
retval << { term: {
"data_access.type": "open"
Expand Down
62 changes: 62 additions & 0 deletions app/services/events_preloader.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,62 @@
# frozen_string_literal: true

# Service class to preload events for a batch of DOIs in a single query
# This dramatically reduces database queries from N*M (DOIs * Relationship Types) to 1-2 queries total
class EventsPreloader
# Maximum number of DOIs to query at once to avoid database parameter limits
CHUNK_SIZE = 1000

def initialize(dois)
@dois = Array(dois)
@doi_map = {}
@dois.each do |doi|
next if doi.doi.blank?
@doi_map[doi.doi.upcase] = doi
# Initialize preloaded_events array if not already set
doi.preloaded_events ||= []
end
end

# Preload all events for the batch of DOIs
def preload!
return if @dois.empty?

doi_identifiers = @dois.filter_map { |doi| doi.doi&.upcase }.uniq
return if doi_identifiers.empty?

Comment on lines +22 to +26
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Keep preload! return behavior consistent on early exits.

Line 22 and Line 25 return nil, but the normal path returns self. This also skips the final array initialization path for all-blank DOI inputs.

🛠️ Proposed fix
   def preload!
-    return if `@dois.empty`?
+    return self if `@dois.empty`?

     doi_identifiers = `@dois.filter_map` { |doi| doi.doi&.upcase }.uniq
-    return if doi_identifiers.empty?
+    if doi_identifiers.empty?
+      `@dois.each` { |doi| doi.preloaded_events ||= [] }
+      return self
+    end
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
return if @dois.empty?
doi_identifiers = @dois.filter_map { |doi| doi.doi&.upcase }.uniq
return if doi_identifiers.empty?
return self if `@dois.empty`?
doi_identifiers = `@dois.filter_map` { |doi| doi.doi&.upcase }.uniq
if doi_identifiers.empty?
`@dois.each` { |doi| doi.preloaded_events ||= [] }
return self
end
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/services/events_preloader.rb` around lines 22 - 26, In preload! change
the early exits so they return self instead of nil and preserve the final array
initialization: replace the two bare returns after checking `@dois.empty`? and
doi_identifiers.empty? with returns of self, and ensure the method still
initializes `@doi_events` (e.g. via `@doi_events` ||= []) on the early-exit path so
the final array initialization behavior matches the normal path; update
references inside preload! (the `@dois` check, doi_identifiers variable and final
`@doi_events` initialization) accordingly.

# Fetch events in chunks to avoid database parameter limits
all_events = []
doi_identifiers.each_slice(CHUNK_SIZE) do |chunk|
events = Event.where(
"source_doi IN (?) OR target_doi IN (?)",
chunk, chunk
).order(updated_at: :desc).to_a
all_events.concat(events)
end
all_events.uniq!

# Group events by DOI and assign to each Doi object
all_events.each do |event|
# Add event to source DOI's preloaded_events if it matches
if event.source_doi.present?
source_doi_obj = @doi_map[event.source_doi.upcase]
source_doi_obj.preloaded_events << event if source_doi_obj
end

# Add event to target DOI's preloaded_events if it matches
if event.target_doi.present?
target_doi_obj = @doi_map[event.target_doi.upcase]
if target_doi_obj && target_doi_obj != source_doi_obj
target_doi_obj.preloaded_events << event
end
end
end

# Ensure all DOIs have an array (even if empty)
@dois.each do |doi|
doi.preloaded_events ||= []
end

self
end
end
Loading
Loading