Optimize DOI event performance with bulk-preloading#1449
Optimize DOI event performance with bulk-preloading#1449
Conversation
17e9f41 to
46a7e33
Compare
83f6c29 to
99492b7
Compare
99492b7 to
13f8d15
Compare
13f8d15 to
e998678
Compare
📝 WalkthroughWalkthroughAdds an EventsPreloader service that batch-populates DOIs with in-memory event arrays, a PreloadedEventRelation wrapper to mimic ActiveRecord relations, integrates preloading into batch imports and controller show, adds a provider_id filter, and updates tests for preloading and relation behavior. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@app/controllers/datacite_dois_controller.rb`:
- Around line 452-454: The guard `params[:detail] != false` is always truthy
because params are strings, so remove the misleading conditional and always
preload: call EventsPreloader.new([doi]).preload! unconditionally (keep the
existing comment about preloading) so the intent matches behavior;
alternatively, if you want to preserve conditional behavior, change the check to
string comparison (e.g., params[:detail] != "false" or
ActiveModel::Type::Boolean.new.cast(params[:detail]) ) but the simpler fix is to
drop the `if` and always invoke EventsPreloader.new([doi]).preload!.
In `@app/services/events_preloader.rb`:
- Around line 37-49: When distributing events in the all_events loop, avoid
appending the same event twice when event.source_doi == event.target_doi by
checking for equality before pushing to the same DOI's preloaded_events: look up
source_doi_obj and target_doi_obj from `@doi_map` (used in the current loop), and
if they refer to the same object, append the event only once (otherwise append
to each if present); adjust the logic around source_doi_obj.preloaded_events <<
event and target_doi_obj.preloaded_events << event to perform this dedupe during
distribution (or add a single guard to skip the second append when source and
target DOIs match).
🧹 Nitpick comments (7)
app/models/concerns/preloaded_event_relation.rb (2)
69-71:injectoverride doesn't support the symbol form.Ruby's
Enumerable#injectsupports bothinject(:+)(symbol) andinject(0) { |sum, x| sum + x }(block) forms. This override only passesinitialand&blocktoArray#inject, so callingrelation.inject(:+)would pass:+asinitialwith anilblock, raising aTypeError.Currently,
injectis only called on the result ofpluck(which returns a plainArray), so this isn't actively broken. But it's a latent trap if the API surface grows.♻️ Suggested fix
- def inject(initial = nil, &block) - `@events.inject`(initial, &block) + def inject(*args, &block) + `@events.inject`(*args, &block) end
94-96: Removeto_ary— it contradicts the design and is never used.The
to_arymethod is never called explicitly in the codebase and no tests cover it. Since wrapper-preserving methods likeselect,compact, anduniqreturn newPreloadedEventRelationinstances, havingto_ary(which enables implicit unwrapping via array concatenation, splat, orArray()calls) contradicts the design intent. Keep onlyto_afor explicit conversion.app/services/events_preloader.rb (1)
51-54: Redundant initialization — already handled ininitialize.Line 15 in
initializealready setsdoi.preloaded_events ||= []for every DOI. This block on lines 52-54 is a no-op unless a DOI'spreloaded_eventswas somehow set tonilafter initialization, which shouldn't happen in normal flow.spec/models/preloaded_event_relation_spec.rb (2)
65-71:sort_bytest doesn't verify actual reordering.The events are already in ascending
totalorder ([10, 20, 30]), sosort_by(&:total)produces the same order as the input. This test passes trivially without proving that sorting actually works. Consider sorting in reverse or using a differently-ordered input.🔧 Suggested improvement
describe "#sort_by" do - it "sorts events" do - result = relation.sort_by(&:total) + it "sorts events by attribute" do + reversed = PreloadedEventRelation.new([event3, event1, event2]) + result = reversed.sort_by(&:total) expect(result).to be_a(PreloadedEventRelation) expect(result.to_a.map(&:total)).to eq([10, 20, 30]) end end
133-145: Consider testingfirst/last/counton an empty relation too.The Enumerable methods block only tests the happy path. Since
PreloadedEventRelationwraps an array and delegates these, it would be good to verify edge behavior (e.g.,firston empty returnsnil,countreturns0).spec/services/events_preloader_spec.rb (1)
89-107: Large batch test creates 1100+ DB records — significant test suite slowdown.
create_list(:doi, EventsPreloader::CHUNK_SIZE + 100, ...)withCHUNK_SIZE = 1000inserts 1100 DOI records plus associated data. This will be very slow (potentially minutes) and could dominate the test suite runtime.Consider using
build_stubbed_listor mocking the chunking behavior instead of hitting the database, since the test only verifies that no error is raised andpreloaded_eventsis not nil.♻️ Suggested alternative approach
context "with large batch" do it "chunks large DOI lists" do - # Create more than CHUNK_SIZE DOIs - large_batch = create_list(:doi, EventsPreloader::CHUNK_SIZE + 100, client: client, aasm_state: "findable") - - # Create events for some of them - create(:event_for_crossref, { - subj_id: "https://doi.org/#{large_batch.first.doi}", - obj_id: "https://doi.org/#{large_batch.last.doi}", - relation_type_id: "references", - }) - - expect { - EventsPreloader.new(large_batch).preload! - }.not_to raise_error - - expect(large_batch.first.preloaded_events).not_to be_nil + # Verify chunking by checking that Event.where is called multiple times + large_batch = build_stubbed_list(:doi, EventsPreloader::CHUNK_SIZE + 100, client: client) + large_batch.each { |d| allow(d).to receive(:preloaded_events=) } + large_batch.each { |d| allow(d).to receive(:preloaded_events).and_return([]) } + + relation = double("relation", to_a: []) + expect(Event).to receive(:where).twice.and_return(relation) + + EventsPreloader.new(large_batch).preload! end endspec/models/doi_related_spec.rb (1)
42-65: Test doesn't actually verify preloading — it verifies events via a fresh DB query.
fresh_doi1 = DataciteDoi.find(doi1.id)creates a new instance withpreloaded_events = nil, soreference_eventsandcitation_eventsfall back to database queries. This test confirms events exist in the DB but does not validate thatEventsPreloaderwas used duringimport_in_bulk.To verify preloading actually occurred, consider spying on
EventsPreloaderor asserting on the in-batch DOI objects:♻️ Suggested approach
- allow(DataciteDoi).to receive(:upload_to_elasticsearch) - - DataciteDoi.import_in_bulk(ids) - - # Verify that events were preloaded (check via a fresh query) - fresh_doi1 = DataciteDoi.find(doi1.id) - expect(fresh_doi1.reference_events.count).to eq(1) - expect(fresh_doi1.citation_events.count).to eq(1) + preloader_spy = instance_double(EventsPreloader, preload!: nil) + expect(EventsPreloader).to receive(:new).and_return(preloader_spy) + expect(preloader_spy).to receive(:preload!).once + + allow(DataciteDoi).to receive(:upload_to_elasticsearch) + + DataciteDoi.import_in_bulk(ids)
e998678 to
7c3032f
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (1)
app/services/events_preloader.rb (1)
37-49: Duplicate event appended whensource_doi == target_doi.This issue was already flagged in a prior review. The fix proposed there (check
target_doi_obj != source_doi_objbefore the second append) remains unaddressed.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/services/events_preloader.rb` around lines 37 - 49, When iterating all_events, avoid appending the same event twice when event.source_doi == event.target_doi by checking that the resolved target_doi_obj is different from source_doi_obj before pushing to preloaded_events; specifically, after resolving source_doi_obj via `@doi_map` and appending to source_doi_obj.preloaded_events, resolve target_doi_obj from `@doi_map` for event.target_doi and only append to target_doi_obj.preloaded_events if target_doi_obj exists AND target_doi_obj != source_doi_obj (using the existing source_doi_obj and target_doi_obj local variables).
🧹 Nitpick comments (2)
app/services/events_preloader.rb (1)
51-54: Redundant second||= []loop — can be removed.
initializealready callsdoi.preloaded_events ||= []for every DOI. Afterpreload!appends events to those already-initialized arrays, no DOI'spreloaded_eventscan benil, making this second loop dead code.♻️ Proposed cleanup
- # Ensure all DOIs have an array (even if empty) - `@dois.each` do |doi| - doi.preloaded_events ||= [] - end - self end🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/services/events_preloader.rb` around lines 51 - 54, Remove the redundant loop that sets doi.preloaded_events ||= [] inside preload! (the `@dois.each` block shown), since initialize already ensures every DOI's preloaded_events is initialized; locate the `@dois.each` |doi| ... doi.preloaded_events ||= [] in EventsPreloader (or the preload! method) and delete that loop so preload! only appends events to existing arrays.spec/models/preloaded_event_relation_spec.rb (1)
81-86: Consider adding a no-argumentinjecttest to expose the latent edge case.The implementation defaults
initialtoniland always passes it explicitly:@events.inject(nil, &block). When called without an initial value (relation.inject { |acc, e| ... }), this passesnilas the accumulator instead of using the first element as Array's nativeinjectwould. A test covering the no-argument form would catch this deviation.🧪 Suggested additional test
+ it "reduces events without an explicit initial value" do + result = relation.inject { |acc, e| acc.total > e.total ? acc : e } + expect(result).to eq(event3) # event3 has the highest total (30) + end🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@spec/models/preloaded_event_relation_spec.rb` around lines 81 - 86, Add a spec that calls relation.inject without an explicit initial to catch the edge case where the implementation currently passes nil as the initial accumulator; e.g. add a test that invokes relation.inject { |sum, e| sum + e.total } (referencing the relation.inject behavior in PreloadedEventRelation) and assert the expected aggregated result (60) so the failing behavior is exposed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@app/services/events_preloader.rb`:
- Around line 28-34: The Event query inside the doi_identifiers.each_slice block
is unordered (Event.where(...)) so consumers of PreloadedEventRelation#first or
`#last` get nondeterministic results; update the query used in this block (the
Event.where(...) call that feeds all_events.concat) to include a stable order
(for example .order(:occurred_at) or .order(:id)) so the in-memory all_events
array has a predictable, repeatable ordering across runs.
- Around line 12-16: Guard against nil DOI identifiers before calling .upcase by
checking or normalizing the value first: in the loop that populates `@doi_map`
(the block iterating `@dois` and setting `@doi_map`[doi.doi.upcase] = doi) and in
the preload! method where doi.doi is used again, replace the direct .upcase call
with a nil-safe approach (e.g., skip entries where doi.doi is nil/blank or call
to_s.upcase) so you never call .upcase on nil; also keep the existing
initialization of doi.preloaded_events ||= [] unchanged. Ensure you update both
the `@dois.each` block and the preload! code path that references doi.doi to use
the same nil-safe logic.
In `@spec/models/doi_related_spec.rb`:
- Around line 42-65: The test currently only checks DB state; instead verify
that EventsPreloader actually preloaded events during DataciteDoi.import_in_bulk
by spying on EventsPreloader#preload! or capturing yielded DOI instances from
the batch callback: e.g. set an expectation that EventsPreloader receives
:preload! when DataciteDoi.import_in_bulk(ids) runs (or wrap import_in_bulk with
a block to capture the batch DOIs and assert those instances have non-nil
preloaded_events and their reference_events/citation_events are available
without extra queries), referencing DataciteDoi.import_in_bulk,
EventsPreloader#preload!, fresh_doi1/doi1, and the
reference_events/citation_events accessors to locate the relevant code.
---
Duplicate comments:
In `@app/services/events_preloader.rb`:
- Around line 37-49: When iterating all_events, avoid appending the same event
twice when event.source_doi == event.target_doi by checking that the resolved
target_doi_obj is different from source_doi_obj before pushing to
preloaded_events; specifically, after resolving source_doi_obj via `@doi_map` and
appending to source_doi_obj.preloaded_events, resolve target_doi_obj from
`@doi_map` for event.target_doi and only append to target_doi_obj.preloaded_events
if target_doi_obj exists AND target_doi_obj != source_doi_obj (using the
existing source_doi_obj and target_doi_obj local variables).
---
Nitpick comments:
In `@app/services/events_preloader.rb`:
- Around line 51-54: Remove the redundant loop that sets doi.preloaded_events
||= [] inside preload! (the `@dois.each` block shown), since initialize already
ensures every DOI's preloaded_events is initialized; locate the `@dois.each` |doi|
... doi.preloaded_events ||= [] in EventsPreloader (or the preload! method) and
delete that loop so preload! only appends events to existing arrays.
In `@spec/models/preloaded_event_relation_spec.rb`:
- Around line 81-86: Add a spec that calls relation.inject without an explicit
initial to catch the edge case where the implementation currently passes nil as
the initial accumulator; e.g. add a test that invokes relation.inject { |sum, e|
sum + e.total } (referencing the relation.inject behavior in
PreloadedEventRelation) and assert the expected aggregated result (60) so the
failing behavior is exposed.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
📒 Files selected for processing (11)
app/controllers/datacite_dois_controller.rbapp/models/concerns/preloaded_event_relation.rbapp/models/datacite_doi.rbapp/models/doi.rbapp/models/other_doi.rbapp/models/reference_repository.rbapp/services/events_preloader.rbspec/models/doi_related_spec.rbspec/models/preloaded_event_relation_spec.rbspec/requests/repositories_spec.rbspec/services/events_preloader_spec.rb
🚧 Files skipped from review as they are similar to previous changes (6)
- app/controllers/datacite_dois_controller.rb
- app/models/datacite_doi.rb
- spec/services/events_preloader_spec.rb
- app/models/reference_repository.rb
- app/models/doi.rb
- app/models/concerns/preloaded_event_relation.rb
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@spec/requests/reference_repositories_spec.rb`:
- Around line 56-61: The tear-down is too broad: instead of calling
Client.destroy_all and Provider.destroy_all, only delete the specific records
created in this spec (the records set up in before :all, e.g. the `@client` and
any `@provider` instances) — track those instances or their ids in before :all and
call destroy on them in after :all (or use an array of created records) so you
don't wipe all Client/Provider rows; keep ReferenceRepository.destroy_all and
Prefix.destroy_all as-is since they're local to this spec.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
📒 Files selected for processing (1)
spec/requests/reference_repositories_spec.rb
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
spec/services/events_preloader_spec.rb (1)
104-122: Consider reducing test data size or using stubbing for the chunking test.Creating 1,100 DOIs (
CHUNK_SIZE + 100) in this test is expensive and will slow down the test suite. The test only verifies that chunking doesn't raise an error and that preloaded_events is set.Consider either:
- Stubbing
CHUNK_SIZEto a smaller value (e.g., 10) for this test.- Reducing the batch size and verifying the chunking logic through a unit test approach.
♻️ Suggested approach using stub
context "with large batch" do it "chunks large DOI lists" do - # Create more than CHUNK_SIZE DOIs - large_batch = create_list(:doi, EventsPreloader::CHUNK_SIZE + 100, client: client, aasm_state: "findable") + # Stub CHUNK_SIZE to a smaller value to avoid creating thousands of records + stub_const("EventsPreloader::CHUNK_SIZE", 10) + large_batch = create_list(:doi, 15, client: client, aasm_state: "findable") # Create events for some of them create(:event_for_crossref, {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@spec/services/events_preloader_spec.rb` around lines 104 - 122, The test creates a very large fixture list (EventsPreloader::CHUNK_SIZE + 100) which slows the suite; change the spec to avoid heavy data by stubbing EventsPreloader::CHUNK_SIZE to a small value (e.g., allow(EventsPreloader).to receive(:const_get).with("CHUNK_SIZE").and_return(10)) or otherwise temporarily set CHUNK_SIZE within the example, then create a much smaller list (e.g., 15) and keep calling EventsPreloader.new(large_batch).preload! and asserting large_batch.first.preloaded_events; ensure the stub is scoped to this example so other tests are unaffected and reference EventsPreloader, CHUNK_SIZE, preload! and the spec example that currently creates the large_batch.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@spec/services/events_preloader_spec.rb`:
- Around line 11-41: Remove the duplicated and unclosed spec blocks to fix the
syntax error: keep a single describe "#initialize" and a single describe
"#preload!" block, remove the repeated describe "#initialize" clone and the
duplicate describe "#preload!", ensure the original describe "#initialize" has
its matching end, and either implement or remove the empty context "with source
and target events" (context "with source and target events") so every
describe/context has a closing end; verify the spec now contains one
EventsPreloader.new(dois) test and the preload! contexts are properly closed.
---
Nitpick comments:
In `@spec/services/events_preloader_spec.rb`:
- Around line 104-122: The test creates a very large fixture list
(EventsPreloader::CHUNK_SIZE + 100) which slows the suite; change the spec to
avoid heavy data by stubbing EventsPreloader::CHUNK_SIZE to a small value (e.g.,
allow(EventsPreloader).to receive(:const_get).with("CHUNK_SIZE").and_return(10))
or otherwise temporarily set CHUNK_SIZE within the example, then create a much
smaller list (e.g., 15) and keep calling
EventsPreloader.new(large_batch).preload! and asserting
large_batch.first.preloaded_events; ensure the stub is scoped to this example so
other tests are unaffected and reference EventsPreloader, CHUNK_SIZE, preload!
and the spec example that currently creates the large_batch.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
📒 Files selected for processing (2)
app/models/concerns/preloaded_event_relation.rbspec/services/events_preloader_spec.rb
🚧 Files skipped from review as they are similar to previous changes (1)
- app/models/concerns/preloaded_event_relation.rb
7f17493 to
6ae2a83
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (3)
app/models/concerns/preloaded_event_relation.rb (1)
89-96: Consider returning a defensive copy fromto_aandto_ary.Both methods return the internal
@eventsarray directly, which allows callers to mutate the wrapper's internal state. While this matches AR behavior and is likely fine in this context, it's worth noting.# If defensive copies are ever needed: def to_a `@events.dup` end🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/models/concerns/preloaded_event_relation.rb` around lines 89 - 96, to_a and to_ary currently expose the internal `@events` array directly which allows external callers to mutate internal state; change both methods (to_a and to_ary in PreloadedEventRelation) to return a defensive copy of `@events` (e.g. use dup) so callers get a copy rather than the original, ensuring internal state isn't modified by external code while preserving current behavior for read-only use; if nested mutability is a concern consider using map(&:dup) or deep duplication where appropriate.app/models/doi.rb (1)
106-184: Optional: Consider extracting duplicate filter logic into a helper method.All eight methods follow the same pattern. A private helper could reduce duplication:
♻️ Optional refactor to reduce duplication
private def filtered_preloaded_events(relation_type_key, relation_type_value, doi_key) PreloadedEventRelation.new( preloaded_events.select do |e| e.public_send(relation_type_key) == relation_type_value && e.public_send(doi_key)&.upcase == doi.upcase end ) end # Then in each method: def view_events if !preloaded_events.nil? filtered_preloaded_events(:target_relation_type_id, "views", :target_doi) else association(:view_events).scope end end🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/models/doi.rb` around lines 106 - 184, All eight event accessor methods (view_events, download_events, reference_events, citation_events, part_events, part_of_events, version_events, version_of_events) duplicate the same preloaded_events filtering logic; extract a private helper (eg. filtered_preloaded_events(relation_type_key, relation_type_value, doi_key)) that builds and returns the PreloadedEventRelation for the given keys and value using preloaded_events.select, then replace each method to call that helper when preloaded_events is present and fall back to association(...).scope otherwise, keeping the existing symbols PreloadedEventRelation, preloaded_events, doi and association(...) to locate and wire up the change.spec/models/preloaded_event_relation_spec.rb (1)
127-131: Consider adding a test forto_arymethod.The class implements both
to_aandto_ary, but onlyto_ais tested.to_aryenables implicit array coercion (e.g., in splat operations).🧪 Optional test addition
describe "#to_ary" do it "enables implicit array coercion" do expect(relation.to_ary).to eq(events) # Test splat behavior first, *rest = relation expect(first).to eq(event1) expect(rest).to eq([event2, event3]) end end🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@spec/models/preloaded_event_relation_spec.rb` around lines 127 - 131, Add a spec for the to_ary behavior: create a new describe block for "#to_ary" that asserts relation.to_ary equals the underlying events array and verifies implicit array coercion (splat/destructuring) by doing something like "first, *rest = relation" and asserting first equals event1 and rest equals [event2, event3]; reference the existing relation and event fixtures used in the "#to_a" spec so the new test mirrors that setup.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@app/models/concerns/preloaded_event_relation.rb`:
- Around line 89-96: to_a and to_ary currently expose the internal `@events` array
directly which allows external callers to mutate internal state; change both
methods (to_a and to_ary in PreloadedEventRelation) to return a defensive copy
of `@events` (e.g. use dup) so callers get a copy rather than the original,
ensuring internal state isn't modified by external code while preserving current
behavior for read-only use; if nested mutability is a concern consider using
map(&:dup) or deep duplication where appropriate.
In `@app/models/doi.rb`:
- Around line 106-184: All eight event accessor methods (view_events,
download_events, reference_events, citation_events, part_events, part_of_events,
version_events, version_of_events) duplicate the same preloaded_events filtering
logic; extract a private helper (eg.
filtered_preloaded_events(relation_type_key, relation_type_value, doi_key)) that
builds and returns the PreloadedEventRelation for the given keys and value using
preloaded_events.select, then replace each method to call that helper when
preloaded_events is present and fall back to association(...).scope otherwise,
keeping the existing symbols PreloadedEventRelation, preloaded_events, doi and
association(...) to locate and wire up the change.
In `@spec/models/preloaded_event_relation_spec.rb`:
- Around line 127-131: Add a spec for the to_ary behavior: create a new describe
block for "#to_ary" that asserts relation.to_ary equals the underlying events
array and verifies implicit array coercion (splat/destructuring) by doing
something like "first, *rest = relation" and asserting first equals event1 and
rest equals [event2, event3]; reference the existing relation and event fixtures
used in the "#to_a" spec so the new test mirrors that setup.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
📒 Files selected for processing (11)
app/controllers/datacite_dois_controller.rbapp/models/concerns/preloaded_event_relation.rbapp/models/datacite_doi.rbapp/models/doi.rbapp/models/other_doi.rbapp/models/reference_repository.rbapp/services/events_preloader.rbspec/models/doi_related_spec.rbspec/models/preloaded_event_relation_spec.rbspec/requests/repositories_spec.rbspec/services/events_preloader_spec.rb
🚧 Files skipped from review as they are similar to previous changes (6)
- app/models/other_doi.rb
- app/models/reference_repository.rb
- app/services/events_preloader.rb
- spec/services/events_preloader_spec.rb
- app/models/datacite_doi.rb
- spec/requests/repositories_spec.rb
6ae2a83 to
55b93dc
Compare
|
@coderabbitai please review again |
|
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (1)
spec/models/doi_related_spec.rb (1)
42-65:⚠️ Potential issue | 🟡 MinorThis test still doesn’t verify preloading actually ran.
fresh_doi1on Line 62 is a new instance withpreloaded_events = nil, so Lines 63-64 validate DB-backed associations, notEventsPreloaderusage inimport_in_bulk.🧪 Suggested fix
- it "should preload events for all DOIs in batch" do + it "should invoke EventsPreloader during import_in_bulk" do # Create events for the DOIs create(:event_for_crossref, { subj_id: "https://doi.org/#{doi1.doi}", obj_id: "https://doi.org/#{doi2.doi}", relation_type_id: "references", }) create(:event_for_datacite_crossref, { subj_id: "https://doi.org/#{doi1.doi}", obj_id: "https://doi.org/#{doi3.doi}", relation_type_id: "is-referenced-by", }) - + preloader = instance_double(EventsPreloader, preload!: nil) + allow(EventsPreloader).to receive(:new).and_return(preloader) allow(DataciteDoi).to receive(:upload_to_elasticsearch) - DataciteDoi.import_in_bulk(ids) - - # Verify that events were preloaded (check via a fresh query) - fresh_doi1 = DataciteDoi.find(doi1.id) - expect(fresh_doi1.reference_events.count).to eq(1) - expect(fresh_doi1.citation_events.count).to eq(1) + expect(EventsPreloader).to have_received(:new) + expect(preloader).to have_received(:preload!) end🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@spec/models/doi_related_spec.rb` around lines 42 - 65, The test currently checks DB-backed associations on a fresh instance (fresh_doi1) which doesn't prove EventsPreloader ran; change the assertion to either spy on the preloader or check the original in-memory instance after import. Specifically, in the test for DataciteDoi.import_in_bulk, add an expectation that EventsPreloader.preload (or the actual preloader class/method used inside DataciteDoi.import_in_bulk) receives the expected DOIs, or after calling DataciteDoi.import_in_bulk(ids) assert that doi1.association(:reference_events).loaded? and doi1.association(:citation_events).loaded? are true (or use expect(EventsPreloader).to have_received(:preload).with(an_array_including(doi1.id))). This targets DataciteDoi.import_in_bulk and the EventsPreloader invocation so the test verifies preloading ran rather than just DB counts on a fresh record.
🧹 Nitpick comments (3)
spec/models/preloaded_event_relation_spec.rb (1)
81-85: Add#injectsymbol-form coverage to match the varargs change.Line 83 only exercises block form. Add symbol-arg cases to lock behavior for
inject(:+)andinject(initial, :+).✅ Suggested spec extension
describe "#inject" do it "reduces events" do result = relation.inject(0) { |sum, e| sum + e.total } expect(result).to eq(60) end + + it "supports symbol form" do + expect(relation.pluck(:total).inject(:+)).to eq(60) + expect(relation.pluck(:total).inject(5, :+)).to eq(65) + end end🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@spec/models/preloaded_event_relation_spec.rb` around lines 81 - 85, Add symbol-form test coverage for the `#inject` method: in spec/models/preloaded_event_relation_spec.rb extend the describe "#inject" examples to include cases calling relation.inject(:+) and relation.inject(10, :+) (or another nonzero initial value) and assert their results match the block form (e.g., sum of totals and total+initial). Locate the existing examples that use relation.inject(0) { |sum, e| ... } and add two new it blocks (or combine assertions) to verify symbol-arg behavior for inject and inject with an initial value.spec/services/events_preloader_spec.rb (1)
71-77: Relax the strict query-count assertion to avoid brittle failures.Given the design allows 1–2 preload queries, asserting
Line 77as<= 1can fail valid implementations.🔧 Suggested change
- expect { - preloader.preload! - }.not_to exceed_query_limit(1) + expect { + preloader.preload! + }.not_to exceed_query_limit(2)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@spec/services/events_preloader_spec.rb` around lines 71 - 77, The test's strict DB query assertion is brittle; update the expectation in spec/services/events_preloader_spec.rb to allow up to two queries by changing the matcher argument from exceed_query_limit(1) to exceed_query_limit(2) (the case around the EventsPreloader.new(dois) and preloader.preload! call).app/models/doi.rb (1)
108-170: Usingsuperin fallback paths is a semantic improvement but not required—current code is functionally correct.The event methods (
view_events,download_events, etc.) currently returnassociation(...).scopein the fallback path, which successfully enables both preloaded and database-backed scenarios. Tests validate that both code paths work correctly with.to_aenumeration. While usingsuperwould preserve the native association reader semantics, the functional behavior is already sound and no proxy-specific APIs are used anywhere in the codebase.This refactor is optional for consistency.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/models/doi.rb` around lines 108 - 170, The current event reader methods (view_events, download_events, reference_events, citation_events, part_events, part_of_events, version_events, version_of_events) fall back to association(...).scope when preloaded_events is nil; change those fallback branches to call super instead to preserve native association reader semantics—keep the preloaded path using filtered_preloaded_events(:target_relation_type_id / :source_relation_type_id, "<type>", :target_doi / :source_doi) and only replace the else branches with super so the methods delegate to the original association reader when not using preloaded_events.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@app/models/concerns/preloaded_event_relation.rb`:
- Around line 19-29: The wrapper methods first, last, and count currently accept
no arguments or blocks; update them to mirror Array/Enumerable signatures by
forwarding any arguments and blocks to `@events`. Replace def first;
`@events.first`; end with def first(*args, &blk); `@events.first`(*args, &blk); end,
do the same for last (def last(*args, &blk); `@events.last`(*args, &blk); end),
and implement count to accept an optional item and/or block (def count(item =
nil, &blk); `@events.count`(item, &blk); end) so behavior matches Array/Enumerable
and ActiveRecord::Relation.
In `@app/services/events_preloader.rb`:
- Around line 22-26: In preload! change the early exits so they return self
instead of nil and preserve the final array initialization: replace the two bare
returns after checking `@dois.empty`? and doi_identifiers.empty? with returns of
self, and ensure the method still initializes `@doi_events` (e.g. via `@doi_events`
||= []) on the early-exit path so the final array initialization behavior
matches the normal path; update references inside preload! (the `@dois` check,
doi_identifiers variable and final `@doi_events` initialization) accordingly.
- Around line 29-35: The current loop over doi_identifiers slices (using
CHUNK_SIZE) concatenates each chunk's query results via
all_events.concat(events), which can produce duplicate Event objects when the
same Event matches DOIs across multiple chunks; change the logic in the loop
that builds all_events (the block using doi_identifiers.each_slice, Event.where,
and all_events.concat) to deduplicate by a stable identifier (e.g., Event#id)
before adding to all_events—either by tracking seen IDs in a Set and only
pushing new events, or by merging and uniquing by id after the loop—so each
Event appears only once.
---
Duplicate comments:
In `@spec/models/doi_related_spec.rb`:
- Around line 42-65: The test currently checks DB-backed associations on a fresh
instance (fresh_doi1) which doesn't prove EventsPreloader ran; change the
assertion to either spy on the preloader or check the original in-memory
instance after import. Specifically, in the test for DataciteDoi.import_in_bulk,
add an expectation that EventsPreloader.preload (or the actual preloader
class/method used inside DataciteDoi.import_in_bulk) receives the expected DOIs,
or after calling DataciteDoi.import_in_bulk(ids) assert that
doi1.association(:reference_events).loaded? and
doi1.association(:citation_events).loaded? are true (or use
expect(EventsPreloader).to
have_received(:preload).with(an_array_including(doi1.id))). This targets
DataciteDoi.import_in_bulk and the EventsPreloader invocation so the test
verifies preloading ran rather than just DB counts on a fresh record.
---
Nitpick comments:
In `@app/models/doi.rb`:
- Around line 108-170: The current event reader methods (view_events,
download_events, reference_events, citation_events, part_events, part_of_events,
version_events, version_of_events) fall back to association(...).scope when
preloaded_events is nil; change those fallback branches to call super instead to
preserve native association reader semantics—keep the preloaded path using
filtered_preloaded_events(:target_relation_type_id / :source_relation_type_id,
"<type>", :target_doi / :source_doi) and only replace the else branches with
super so the methods delegate to the original association reader when not using
preloaded_events.
In `@spec/models/preloaded_event_relation_spec.rb`:
- Around line 81-85: Add symbol-form test coverage for the `#inject` method: in
spec/models/preloaded_event_relation_spec.rb extend the describe "#inject"
examples to include cases calling relation.inject(:+) and relation.inject(10,
:+) (or another nonzero initial value) and assert their results match the block
form (e.g., sum of totals and total+initial). Locate the existing examples that
use relation.inject(0) { |sum, e| ... } and add two new it blocks (or combine
assertions) to verify symbol-arg behavior for inject and inject with an initial
value.
In `@spec/services/events_preloader_spec.rb`:
- Around line 71-77: The test's strict DB query assertion is brittle; update the
expectation in spec/services/events_preloader_spec.rb to allow up to two queries
by changing the matcher argument from exceed_query_limit(1) to
exceed_query_limit(2) (the case around the EventsPreloader.new(dois) and
preloader.preload! call).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2c11f7fd-2f48-491e-af01-c2f1f44b9473
📒 Files selected for processing (11)
app/controllers/datacite_dois_controller.rbapp/models/concerns/preloaded_event_relation.rbapp/models/datacite_doi.rbapp/models/doi.rbapp/models/other_doi.rbapp/models/reference_repository.rbapp/services/events_preloader.rbspec/models/doi_related_spec.rbspec/models/preloaded_event_relation_spec.rbspec/requests/repositories_spec.rbspec/services/events_preloader_spec.rb
| def first | ||
| @events.first | ||
| end | ||
|
|
||
| def last | ||
| @events.last | ||
| end | ||
|
|
||
| def count | ||
| @events.count | ||
| end |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find potential call sites that pass args/blocks to these methods.
# Expected: Any matches indicate current zero-arity methods are risky.
rg -nP '\b(view_events|download_events|reference_events|citation_events|part_events|part_of_events|version_events|version_of_events)\.(first|last)\s*\(\s*[^)]'
rg -nP '\b(view_events|download_events|reference_events|citation_events|part_events|part_of_events|version_events|version_of_events)\.count\s*\{'Repository: datacite/lupo
Length of output: 39
🏁 Script executed:
# Read the full implementation file
cat -n app/models/concerns/preloaded_event_relation.rbRepository: datacite/lupo
Length of output: 2619
🏁 Script executed:
# Check if `@events` is an Array or something else
rg -A 5 -B 5 '@events\s*=' app/models/concerns/preloaded_event_relation.rbRepository: datacite/lupo
Length of output: 313
🏁 Script executed:
# Look for all method definitions to understand the pattern
rg 'def ' app/models/concerns/preloaded_event_relation.rbRepository: datacite/lupo
Length of output: 377
🏁 Script executed:
# Check if this concern is included anywhere
rg 'PreloadedEventRelation|preloaded_event_relation' --type rubyRepository: datacite/lupo
Length of output: 1875
🏁 Script executed:
# Check how PreloadedEventRelation is used in doi.rb
rg -A 10 -B 10 'PreloadedEventRelation' app/models/doi.rbRepository: datacite/lupo
Length of output: 802
🏁 Script executed:
# Check the test file to see if args are tested
cat -n spec/models/preloaded_event_relation_spec.rbRepository: datacite/lupo
Length of output: 5202
🏁 Script executed:
# Check if Array#first, Array#last, Array#count can accept args in Ruby
# Verify their actual signatures in the standard library
python3 << 'EOF'
# Check Ruby's Array method signatures
import subprocess
result = subprocess.run(['ruby', '-e', '''
puts "Array#first signature:"
puts Array.method(:first).arity rescue puts "N/A"
arr = [1, 2, 3, 4, 5]
puts "arr.first(2): #{arr.first(2).inspect}"
puts "arr.last(2): #{arr.last(2).inspect}"
puts "arr.count: #{arr.count}"
puts "arr.count { |x| x > 2 }: #{arr.count { |x| x > 2 }}"
'''], capture_output=True, text=True)
print(result.stdout)
if result.stderr:
print("STDERR:", result.stderr)
EOFRepository: datacite/lupo
Length of output: 165
Match first/last/count arity with Enumerable and Array behavior.
The current zero-arity signatures reject valid arguments and blocks that Array and Enumerable support. Ruby's Array methods accept optional parameters: first(n), last(n), and count { |item| ... }. As a drop-in replacement for ActiveRecord::Relation, this wrapper must support the same signatures to maintain compatibility.
🐛 Fix
- def first
- `@events.first`
+ def first(*args)
+ `@events.first`(*args)
end
- def last
- `@events.last`
+ def last(*args)
+ `@events.last`(*args)
end
- def count
- `@events.count`
+ def count(*args, &block)
+ `@events.count`(*args, &block)
end📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def first | |
| @events.first | |
| end | |
| def last | |
| @events.last | |
| end | |
| def count | |
| @events.count | |
| end | |
| def first(*args) | |
| `@events.first`(*args) | |
| end | |
| def last(*args) | |
| `@events.last`(*args) | |
| end | |
| def count(*args, &block) | |
| `@events.count`(*args, &block) | |
| end |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@app/models/concerns/preloaded_event_relation.rb` around lines 19 - 29, The
wrapper methods first, last, and count currently accept no arguments or blocks;
update them to mirror Array/Enumerable signatures by forwarding any arguments
and blocks to `@events`. Replace def first; `@events.first`; end with def
first(*args, &blk); `@events.first`(*args, &blk); end, do the same for last (def
last(*args, &blk); `@events.last`(*args, &blk); end), and implement count to
accept an optional item and/or block (def count(item = nil, &blk);
`@events.count`(item, &blk); end) so behavior matches Array/Enumerable and
ActiveRecord::Relation.
| return if @dois.empty? | ||
|
|
||
| doi_identifiers = @dois.filter_map { |doi| doi.doi&.upcase }.uniq | ||
| return if doi_identifiers.empty? | ||
|
|
There was a problem hiding this comment.
Keep preload! return behavior consistent on early exits.
Line 22 and Line 25 return nil, but the normal path returns self. This also skips the final array initialization path for all-blank DOI inputs.
🛠️ Proposed fix
def preload!
- return if `@dois.empty`?
+ return self if `@dois.empty`?
doi_identifiers = `@dois.filter_map` { |doi| doi.doi&.upcase }.uniq
- return if doi_identifiers.empty?
+ if doi_identifiers.empty?
+ `@dois.each` { |doi| doi.preloaded_events ||= [] }
+ return self
+ end📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| return if @dois.empty? | |
| doi_identifiers = @dois.filter_map { |doi| doi.doi&.upcase }.uniq | |
| return if doi_identifiers.empty? | |
| return self if `@dois.empty`? | |
| doi_identifiers = `@dois.filter_map` { |doi| doi.doi&.upcase }.uniq | |
| if doi_identifiers.empty? | |
| `@dois.each` { |doi| doi.preloaded_events ||= [] } | |
| return self | |
| end |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@app/services/events_preloader.rb` around lines 22 - 26, In preload! change
the early exits so they return self instead of nil and preserve the final array
initialization: replace the two bare returns after checking `@dois.empty`? and
doi_identifiers.empty? with returns of self, and ensure the method still
initializes `@doi_events` (e.g. via `@doi_events` ||= []) on the early-exit path so
the final array initialization behavior matches the normal path; update
references inside preload! (the `@dois` check, doi_identifiers variable and final
`@doi_events` initialization) accordingly.
…ch in import_in_bulk spec
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
spec/models/doi_related_spec.rb (2)
32-40: Consider documenting the query budget breakdown.The query limit of 6 is reasonable for a ceiling test, but the comment only partially explains the expected queries. If associations or implementation change, this test may silently pass with different query patterns or unexpectedly fail.
Consider documenting the expected breakdown more explicitly (e.g., "1 DOI fetch + 1 EventsPreloader query + 4 association loads") or extracting the limit to a named constant if this threshold is referenced elsewhere.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@spec/models/doi_related_spec.rb` around lines 32 - 40, The test using DataciteDoi.import_in_bulk currently asserts a query cap of 6 but the comment is vague; update the example to document the expected query budget breakdown (e.g., "1 DOI fetch + 1 EventsPreloader query + 4 association loads") and extract the numeric limit into a clearly named local constant or let (e.g., EXPECTED_QUERY_LIMIT or expected_query_limit) used in the expect { ... }.not_to exceed_query_limit(...) call so the intent is explicit and reusable, and update the comment to reference the named constant and the specific components (DOIs, EventsPreloader, client, media, metadata, allocator) to make future changes clearer.
65-79: Test validates empty-events path but could be more comprehensive.This test verifies that
as_indexed_jsonmakes no additional queries when preloaded_events is set, but since no events are created, it only tests the empty-events scenario. The in-memory filtering path with actual event data isn't exercised here.Consider adding events to the DOI fixture to verify that event data is correctly retrieved from
preloaded_eventswithout hitting the database:🧪 Suggested enhancement
describe ".as_indexed_json" do let(:client) { create(:client) } let(:doi) { create(:doi, client: client, aasm_state: "findable") } + let!(:view_event) { create(:event_for_datacite_investigations, obj_id: "https://doi.org/#{doi.doi}", relation_type_id: "unique-dataset-investigations-regular", total: 25) } it "uses preloaded_events-backed relations for event metrics" do🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@spec/models/doi_related_spec.rb` around lines 65 - 79, The test only covers the empty-events case; update the spec to also create one or more Event records associated with the DOI fixture so the in-memory filtering path is exercised: build the relation with DataciteDoi.where(...).includes(...), preload events via EventsPreloader.new(dois.to_a).preload!, assert dois.first.preloaded_events is present, then call dois.first.as_indexed_json and assert it does not exceed_query_limit(0) while also asserting the indexed JSON contains the expected event data (derived from the created Event records) to ensure as_indexed_json reads from preloaded_events rather than hitting the DB.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@spec/models/doi_related_spec.rb`:
- Line 80: Rename the example description string in the spec "it" block that
currently reads "should make few db call" to the correct plural form "should
make few db calls" in spec/models/doi_related_spec.rb (the `it` example
containing that title) so the test name uses proper grammar.
---
Nitpick comments:
In `@spec/models/doi_related_spec.rb`:
- Around line 32-40: The test using DataciteDoi.import_in_bulk currently asserts
a query cap of 6 but the comment is vague; update the example to document the
expected query budget breakdown (e.g., "1 DOI fetch + 1 EventsPreloader query +
4 association loads") and extract the numeric limit into a clearly named local
constant or let (e.g., EXPECTED_QUERY_LIMIT or expected_query_limit) used in the
expect { ... }.not_to exceed_query_limit(...) call so the intent is explicit and
reusable, and update the comment to reference the named constant and the
specific components (DOIs, EventsPreloader, client, media, metadata, allocator)
to make future changes clearer.
- Around line 65-79: The test only covers the empty-events case; update the spec
to also create one or more Event records associated with the DOI fixture so the
in-memory filtering path is exercised: build the relation with
DataciteDoi.where(...).includes(...), preload events via
EventsPreloader.new(dois.to_a).preload!, assert dois.first.preloaded_events is
present, then call dois.first.as_indexed_json and assert it does not
exceed_query_limit(0) while also asserting the indexed JSON contains the
expected event data (derived from the created Event records) to ensure
as_indexed_json reads from preloaded_events rather than hitting the DB.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 4504c1d7-3df4-438d-bd1b-509456292890
📒 Files selected for processing (1)
spec/models/doi_related_spec.rb
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@app/models/other_doi.rb`:
- Around line 150-154: The includes hash in the OtherDoi query (the
selected_dois assignment using OtherDoi.where(...).includes) violates RuboCop's
Layout/SpaceInsideHashLiteralBraces; update the hash literal spacing to add
spaces inside the braces (e.g., change {client: :provider} to have a space after
{ and before }) so the includes call reads with properly spaced braces.
In `@spec/models/doi_related_spec.rb`:
- Around line 65-79: The test currently only exercises the empty-preload path;
modify the example to create one or more real event records associated with the
doi (using the existing event factory/model), then build the relation
(DataciteDoi.where(...).includes(...)), run
EventsPreloader.new(dois.to_a).preload!, and call dois.first.as_indexed_json to
assert that (1) it does not issue extra queries and (2) computed metrics
returned from as_indexed_json match expected values derived from the created
events (reference DataciteDoi, EventsPreloader, preloaded_events,
as_indexed_json and any event factory/model names to locate code).
- Line 68: The RuboCop brace-spacing offense is in the includes call on the
DataciteDoi query (variable dois); change the hash spacing inside the braces so
it reads includes({ client: :provider }, :media, :metadata) instead of
includes({client: :provider}, :media, :metadata) to satisfy the style rule.
- Around line 81-96: The spec is not using the production eager-load graph so it
misses the provider preload; update the query in
DataciteDoi.where(...).includes(...) to match the production preload used by
import_in_bulk (e.g. include client: :provider along with :media and :metadata),
so EventsPreloader.new(dois.to_a).preload! and the call to
dois.first.as_indexed_json exercise the same eager-loading path and the
expect(...).not_to exceed_query_limit(13) accurately catches regressions.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 03c1b369-232d-4c4d-94bd-57d56424c29e
📒 Files selected for processing (3)
app/models/datacite_doi.rbapp/models/other_doi.rbspec/models/doi_related_spec.rb
🚧 Files skipped from review as they are similar to previous changes (1)
- app/models/datacite_doi.rb
| it "uses preloaded_events-backed relations for event metrics" do | ||
| allow(DataciteDoi).to receive(:upload_to_elasticsearch) | ||
| # Build the relation with includes, as in production | ||
| dois = DataciteDoi.where(id: doi.id).includes({client: :provider}, :media, :metadata) | ||
| # Preload events explicitly | ||
| preloader = EventsPreloader.new(dois.to_a) | ||
| preloader.preload! | ||
| # Sanity: events are preloaded on the instance we’ll index | ||
| expect(dois.first.preloaded_events).not_to be_nil | ||
| # When we call as_indexed_json, the overridden *_events methods should | ||
| # hit the preloaded array, not issue extra queries for events. | ||
| expect { | ||
| dois.first.as_indexed_json | ||
| }.not_to exceed_query_limit(0) # if all associations were pre-included | ||
| end |
There was a problem hiding this comment.
This only covers the empty-preload path.
No events are created here, so the example proves zero extra queries for preloaded_events = [] but not that as_indexed_json computes real metrics correctly from populated PreloadedEventRelations.
🧰 Tools
🪛 GitHub Actions: Test Pull Request
[error] 68-68: RuboCop: Layout/SpaceInsideHashLiteralBraces: Space inside { missing.
[error] 68-68: RuboCop: Layout/SpaceInsideHashLiteralBraces: Space inside } missing.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@spec/models/doi_related_spec.rb` around lines 65 - 79, The test currently
only exercises the empty-preload path; modify the example to create one or more
real event records associated with the doi (using the existing event
factory/model), then build the relation (DataciteDoi.where(...).includes(...)),
run EventsPreloader.new(dois.to_a).preload!, and call dois.first.as_indexed_json
to assert that (1) it does not issue extra queries and (2) computed metrics
returned from as_indexed_json match expected values derived from the created
events (reference DataciteDoi, EventsPreloader, preloaded_events,
as_indexed_json and any event factory/model names to locate code).
| it "should make few db calls" do | ||
| allow(DataciteDoi).to receive(:upload_to_elasticsearch) | ||
| dois = DataciteDoi.where(id: doi.id).includes( | ||
| :client, | ||
| :media, | ||
| :view_events, | ||
| :download_events, | ||
| :citation_events, | ||
| :reference_events, | ||
| :part_events, | ||
| :part_of_events, | ||
| :version_events, | ||
| :version_of_events, | ||
| :metadata | ||
| ) | ||
|
|
||
| # Preload events to avoid N+1 queries | ||
| EventsPreloader.new(dois.to_a).preload! | ||
|
|
||
| # Test the maximum number of queries made by the method | ||
| # With EventsPreloader, we should have fewer queries | ||
| expect { | ||
| dois.first.as_indexed_json | ||
| }.not_to exceed_query_limit(13) |
There was a problem hiding this comment.
Use the production eager-load graph in this query-budget spec.
import_in_bulk now preloads { client: :provider }, but this example still only includes :client. That leaves the provider query unaccounted for and makes the 13-query ceiling too loose to catch regressions in the new path.
🧪 Suggested update
dois = DataciteDoi.where(id: doi.id).includes(
- :client,
+ { client: :provider },
:media,
:metadata
)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@spec/models/doi_related_spec.rb` around lines 81 - 96, The spec is not using
the production eager-load graph so it misses the provider preload; update the
query in DataciteDoi.where(...).includes(...) to match the production preload
used by import_in_bulk (e.g. include client: :provider along with :media and
:metadata), so EventsPreloader.new(dois.to_a).preload! and the call to
dois.first.as_indexed_json exercise the same eager-loading path and the
expect(...).not_to exceed_query_limit(13) accurately catches regressions.
There was a problem hiding this comment.
♻️ Duplicate comments (2)
spec/models/doi_related_spec.rb (2)
83-87:⚠️ Potential issue | 🟡 MinorUse the production include graph in this query-budget spec.
Line 84 still uses
:client, but the production path now includes{ client: :provider }. Keeping the spec aligned is important so provider-related regressions are caught.Suggested fix
dois = DataciteDoi.where(id: doi.id).includes( - :client, + { client: :provider }, :media, :metadata )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@spec/models/doi_related_spec.rb` around lines 83 - 87, Update the ActiveRecord include graph used when loading DataciteDoi records in the test: change the includes call on the `dois = DataciteDoi.where(id: doi.id).includes(:client, :media, :metadata)` expression to use the production shape by replacing `:client` with a nested association `client: :provider` (i.e., `.includes(client: :provider, :media, :metadata)`) so provider associations are eagerly loaded in the spec.
65-79:⚠️ Potential issue | 🟡 MinorPopulate real events before asserting preloaded metric behavior.
At Line 70 preloading runs, but this example doesn't create events, so it only validates the empty-preload path. Add event fixtures and assert metric outputs from preloaded data to protect the non-empty path.
Suggested test adjustment
it "uses preloaded_events-backed relations for event metrics" do allow(DataciteDoi).to receive(:upload_to_elasticsearch) + target = create(:doi, client: client, aasm_state: "findable") + source = create(:doi, client: client, aasm_state: "findable") + create(:event_for_crossref, + subj_id: "https://doi.org/#{doi.doi}", + obj_id: "https://doi.org/#{target.doi}", + relation_type_id: "references") + create(:event_for_datacite_crossref, + subj_id: "https://doi.org/#{doi.doi}", + obj_id: "https://doi.org/#{source.doi}", + relation_type_id: "is-referenced-by") + dois = DataciteDoi.where(id: doi.id).includes({ client: :provider }, :media, :metadata) preloader = EventsPreloader.new(dois.to_a) preloader.preload! expect(dois.first.preloaded_events).not_to be_nil expect { - dois.first.as_indexed_json + dois.first.as_indexed_json + expect(dois.first.reference_count).to eq(1) + expect(dois.first.citation_count).to eq(1) }.not_to exceed_query_limit(0) end🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@spec/models/doi_related_spec.rb` around lines 65 - 79, The test for preloaded event metrics (using DataciteDoi, EventsPreloader, preloaded_events and as_indexed_json) only preloads an empty events set; create and attach real event fixtures to the DOI before calling EventsPreloader.preload! (e.g., create one or more Event/Metric records associated with doi), re-run preloader, then assert that the metric fields returned by dois.first.as_indexed_json reflect those populated events and still do not perform extra queries (retain the expect { ... }.not_to exceed_query_limit(0)); this ensures the non-empty path is validated and metrics are asserted against preloaded data.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@spec/models/doi_related_spec.rb`:
- Around line 83-87: Update the ActiveRecord include graph used when loading
DataciteDoi records in the test: change the includes call on the `dois =
DataciteDoi.where(id: doi.id).includes(:client, :media, :metadata)` expression
to use the production shape by replacing `:client` with a nested association
`client: :provider` (i.e., `.includes(client: :provider, :media, :metadata)`) so
provider associations are eagerly loaded in the spec.
- Around line 65-79: The test for preloaded event metrics (using DataciteDoi,
EventsPreloader, preloaded_events and as_indexed_json) only preloads an empty
events set; create and attach real event fixtures to the DOI before calling
EventsPreloader.preload! (e.g., create one or more Event/Metric records
associated with doi), re-run preloader, then assert that the metric fields
returned by dois.first.as_indexed_json reflect those populated events and still
do not perform extra queries (retain the expect { ... }.not_to
exceed_query_limit(0)); this ensures the non-empty path is validated and metrics
are asserted against preloaded data.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 54b253de-906d-4adf-b596-0676a1beba23
📒 Files selected for processing (3)
app/models/datacite_doi.rbapp/models/other_doi.rbspec/models/doi_related_spec.rb
🚧 Files skipped from review as they are similar to previous changes (1)
- app/models/other_doi.rb
There was a problem hiding this comment.
♻️ Duplicate comments (2)
spec/models/doi_related_spec.rb (2)
81-87:⚠️ Potential issue | 🟡 MinorUse the production eager-load graph in this query-budget spec.
Line 84 still uses
:client, whileimport_in_bulknow preloads{ client: :provider }. This makes the query budget less representative of production behavior.Suggested fix
- dois = DataciteDoi.where(id: doi.id).includes( - :client, + dois = DataciteDoi.where(id: doi.id).includes( + { client: :provider }, :media, :metadata )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@spec/models/doi_related_spec.rb` around lines 81 - 87, The spec's eager-loading doesn't match production: replace the includes(...) call in the "should make few db calls" example (the DataciteDoi.where(...).includes call) to use the same production preload graph used by import_in_bulk by swapping :client for a nested preload { client: :provider } (keep :media and :metadata as-is) so the query-budget spec reflects real-world eager loading.
65-79:⚠️ Potential issue | 🟡 MinorThis example still validates the empty-preload path only.
No events are created in this spec, so it proves query behavior for
preloaded_events = [], not correctness of populated event metrics duringas_indexed_json. Add at least one reference/citation fixture and assert metric values from the indexed payload.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@spec/models/doi_related_spec.rb` around lines 65 - 79, The spec currently only verifies the empty-preload path; update the test for DataciteDoi#as_indexed_json and EventsPreloader to create at least one event-related fixture (e.g., a Reference/Citation and associated Event model) linked to the doi, run preloader (EventsPreloader.new(dois.to_a).preload!), call dois.first.as_indexed_json and add assertions that the returned payload contains the expected event metrics (counts/timestamps/fields) for that created event, while keeping the existing expect { dois.first.as_indexed_json }.not_to exceed_query_limit(0) to ensure no extra queries.
🧹 Nitpick comments (1)
spec/models/doi_related_spec.rb (1)
52-55: Assertion is stricter than the comment implies.Line 54 says IDs may be a subset for multiple batches, but
match_array(ids)requires the full set. Consider either passing an explicit single-batchbatch_sizein this spec or updating the matcher to accept per-batch subsets.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@spec/models/doi_related_spec.rb` around lines 52 - 55, The test currently asserts EventsPreloader.new receives a full set of IDs (expect(dois_arg.map(&:id)).to match_array(ids)) but the comment warns that multiple batches may produce only a subset; fix by either forcing a single-batch scenario (invoke the code under test or stub/configure the batch size so batch_size >= ids.length) or relax the assertion to accept per-batch subsets (e.g., change the matcher to assert that dois_arg.map(&:id) is a subset of ids or uses include matcher). Update the spec around EventsPreloader and DataciteDoi accordingly and reference the ids and batch_size configuration used by the code under test.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@spec/models/doi_related_spec.rb`:
- Around line 81-87: The spec's eager-loading doesn't match production: replace
the includes(...) call in the "should make few db calls" example (the
DataciteDoi.where(...).includes call) to use the same production preload graph
used by import_in_bulk by swapping :client for a nested preload { client:
:provider } (keep :media and :metadata as-is) so the query-budget spec reflects
real-world eager loading.
- Around line 65-79: The spec currently only verifies the empty-preload path;
update the test for DataciteDoi#as_indexed_json and EventsPreloader to create at
least one event-related fixture (e.g., a Reference/Citation and associated Event
model) linked to the doi, run preloader
(EventsPreloader.new(dois.to_a).preload!), call dois.first.as_indexed_json and
add assertions that the returned payload contains the expected event metrics
(counts/timestamps/fields) for that created event, while keeping the existing
expect { dois.first.as_indexed_json }.not_to exceed_query_limit(0) to ensure no
extra queries.
---
Nitpick comments:
In `@spec/models/doi_related_spec.rb`:
- Around line 52-55: The test currently asserts EventsPreloader.new receives a
full set of IDs (expect(dois_arg.map(&:id)).to match_array(ids)) but the comment
warns that multiple batches may produce only a subset; fix by either forcing a
single-batch scenario (invoke the code under test or stub/configure the batch
size so batch_size >= ids.length) or relax the assertion to accept per-batch
subsets (e.g., change the matcher to assert that dois_arg.map(&:id) is a subset
of ids or uses include matcher). Update the spec around EventsPreloader and
DataciteDoi accordingly and reference the ids and batch_size configuration used
by the code under test.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2da26e76-40ec-4430-8bfc-62520fa068aa
📒 Files selected for processing (3)
app/models/datacite_doi.rbapp/models/other_doi.rbspec/models/doi_related_spec.rb
Purpose
This PR addresses N+1 query performance issues when fetching or importing DOIs that have associated event data (citations, views, downloads, references, etc.). Previously, each relationship type for every DOI in a collection triggered a separate database query.
Approach
The optimization introduces a "bulk-preload" strategy for events:
EventsPreloaderService: A new service that fetches all relevant events for a batch of DOIs in just 1 or 2 queries.PreloadedEventRelationConcern: A wrapper class that mimics theActiveRecord::RelationAPI (supportingpluck,map,select, etc.), allowing existing code to operate on in-memory arrays as if they were database relations.Doimodel’s association methods (e.g.,view_events,citation_events) are overridden to check ifpreloaded_eventsis present. If so, they filter the in-memory array; otherwise, they fall back to the standard database scope.Key Modifications
app/services/events_preloader.rb: New service to batch-fetch events based onsource_doiortarget_doi.app/models/concerns/preloaded_event_relation.rb: New wrapper for in-memory event collections to ensure backward compatibility with relation-based methods.app/models/doi.rb: Addedpreloaded_eventsaccessor and logic to switch between preloaded data and real-time database queries.app/controllers/datacite_dois_controller.rb: Integrated preloading for the:detailview to speed up serialization.app/models/datacite_doi.rb&app/models/other_doi.rb: Integrated preloading intoimport_in_bulkto optimize Elasticsearch indexing processes.Important Technical Details
CHUNK_SIZE = 1000)..upcaseto ensure consistency between database records and memory mapping.preloaded_eventsis never initialized, the application behaves exactly as before, ensuring safety for code paths not yet optimized.Types of changes
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Reviewer, please remember our guidelines:
Summary by CodeRabbit
New Features
Refactor
Bug Fixes
Tests