Skip to content

Feature/lxl 4134 #1583

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 18 commits into
base: develop
Choose a base branch
from
Draft

Feature/lxl 4134 #1583

wants to merge 18 commits into from

Conversation

jannistsiroyannis
Copy link
Contributor

This changes Elastic indexing to occur strictly in order with postgres writes.

Every change is logged (in a table) with a sequence number (assigned within the postgres transaction), and indexing occurs by reading that log in the same order. There is still a tolerance for bad data, in the sense that any 400-response from Elastic will be error-logged but ignored. Other types of errors however (be they timeouts, broken connections, or whatever else) are NOT TOLERATED, and will be retried, in order (forever).

A big change for developers in this, is that the housekeeping module must now be running for any indexing to occur at all. It is however perfectly fine to start it up late (it will catch up on its own).

@@ -1624,24 +1658,21 @@ class PostgreSQLComponent {
boolean saveVersion(Document doc, Connection connection, Date createdTime,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be changed to void ?

@@ -505,7 +385,7 @@ class Whelk {
*/
boolean storeAtomicUpdate(String id, boolean minorUpdate, boolean writeIdenticalVersions, String changedIn, String changedBy, UpdateAgent updateAgent) {
Document preUpdateDoc = null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this variable can be removed?

@@ -524,13 +403,12 @@ class Whelk {
void storeAtomicUpdate(Document doc, boolean minorUpdate, boolean writeIdenticalVersions, String changedIn, String changedBy, String oldChecksum) {
normalize(doc)
Document preUpdateDoc = storage.load(doc.shortId)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This variable is unused now and can be removed

Copy link
Contributor

@olovy olovy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! LGTM!

Remaining possibility for small inconsistencies:
If an elastic request fails after increment/decerement reverse links they will be bumbed again when the whole document is retried. I think this is fine for now. Lots of added complexity to fix this?
The fix would be to reindex them instead of just updating the counters the second time around?

I would like to have a gauge for "seconds behind" in /metrics so that we can track it.
I can add that.

Interesting to see the transcribed (groovy -> java) indexing code. A bit more clunky but not that bad.

@olovy olovy marked this pull request as draft May 14, 2025 13:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants