Feature/lxl 4134 #1583
base: develop
Conversation
…fortunately) be tolerated. Other errors will still be retried in order.
@@ -1624,24 +1658,21 @@ class PostgreSQLComponent {
boolean saveVersion(Document doc, Connection connection, Date createdTime,
Could be changed to void?
@@ -505,7 +385,7 @@ class Whelk {
 */
boolean storeAtomicUpdate(String id, boolean minorUpdate, boolean writeIdenticalVersions, String changedIn, String changedBy, UpdateAgent updateAgent) {
Document preUpdateDoc = null
I believe this variable can be removed?
@@ -524,13 +403,12 @@ class Whelk {
void storeAtomicUpdate(Document doc, boolean minorUpdate, boolean writeIdenticalVersions, String changedIn, String changedBy, String oldChecksum) {
normalize(doc)
Document preUpdateDoc = storage.load(doc.shortId)
This variable is unused now and can be removed
Nice! LGTM!
Remaining possibility for small inconsistencies:
If an Elastic request fails after incrementing/decrementing reverse links, they will be bumped again when the whole document is retried. I think this is fine for now. Would fixing it add a lot of complexity?
The fix would be to reindex them instead of just updating the counters the second time around?
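(A rough, hypothetical sketch of the difference, with made-up names rather than the actual code: bumping the counter is not idempotent under retry, while recomputing it from what is stored is.)

```java
// Hypothetical illustration only; class and method names are invented.
class ReverseLinkCounting {
    interface Storage {
        long countReverseLinks(String docId);   // assumed query against postgres
    }

    interface Index {
        long getCounter(String docId);
        void setCounter(String docId, long value);
    }

    // Not idempotent: if the document is retried, the same delta is applied twice.
    static void applyDelta(Index index, String docId, long delta) {
        index.setCounter(docId, index.getCounter(docId) + delta);
    }

    // Idempotent: recomputing from storage converges to the same value no matter
    // how many times the retry runs ("reindex instead of updating the counters").
    static void reindexCounter(Storage storage, Index index, String docId) {
        index.setCounter(docId, storage.countReverseLinks(docId));
    }
}
```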
I would like to have a gauge for "seconds behind" in /metrics so that we can track it.
I can add that.
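A minimal sketch of such a gauge, assuming the Prometheus Java simpleclient and a hypothetical metric name; how the age of the oldest unindexed change is obtained is left out:

```java
import io.prometheus.client.Gauge;
import java.time.Instant;

class IndexLagMetric {
    // Exposed on /metrics; the metric name is an assumption for illustration.
    private static final Gauge SECONDS_BEHIND = Gauge.build()
            .name("index_seconds_behind")
            .help("Age in seconds of the oldest change not yet indexed in Elastic.")
            .register();

    // Called periodically; oldestUnindexedChange would come from the change log table.
    static void update(Instant oldestUnindexedChange) {
        if (oldestUnindexedChange == null) {
            SECONDS_BEHIND.set(0);  // fully caught up
        } else {
            SECONDS_BEHIND.set(Instant.now().getEpochSecond()
                    - oldestUnindexedChange.getEpochSecond());
        }
    }
}
```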
Interesting to see the transcribed (Groovy -> Java) indexing code. A bit more clunky, but not that bad.
It is no longer relevant as indexing is no longer a direct effect of these processes, and so there is no speedup to be had.
This changes Elastic indexing so that it happens strictly in the same order as the postgres writes.
Every change is logged in a table with a sequence number (assigned within the postgres transaction), and indexing is done by reading that log in the same order. There is still a tolerance for bad data, in the sense that any 400 response from Elastic will be error-logged but ignored. Other types of errors, however (timeouts, broken connections, or whatever else), are NOT tolerated and will be retried, in order, forever.
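As a rough sketch of the idea (table, column and class names here are assumptions for illustration, not the actual implementation), the consumer reads the log in sequence order and only advances once an entry has been indexed or deliberately skipped:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical sketch of log-based, strictly ordered indexing.
class ChangeLogIndexer {
    interface ElasticClient { void index(String docId); }          // assumed client
    static class BadRequestException extends RuntimeException {}   // stands in for a 400 response

    private final Connection connection;   // postgres connection
    private final ElasticClient elastic;
    private long lastIndexedSeq;           // persisted consumer position

    ChangeLogIndexer(Connection connection, ElasticClient elastic, long startSeq) {
        this.connection = connection;
        this.elastic = elastic;
        this.lastIndexedSeq = startSeq;
    }

    // Read new log entries in the order their sequence numbers were assigned.
    void pollOnce() throws SQLException, InterruptedException {
        String sql = "SELECT seq, doc_id FROM change_log WHERE seq > ? ORDER BY seq";
        try (PreparedStatement stmt = connection.prepareStatement(sql)) {
            stmt.setLong(1, lastIndexedSeq);
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    indexWithRetry(rs.getString("doc_id"));
                    lastIndexedSeq = rs.getLong("seq");  // advance only after success or a tolerated 400
                }
            }
        }
    }

    // A 400 (bad data) is error-logged and skipped; any other failure is retried
    // forever, so nothing later in the log is indexed out of order.
    private void indexWithRetry(String docId) throws InterruptedException {
        while (true) {
            try {
                elastic.index(docId);
                return;
            } catch (BadRequestException e) {
                System.err.println("Ignoring bad document " + docId);
                return;
            } catch (RuntimeException e) {   // timeouts, broken connections, ...
                Thread.sleep(1000);          // back off, then retry the same entry
            }
        }
    }
}
```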
A big change for developers is that the housekeeping module must now be running for any indexing to occur at all. It is, however, perfectly fine to start it up late; it will catch up on its own.