Support higher-volume cases (and potentially an ordering guarantee) by using pgq #139
Description
Background
Within Zalando we are currently discussing how to implement reliable (transactional) event sending, which is essentially what this library is trying to do.
When I mentioned this library (and that we use a similar approach in another team, where we run a nightly VACUUM FULL), @CyberDem0n pointed out:
That's actually the major problem of such homegrown solutions.
- Write amplification (you are not only inserting into the queue table, but also updating/deleting).
- Permanent table and index bloat due to the updates/deletes:
  1. Regular heavy maintenance is required to contain the bloat.
  2. Maintenance always affects the normal processes interacting with the events table.
- If the event flow is relatively high, it quickly becomes insufficient to do VACUUM FULL/REINDEX only once a night.
In this regard pgq is maintenance-free. For every queue you create, it creates a few tables under the hood.
These tables are INSERT-only, and therefore explicitly excluded from autovacuum.
The tables are used in a round-robin manner. Since events are always processed in strict order, it is enough to keep a pointer to the latest row (event) that was processed, so no UPDATEs/DELETEs are required on the event table. Once all events from a specific table have been processed, PgQ simply TRUNCATEs that table.
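To make the rotation scheme concrete, here is a minimal in-memory simulation of the idea (plain Python, no real pgq involved; class and method names are my own): events go into insert-only tables that are rotated round-robin, the consumer only advances a pointer, and a fully drained table is truncated wholesale instead of having rows deleted one by one.

```python
class RotatingQueue:
    """Toy model of PgQ's trick: insert-only tables used round-robin,
    a consumer pointer instead of DELETEs, and TRUNCATE on rotation."""

    def __init__(self, n_tables=3):
        self.tables = [[] for _ in range(n_tables)]
        self.write_idx = 0  # table currently receiving INSERTs
        self.read_idx = 0   # oldest table with unconsumed events
        self.read_pos = 0   # pointer to the next unprocessed row; no DELETEs

    def insert(self, event):
        # Producers only ever append; rows are never updated or deleted.
        self.tables[self.write_idx].append(event)

    def rotate(self):
        # Periodic tick: new inserts go to the next table in the ring.
        self.write_idx = (self.write_idx + 1) % len(self.tables)

    def consume(self, n):
        """Process up to n events; 'truncate' tables once fully drained."""
        out = []
        while n > 0:
            table = self.tables[self.read_idx]
            if self.read_pos < len(table):
                out.append(table[self.read_pos])
                self.read_pos += 1
                n -= 1
            elif self.read_idx != self.write_idx:
                table.clear()  # the cheap TRUNCATE, instead of row DELETEs
                self.read_idx = (self.read_idx + 1) % len(self.tables)
                self.read_pos = 0
            else:
                break          # caught up with the producer
        return out
```

Because consumption only moves `read_pos` forward and reclamation is a single `clear()`, the event tables never accumulate dead tuples, which is exactly why the real PgQ needs neither autovacuum nor VACUUM FULL on them.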
These tricks make PgQ very scalable. Ten years ago, before PostgreSQL had built-in streaming replication, PgQ was used as the base for a logical replication solution, Londiste. Both were developed by Marko Kreen while he was working for Skype. IIRC, 3 or 4 years ago Skype was still relying on PgQ and Londiste, because they just work.
@a1exsh pointed me to the pgq SQL API and offered to help with code review if we decide to integrate it into this library.
Goal
Find a way to use a pgq queue instead of the current event_log table for storing events for later submission to Nakadi.
This should be optional, since not every user of this library has pgq available, or the ability to install PostgreSQL extensions.
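For orientation, the producer/consumer lifecycle against the pgq SQL API would roughly look like the sketch below. The queue and consumer names (`event_log`, `nakadi_sender`) are placeholders I made up; the snippet only assembles the SQL statements a caller would execute (so it runs without a database), and a real integration would of course use parameterized queries rather than string formatting.

```python
# Sketch of the pgq SQL API calls such an integration would issue.
# Queue/consumer names are illustrative, not part of this library.

def producer_sql(queue, ev_type, ev_data):
    # Runs inside the same transaction as the business change,
    # replacing the current INSERT INTO event_log:
    return f"SELECT pgq.insert_event('{queue}', '{ev_type}', '{ev_data}')"

def consumer_sql(queue, consumer):
    # A separate submission process fetches and acknowledges batches:
    return [
        f"SELECT pgq.next_batch('{queue}', '{consumer}')",   # batch_id or NULL
        "SELECT * FROM pgq.get_batch_events(%(batch_id)s)",  # events of the batch
        "SELECT pgq.finish_batch(%(batch_id)s)",             # ack after Nakadi accepted them
    ]

# One-time setup per queue and consumer:
setup = [
    "SELECT pgq.create_queue('event_log')",
    "SELECT pgq.register_consumer('event_log', 'nakadi_sender')",
]
```

Since `pgq.finish_batch` is only called after Nakadi has accepted the events, an unacknowledged batch is redelivered, which gives the same at-least-once guarantee the event_log polling currently provides.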