Description
About
CrateDB does not support the notion of autoincrement
columns, because it is a distributed database. For supporting certain applications, specific workarounds have been showing a successful outcome. Hereby, we would like to evaluate how a corresponding patch could be generalized, to be optionally enabled on the CrateDB SQLAlchemy dialect.
Reference
Details
CrateDB can generate unique values server-side by using the GEN_RANDOM_TEXT_UUID() function, which is applicable for TEXT
types.
"columnname" TEXT DEFAULT GEN_RANDOM_TEXT_UUID() NOT NULL
While working on mlflow-cratedb, we had the need to create unique Integer-based values, so we added a corresponding monkeypatch, which also has a remark:
TODO: Submit patch to
crate-python
, to be enabled by a dialect parametercrate_polyfill_autoincrement
or such.
Proposal
So, the idea here is to add a dialect parameter crate_polyfill_autoincrement
, which will, under the hood, transparently augment SQLAlchemy models to swap in a different procedure, depending on its column type. For text-based columns, the server-side DEFAULT GEN_RANDOM_TEXT_UUID()
may be applicable, while for integer-based columns, the patching would need to resort to a timestamp 12.
Followup
If that turns out well, make sure to report the outcome back to the original ticket crate/crate#11020, and also write a "best practice" community post about it.
Footnotes
-
I don't know yet how or whether the timestamp resolution should be configurable at all. It would probably be best to always use nanosecond resolution, in order to reduce the chance of collisions in high-traffic/-volume/-performance environments. If that is not an issue, millisecond resolution might be enough, but making it configurable would probably be a mess. ↩
-
For the column to be able to store large integer numbers like Unix time since Epoch in nanoseconds, it would need to be converged into a BIGINT, if it was defined as an INT only. ↩