SQLAlchemy: Polyfill for `AUTOINCREMENT` columns

## About

CrateDB does not support the notion of `autoincrement` columns, because it is a distributed database. For supporting certain applications, specific workarounds have been showing a successful outcome. Hereby, we would like to evaluate how a corresponding patch could be generalized, to be optionally enabled on the CrateDB SQLAlchemy dialect.

## Reference

- https://github.com/crate/crate/issues/11020

## Details

CrateDB can generate unique values server-side by using the [GEN_RANDOM_TEXT_UUID()](https://crate.io/docs/crate/reference/en/5.4/general/builtins/scalar-functions.html#gen-random-text-uuid) function, which is applicable for `TEXT` types.

```sql
"columnname" TEXT DEFAULT GEN_RANDOM_TEXT_UUID() NOT NULL
```

While working on [mlflow-cratedb](https://github.com/crate-workbench/mlflow-cratedb), we had the need to create unique Integer-based values, so we added a [corresponding monkeypatch](https://github.com/crate-workbench/mlflow-cratedb/blob/v0.1.1/mlflow_cratedb/patch/crate_python.py#L7-L28), which also has a remark:

> TODO: Submit patch to `crate-python`, to be enabled by a dialect parameter `crate_polyfill_autoincrement` or such.

## Proposal

So, the idea here is to add a dialect parameter `crate_polyfill_autoincrement`, which will, under the hood, transparently augment SQLAlchemy models to swap in a different procedure, depending on its column type. For text-based columns, the server-side `DEFAULT GEN_RANDOM_TEXT_UUID()` may be applicable, while for integer-based columns, the patching would need to resort to a timestamp [^1][^2].

## Followup

If that turns out well, make sure to report the outcome back to the original ticket https://github.com/crate/crate/issues/11020, and also write a "best practice" community post about it.


[^1]: I don't know yet how or whether the timestamp resolution should be configurable at all. It would probably be best to always use nanosecond resolution, in order to reduce the chance of collisions in high-traffic/-volume/-performance environments. If that is not an issue, millisecond resolution might be enough, but making it configurable would probably be a mess.
[^2]: For the column to be able to store large integer numbers like Unix time since Epoch in nanoseconds, it would need to be converged into a BIGINT, if it was defined as an INT only.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

SQLAlchemy: Polyfill for `AUTOINCREMENT` columns #77

About

Reference

Details

Proposal

Followup

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

SQLAlchemy: Polyfill for AUTOINCREMENT columns #77

Description

About

Reference

Details

Proposal

Followup

Footnotes

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

SQLAlchemy: Polyfill for `AUTOINCREMENT` columns #77