About
Because CrateDB does not make written data visible to subsequent reads immediately (a table refresh is needed first), applications relying on read-after-write behavior will fail. This becomes immediately apparent when running the test suites of typical SQLAlchemy applications.
Recently, we started working on making MLflow and LangChain work with CrateDB, and needed to patch SQLAlchemy with a bit of compensation code to satisfy their test suites.
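To illustrate the visibility problem, here is a toy in-memory model. It is purely hypothetical and not how CrateDB is implemented: writes land in a pending buffer and become visible to reads only after an explicit refresh, analogous to `REFRESH TABLE`.

```python
class EventuallyConsistentTable:
    """Toy model (not CrateDB's implementation): writes become visible
    to reads only after an explicit refresh."""

    def __init__(self):
        self._visible = []  # rows that readers can see
        self._pending = []  # rows written but not yet refreshed

    def insert(self, row):
        # The write succeeds, but readers cannot see it yet.
        self._pending.append(row)

    def refresh(self):
        # Analogous to `REFRESH TABLE`: publish pending writes to readers.
        self._visible.extend(self._pending)
        self._pending.clear()

    def select_all(self):
        return list(self._visible)


table = EventuallyConsistentTable()
table.insert({"id": 1})
print(table.select_all())  # [] : a read-after-write assertion fails here
table.refresh()
print(table.select_all())  # [{'id': 1}]
```

CrateDB additionally refreshes tables periodically, which this model leaves out; a test that reads right after writing can still hit the stale state in between.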
Proposal
Provide corresponding functionality through a dialect parameter like `crate_refresh_after_dml` or `crate_synchronize_all`, or find a different solution to the same problem.
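```python
from sqlalchemy import text
from sqlalchemy.event import listen


def do_refresh(mapper, connection, target):
    # Sketch of the `do_refresh` helper, which is not shown in this issue (assumed).
    # Mapper events pass (mapper, connection, target); refresh the mapped table.
    table = connection.dialect.identifier_preparer.format_table(mapper.local_table)
    connection.execute(text(f"REFRESH TABLE {table}"))


def polyfill_refresh_after_dml(base_model):
    """
    Run `REFRESH TABLE <tablename>` after each INSERT, UPDATE, and DELETE operation.

    CrateDB is eventually consistent, i.e. written data is not visible to readers
    until the table is refreshed, so readers may see stale data. Traditional
    OLTP-like applications, however, expect to read their own writes immediately.

    This SQLAlchemy extension makes sure that data is synchronized after each
    operation manipulating data.

    TODO: Submit patch to `crate-python`, to be enabled by a
    dialect parameter `crate_dml_refresh` or such.
    """
    for mapper in base_model.registry.mappers:
        listen(mapper.class_, "after_insert", do_refresh)
        listen(mapper.class_, "after_update", do_refresh)
        listen(mapper.class_, "after_delete", do_refresh)
```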
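One possible "different solution", sketched under assumptions and not verified against CrateDB: instead of refreshing once per row through mapper events, as `polyfill_refresh_after_dml` does, a SQLAlchemy `after_flush` session event could collect the tables touched by a flush and refresh each of them once. The names `install_refresh_after_flush` and `refresh_tables` are hypothetical.

```python
from sqlalchemy import event, inspect, text
from sqlalchemy.orm import Session


def refresh_tables(connection, tables):
    # Hypothetical default: one CrateDB `REFRESH TABLE` per affected table.
    preparer = connection.dialect.identifier_preparer
    for table in tables:
        connection.execute(text(f"REFRESH TABLE {preparer.format_table(table)}"))


def install_refresh_after_flush(target=Session, refresh=refresh_tables):
    """Hypothetical alternative: refresh each affected table once per flush.

    `target` may be the Session class (global), a sessionmaker, or a session.
    """

    @event.listens_for(target, "after_flush")
    def _after_flush(session, flush_context):
        # In `after_flush`, `new`, `dirty`, and `deleted` still describe
        # the objects that were just flushed.
        tables = {
            inspect(obj).mapper.local_table
            for obj in (*session.new, *session.dirty, *session.deleted)
        }
        if tables:
            # Sort for a deterministic statement order.
            refresh(session.connection(), sorted(tables, key=lambda t: t.fullname))

    return _after_flush
```

Usage, assuming a `sessionmaker` named `SessionLocal` bound to a CrateDB engine: `install_refresh_after_flush(SessionLocal)`. The trade-off is granularity: mapper events such as `after_insert` fire once per flushed object, so adding many rows in one flush emits one `REFRESH TABLE` per row, whereas this variant emits one per affected table.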