SQLAlchemy: Polyfill for transparently synchronizing data with REFRESH TABLE #83

Open
@amotl

Description

About

Because CrateDB does not immediately flush data to disk, applications that expect to read their own writes right away will fail. This becomes immediately apparent when running the test suites of typical SQLAlchemy applications.

Recently, we started working on unlocking MLflow and LangChain, and needed to patch SQLAlchemy, adding a bit of compensation to satisfy their test cases.

Proposal

Provide corresponding functionality through a dialect parameter like crate_refresh_after_dml or crate_synchronize_all, or find a different solution to the same problem.

from sqlalchemy.event import listen


def polyfill_refresh_after_dml(base_model):
    """
    Run `REFRESH TABLE <tablename>` after each INSERT, UPDATE, and DELETE operation.

    CrateDB is eventually consistent, i.e. write operations are not flushed to
    disk immediately, so readers may see stale data. A traditional OLTP-like
    application usually cannot tolerate that.

    This SQLAlchemy extension makes sure that data is synchronized after each
    operation manipulating data.

    TODO: Submit patch to `crate-python`, to be enabled by a
          dialect parameter `crate_dml_refresh` or such.
    """
    # `do_refresh` is the mapper event handler that issues the REFRESH TABLE
    # statement; it is not shown in this excerpt.
    for mapper in base_model.registry.mappers:
        # Register the handler for every class mapped on the declarative base.
        listen(mapper.class_, "after_insert", do_refresh)
        listen(mapper.class_, "after_update", do_refresh)
        listen(mapper.class_, "after_delete", do_refresh)
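
The `do_refresh` handler is referenced above but not included in the issue. A minimal sketch of what it could look like follows, using the `(mapper, connection, target)` signature that SQLAlchemy passes to `after_insert`, `after_update`, and `after_delete` mapper events. The function body and the `quote_ident` helper are assumptions made for illustration, not the code actually used in the MLflow or LangChain patches.

```python
def quote_ident(name: str) -> str:
    """Hypothetical helper: double-quote an SQL identifier, escaping embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def do_refresh(mapper, connection, target):
    """
    Assumed handler body: issue `REFRESH TABLE` for the table behind the
    mapped class that was just written to.

    `mapper.local_table` is the table mapped by the class, and
    `connection.exec_driver_sql()` sends a raw SQL string to the driver.
    """
    table = mapper.local_table
    full_name = quote_ident(table.name)
    # Prefix the schema when the table defines one.
    if getattr(table, "schema", None):
        full_name = quote_ident(table.schema) + "." + full_name
    connection.exec_driver_sql(f"REFRESH TABLE {full_name}")
```

Two things to keep in mind with this approach: mapper events fire once per flushed object, so a bulk insert through the ORM would trigger one `REFRESH TABLE` per row, and they only cover ORM unit-of-work flushes, not statements executed through SQLAlchemy Core. A dialect-level option would probably need to hook in at the engine or connection level instead.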
