Skip to content

Modernize CrateDB data type implementations #65

Open
@amotl

Description

@amotl

About

On behalf of GH-9, we added an SQLAlchemy data type implementation for CrateDB's FLOAT_VECTOR data type. On this task, we discovered a few details that made us think we should also modernize the other data type implementations, eventually.

The reason is because they have been conceived a while ago already, around SQLAlchemy 1.x, and additional mechanisms have been introduced to SQLAlchemy since then.

Implementation Notes

An improved implementation of the FloatVector data type for CrateDB,
compared to the previous implementation on behalf of the LangChain adapter.

The previous implementation, based on SQLAlchemy's UserDefinedType, didn't
respect the python_type property on backward/reverse resolution of types.
This was observed on Meltano's database connector machinery doing a
type cast, which led to a NotImplementedError.

typing.cast(type, sql_type.python_type) => NotImplementedError

The UserDefinedType approach is easier to implement, because it doesn't
need compiler support.

To get full SQLAlchemy type support, including support for forward- and
backward resolution / type casting, the custom data type should derive
from SQLAlchemy's TypeEngine base class instead.

When deriving from TypeEngine, you will need to set the __visit_name__
attribute, and add a corresponding visitor method to the CrateTypeCompiler,
in this case, visit_FLOAT_VECTOR.

Now, rendering a DDL succeeds. However, when reflecting the DDL schema back,
it doesn't work until you will establish a corresponding reverse type mapping.

By invoking SELECT DISTINCT(data_type) FROM information_schema.columns;,
you will find out that the internal type name is float_vector, so you
announce it to the dialect using TYPES_MAP["float_vector"] = FloatVector.

Still not there: NotImplementedError: Default TypeEngine.as_generic() heuristic method was unsuccessful for target_cratedb.sqlalchemy.vector.FloatVector. A custom as_generic() method must be implemented for this type class.

So, as it signals that the type implementation also needs an as_generic
property, let's supply one, returning sqltypes.ARRAY.

It looks like, in exchange to those improvements, the get_col_spec
method is not needed any longer.

TODO: Would it be a good idea to derive from SQLAlchemy's
ARRAY right away, to get a few of the features without
the need to redefine them?

Please note the outcome of this analysis and the corresponding implementation
has been derived from empirical observations, and from the feeling that we also
lack corresponding support on the other special data types of CrateDB (ARRAY and
OBJECT) within the SQLAlchemy dialect, i.e. "that something must be wrong or
incomplete". In this spirit, it is advisable to review and improve their
implementations correspondingly.

References

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions