CQL Vector support #1165
78f489c to cfdf4e5
I'm not sure this is the correct way to split this PR into commits (I'm pretty sure it isn't, as the commits won't compile); however, I can't think of a proper way.
6aee097 to 440d63a
I only reviewed the first commit (introduction of TypeParser).
Some general comments:
- The logic of TypeParser is quite complex. I suggest adding some docstrings next to the type definitions and methods. For example, I have no idea what TypeParser::from_hex does. Docstrings will also help a lot in the future in case some other developer touches this piece of code.
- It's worth adding some comments next to the non-intuitive parts of the code. Example:
    if name.is_empty() {
        if !self.is_eos() {
            return Err(CqlTypeParseError::AbstractTypeParseError());
        }
        return Ok(ColumnType::Blob);
    }
  It's not obvious why we return Blob if name is empty. A link to the corresponding part of the original source code would be helpful.
- Please add some unit tests. I saw that there is a small test of TypeParser in a later commit. I think we should add more tests and try to handle as many parsing cases as we can. In addition, I think that in this case the unit tests should be added in the same commit (they help during review: it's easier to reason about complex code when there are some use case examples one can look at).
- This implementation is based on some existing (probably Java) implementation, correct? If so, please provide the link to the source in the commit. Ideally, the link should be placed in the comments in the code as well.
The whole TypeParser logic was ripped straight out of ScyllaDB's vector implementation; however, as it is still in development and probably won't be merged for a while, it will be hard to link to it directly. IIRC there are a lot of tests there for this functionality, so they can also be borrowed.
Ok, makes sense. And let's borrow the tests in that case :)
💯 What a great piece of code! Thank you for the contribution!
There are quite a few comments, though.
I think that the new parser module needs many more unit tests.
Also, tests for particular errors upon serialization and deserialization of Vector are missing.
23d6dad to 8e128e1
I copied the test case and changed the expected error type, but forgot to change the test string, so the tests didn't pass. I fix this here.
@piodul (I'm not sure why, but I can't reply to your comment.) The issue with the invalid parameter count is that any elegant way of gathering that data that I could think of (and the one you propose here) would require allocating each time we parse a vector. The solution without allocations (getting the required arguments manually and iterating through the rest if needed) is quite ugly.
This is a weird behavior in the GitHub UI. If, when doing a review, you respond to a comment thread belonging to some other review, then this new comment will show up in 2 places.
Replied here: #1165 (comment)
@smoczy123 Are you planning to address @piodul's nitpicks or shall we merge this as-is?
They should be addressed now @wprzytula
InvalidParameterCount(usize, usize), - can we make that a struct variant (with actual and expected fields) instead of a tuple variant?
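To illustrate the suggestion, here is a minimal sketch of what the struct variant could look like. Only the one variant is shown, and the `Display` wording is an assumption made for this example, not the message used by the driver.

```rust
use std::fmt;

// Sketch only: the real error enum has more variants.
#[derive(Debug, PartialEq, Eq)]
enum CustomTypeParseError {
    // Named fields make it impossible to mix up the two counts at the call site.
    InvalidParameterCount { actual: usize, expected: usize },
}

impl fmt::Display for CustomTypeParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Hypothetical message text, chosen for this example.
            CustomTypeParseError::InvalidParameterCount { actual, expected } => write!(
                f,
                "invalid parameter count: expected {expected}, got {actual}"
            ),
        }
    }
}
```

With the tuple variant, `InvalidParameterCount(1, 3)` and `InvalidParameterCount(3, 1)` are easy to confuse; with named fields the construction site reads `InvalidParameterCount { actual: 3, expected: 1 }`.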
.collect::<Vec<_>>()
.try_into()
.map_err(
    |v: Vec<Result<ColumnType<'result>, CustomTypeParseError>>| {
        CustomTypeParseError::InvalidParameterCount(v.len(), 1)
    },
)?;
I think the intention was to only allocate in the case of an error. Now you made this function always allocate. One way to allocate only in case of error is to do something like this:
fn get_complex_abstract_type(
    &mut self,
    mut name: &'result str,
) -> Result<ColumnType<'result>, CustomTypeParseError> {
    name = name
        .strip_prefix("org.apache.cassandra.db.marshal.")
        .unwrap_or(name);

    // Calculates the real number of parameters.
    // Can be called only after verifying that get_type_parameters() returned Ok,
    // because it panics in case of error.
    // It is declared in this admittedly weird way to make it FnOnce - calling it
    // more than once would obviously be a bug.
    let calc_params_count = {
        let self_clone = CustomTypeParser {
            // Clone here is cheap since parser is just one &str.
            parser: self.parser.clone(),
        };
        move || {
            // If we made `self_clone` mut and used it here,
            // the whole closure would be FnMut.
            let mut parser = self_clone;
            parser.get_type_parameters().unwrap().count()
        }
    };

    match name {
        "ListType" => {
            let [element_type_result] = self
                .get_type_parameters()?
                .collect_array::<1>()
                .ok_or_else(move || {
                    CustomTypeParseError::InvalidParameterCount(calc_params_count(), 1)
                })?;
            let element_type = element_type_result?;
            Ok(ColumnType::Collection {
                frozen: false,
                typ: CollectionType::List(Box::new(element_type)),
            })
        }
        "SetType" => {
            let [element_type_result] = self
                .get_type_parameters()?
                .collect_array::<1>()
                .ok_or_else(move || {
                    CustomTypeParseError::InvalidParameterCount(calc_params_count(), 1)
                })?;
            let element_type = element_type_result?;
            Ok(ColumnType::Collection {
                frozen: false,
                typ: CollectionType::Set(Box::new(element_type)),
            })
        }
        "MapType" => {
            let [key_type_result, value_type_result] = self
                .get_type_parameters()?
                .collect_array::<2>()
                .ok_or_else(move || {
                    CustomTypeParseError::InvalidParameterCount(calc_params_count(), 2)
                })?;
            let key_type = key_type_result?;
            let value_type = value_type_result?;
            Ok(ColumnType::Collection {
                frozen: false,
                typ: CollectionType::Map(Box::new(key_type), Box::new(value_type)),
            })
        }
        "TupleType" => {
            let params = self
                .get_type_parameters()?
                .collect::<Result<Vec<_>, CustomTypeParseError>>()?;
            if params.is_empty() {
                return Err(CustomTypeParseError::InvalidParameterCount(0, 1));
            }
            Ok(ColumnType::Tuple(params))
        }
        "VectorType" => {
            let (typ, len) = self.get_vector_parameters()?;
            Ok(ColumnType::Vector {
                typ: Box::new(typ),
                dimensions: len,
            })
        }
        "UserType" => {
            let params = self.get_udt_parameters()?;
            Ok(ColumnType::UserDefinedType {
                frozen: false,
                definition: Arc::new(UserDefinedType {
                    name: params.type_name.into(),
                    keyspace: params.keyspace.into(),
                    field_types: params.field_types,
                }),
            })
        }
        name => Err(CustomTypeParseError::UnknownComplexCustomTypeName(
            name.into(),
        )),
    }
}
Before pushing this, let's get @wprzytula's or @piodul's opinion on whether this solution makes sense.
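The core idea of the proposal above (take the expected number of items without allocating, and count the real number only on the error path using a cheap clone) can be shown in isolation. This is a minimal standalone sketch, not the driver's code: `ParamCountError` and `exactly_one` are hypothetical names, and a `str::split` iterator stands in for the parser.

```rust
// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
struct ParamCountError {
    actual: usize,
    expected: usize,
}

// Returns the single item of `iter`, or an error carrying the real item count.
// Nothing is collected into a Vec: the happy path only calls `next()` twice,
// and the error path walks a clone of the iterator to count the items.
fn exactly_one<I>(mut iter: I) -> Result<I::Item, ParamCountError>
where
    I: Iterator + Clone,
{
    // Assumed to be cheap, like cloning a parser that wraps a single &str.
    let counter = iter.clone();
    match (iter.next(), iter.next()) {
        (Some(item), None) => Ok(item),
        // Zero items or more than one: count them only now.
        _ => Err(ParamCountError {
            actual: counter.count(),
            expected: 1,
        }),
    }
}
```

For example, `exactly_one("int".split(','))` yields `Ok("int")`, while `exactly_one("int,text,blob".split(','))` yields an error with `actual: 3`. The trade-off is that the input is traversed a second time on the error path, which is acceptable if errors are rare.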
This is needed to deserialize vector metadata, as the vector is implemented as a Custom type with VectorType as its class.
Because Cassandra implements vectors of variable-length types in a way that contradicts the CQL protocol, special care must be taken when deserializing them: the sizes of their elements are encoded as an unsigned vint instead of an int.
This is needed for serialization of vectors, as they either don't write the size of their elements or write it in a non-standard encoding.
Similarly to the deserialization commit, special care must be taken when serializing vectors of variable-length types, as the sizes of their elements must be written as an unsigned vint.
This PR adds serialization and deserialization of CQL Vector (as implemented in Cassandra), thereby achieving compatibility with Cassandra's Vector type. It's important to note that Cassandra implements Vector serialization and deserialization in a way that contradicts the CQL protocol, using [unsigned vint] instead of [int] as the element size encoding for vectors of variable-length types.
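For readers unfamiliar with the encoding, here is a minimal sketch of an unsigned vint codec. It assumes the layout used by Cassandra's vint coding as I understand it: the number of leading one bits in the first byte gives the number of extra bytes, and the remaining bits hold the value in big-endian order. The function names are hypothetical and this is not the code from this PR.

```rust
// Number of bytes needed: n bytes carry 7*n value bits for n <= 8,
// and the 9-byte form (first byte 0xFF) carries a full 64-bit value.
fn vint_size(value: u64) -> usize {
    let bits = 64 - (value | 1).leading_zeros() as usize; // 1..=64
    if bits > 56 { 9 } else { (bits + 6) / 7 }
}

// Appends `value` to `out` in the assumed unsigned vint layout.
fn encode_unsigned_vint(value: u64, out: &mut Vec<u8>) {
    let size = vint_size(value);
    if size == 9 {
        out.push(0xFF);
        out.extend_from_slice(&value.to_be_bytes());
        return;
    }
    let start = out.len();
    // Keep only the low `size` bytes of the big-endian representation.
    out.extend_from_slice(&value.to_be_bytes()[8 - size..]);
    // Set `size - 1` leading one bits in the first byte as the length prefix.
    out[start] |= !(0xFFu8 >> (size - 1));
}

// Decodes one unsigned vint from the start of `buf`.
// Returns the value and the number of bytes consumed, or None if `buf` is too short.
fn decode_unsigned_vint(buf: &[u8]) -> Option<(u64, usize)> {
    let first = *buf.first()?;
    let extra = first.leading_ones() as usize; // 0..=8 extra bytes follow
    let rest = buf.get(1..1 + extra)?;
    // Strip the length prefix from the first byte (it has no value bits if extra == 8).
    let mut value = if extra == 8 {
        0
    } else {
        u64::from(first & (0xFFu8 >> extra))
    };
    for &byte in rest {
        value = (value << 8) | u64::from(byte);
    }
    Some((value, 1 + extra))
}
```

Under this layout an element length of 300 takes two bytes (`0x81 0x2C`), whereas a fixed-width 4-byte [int] length prefix would take four, which is why a deserializer that expects [int] sizes cannot read such vectors.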
Fixes #1014