-
Notifications
You must be signed in to change notification settings - Fork 129
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CQL Vector support #1165
base: main
Are you sure you want to change the base?
CQL Vector support #1165
Conversation
|
78f489c
to
cfdf4e5
Compare
I'm not sure this is the correct way to split this PR into commits (I'm pretty sure it isn't, as the commits won't compile), however I can't think of a proper way. |
6aee097
to
440d63a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only reviewed the first commit (introduction of TypeParser
)
Some general comments:
- The logic of
TypeParser
is quite complex. I suggest adding some docstrings next to the type definitions and methods. For example, I have no idea whatTypeParser::from_hex
does. Docstrings will also help a lot in the future in case some other developer touches this piece of code. - It's worth adding some comments next to the non-intuitive parts of the code. Example:
if name.is_empty() {
if !self.is_eos() {
return Err(CqlTypeParseError::AbstractTypeParseError());
}
return Ok(ColumnType::Blob);
}
It's not obvious why we return Blob
if name is empty. A link to the corresponding part of original source code would be helpful.
- Please, add some unit tests. I saw that there is some small test of
TypeParser
in a later commit. I think we should add more tests and try to handle as many parsing cases as we can. In addition, I think that in this case, unit tests should be added in the same commit (they help during review - it's easier to reason about the complex code when there are some use case examples one can look at) - This implementation is based on some existing (probably Java) implementation, correct? If so, please, provide the link to the source in the commit. Ideally, the link should be placed in the comments in code as well.
type UDTParameters<'result> = ( | ||
Cow<'result, str>, | ||
Cow<'result, str>, | ||
Vec<(Cow<'result, str>, ColumnType<'result>)>, | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I'd prefer to have it as a struct
instead of a type alias. Cow<'result, str>
type appears twice and it's hard to reason about it without explicit field names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I'd prefer to have it as a
struct
instead of a newtype.
Actually, a struct
is called a newtype. type
is a type alias, which is not a new type, yet just a new name for an existing type.
pub(crate) struct TypeParser<'result> { | ||
pos: usize, | ||
str: Cow<'result, str>, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Probably need to rename it to CustomTypeParser
(or AbstractTypeParser
if we decide to stick to abstract naming convention). Same goes for the name of the module - type_parser.rs
is not specific enough IMO.
if !self.is_eos() && self.str.as_bytes()[self.pos] == b':' { | ||
self.pos += 1; | ||
let _ = usize::from_str_radix(&name, 16) | ||
.map_err(|_| CqlTypeParseError::AbstractTypeParseError()); | ||
name = self.read_next_identifier(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this part do? Is it tested somewhere?
Whole TypeParser logic was ripped straight out of ScyllaDB's vector implementation, however, as it still in development and probably won't be merged for a while, it will be hard to link directly. IIRC there is a lot of tests there for this functionality, so thay also can be borrowed. |
Ok, makes sense. And let's borrow the tests in such case :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯 What a great piece of code! Thank you for the contribution!
There are quite many comments, though.
I think that the new parser module needs much more unit tests.
Also, tests for particular errors upon serialization and deserialization of Vector are missing.
type UDTParameters<'result> = ( | ||
Cow<'result, str>, | ||
Cow<'result, str>, | ||
Vec<(Cow<'result, str>, ColumnType<'result>)>, | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I'd prefer to have it as a
struct
instead of a newtype.
Actually, a struct
is called a newtype. type
is a type alias, which is not a new type, yet just a new name for an existing type.
pub struct CellWriter<'buf> { | ||
buf: &'buf mut Vec<u8>, | ||
cell_len: Option<usize>, | ||
size_as_uvarint: bool, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Lorak-mmk As you know the serialization framework quite well, could you please aid in review of this commit?
if let Some(buffer) = self.variable_length_buffer { | ||
let value_len = buffer.len(); | ||
let mut len = Vec::new(); | ||
types::unsigned_vint_encode(value_len as u64, &mut len); | ||
self.buf.extend_from_slice(&len); | ||
self.buf.extend_from_slice(&buffer); | ||
} else { | ||
let value_len: i32 = (self.buf.len() - self.starting_pos - 4) | ||
.try_into() | ||
.map_err(|_| CellOverflowError)?; | ||
self.buf[self.starting_pos..self.starting_pos + 4] | ||
.copy_from_slice(&value_len.to_be_bytes()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💭 The logic around Cell*
got much convoluted with these alterations. Let's think how we can make it more digestible. @Lorak-mmk @muzarski
23d6dad
to
8e128e1
Compare
b572171
to
62f008a
Compare
As of right now all changes requested above should be implemented, the only thing left is the lifetime issue with the new_type_parser in deser_type_generic. @wprzytula @Lorak-mmk |
I see a lot of open review comments. Please go over them and:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left just 2 comments. I'll continue the review after the second one is addressed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit: frame: added a parser for Java-class type names
You introduced both new_type_parser.rs
and type_parser.rs
.
- Both contain
UDTParameters
struct, andTypeParser
struct withpub(crate) parse
method. - There is no module-level comment on those files
- No comments on the aforementioned structs and method.
type_parser
seems unused (Doesmake ci
pass on this commit?).- Commit message says nothing about those modules and structs.
How I as a reviewer am supposed to understand the purpose of those modules and structures? Please improve it by addressing some of the points above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both of those do the same thing, parse the Java class type name, however the new_type_parser uses the parse.rs module that was already present in the project, however due to lifetime issues I was not able to make it work correctly. So right now there are two versions of the parser, once we either find a way to fix the lifetime issues or decide to go back to type_parser.rs (that would be quite sad, we would duplicate a lot of parsing code) the other version will be deleted. cc @wprzytula
85ef4e7
to
8165b53
Compare
I have closed the resolved conversations, all of the issues in them should be fixed. I have decided to not touch the bigger issues with the type_parser.rs right now, as if we decide to throw it away it would be a waste of time right now. cc @Lorak-mmk |
This is needed to deserialize vector metadata as it is implemented as a Custom type with VectorType as its class
Due to the fact that Cassandra implements variable type length vectors in a way that contradicts the CQL protocol, special care must be given when deserializing them as sizes of their elements are encoded as unsigned vint instead of an int
Similarly to the previous commit, special care must be given when serializing variable type length vectors, as sizes of their elements must be written as an unsigned varint
If we decide not to fix it right now, could you create the issue with the whole connected covered so that we don't forget about it in the future? |
This PR adds serialization and deserialization of CQL Vector (as implemented in Cassandra) therefore achieving compatibility with Cassandra's Vector type. It's important to note that Cassandra implements Vector serialization and deserialization in a way that
contradicts the CQL protocol, using [unsigned vint] instead of [int] as the element size encoding for variable type length vectors.
Fixes #1014
Pre-review checklist
./docs/source/
.Fixes:
annotations to PR description.