Skip to content

Conversation

@lukasz-antoniak
Copy link
Member

@lukasz-antoniak lukasz-antoniak commented Oct 10, 2024

Fix #1734.

Complete vector type support for GoCQL. Supports fixed and variable-length vector types (e.g. vector of floats, vector of list of text).

@tengu-alt
Copy link
Contributor

Have you tested it with protoV5? If not I would be glad to handle it with #1822.

@joao-r-reis
Copy link
Contributor

Have you tested it with protoV5?

Hmm protocol v5 shouldn't matter much here, the encoding of the type itself is the same regardless of whether v4 or v5 is being used.

Copy link
Contributor

@joao-r-reis joao-r-reis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's looking solid, I left a few comments. Also note that you left a few TODOs.

@joao-r-reis joao-r-reis changed the title Support vector type CASSGO-11 Support vector type Oct 24, 2024
Copy link
Contributor

@joao-r-reis joao-r-reis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it 👍 added a couple of small comments

Copy link
Contributor

@joao-r-reis joao-r-reis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good stuff 👍

marshal.go Outdated
}
if k == reflect.Array {
if rv.Len() != info.Dimensions {
return unmarshalErrorf("unmarshal vector: array with wrong size")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: maybe add the expected and provided sizes to the error message

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

vector_test.go Outdated
{name: "smallint", cqlType: TypeSmallInt.String(), value: []int16{127, 256, -1234}},
{name: "tinyint", cqlType: TypeTinyInt.String(), value: []int8{127, 9, -123}},
{name: "duration", cqlType: TypeDuration.String(), value: []Duration{duration1, duration2, duration3}},
// TODO(lantonia): Test vector of custom types
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: a couple of TODOs in this test still

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

sleep 30s
- name: Integration tests
run: |
source ~/venv/bin/activate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a reminder to remove the workflow changes from this PR before wrapping up (we'll have to potentially merge #1833 first) when we get a second +1

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this PR will wait for #1833. I will remove workflow changes form this PR, rebase and squash commits.

@jameshartig
Copy link
Contributor

I'd be curious your thoughts on if implementing this after #1855 would've made it easier.

@joao-r-reis
Copy link
Contributor

I'd be curious your thoughts on if implementing this after #1855 would've made it easier.

@lukasz-antoniak can you explore this? I imagine that this implementation doesn't have much overlap with #1855 and I'd like to get it in before #1855 if this is the case but I'd like your thoughts on this @lukasz-antoniak

If we do want to get this done before #1855 then we need a rebase on this PR and then I can review it again

Copy link
Contributor

@joao-r-reis joao-r-reis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I think we need to move the C* versions to 4.1.x and 5.0.x in CI so that we get coverage for these tests before we approve the PR to be merged

},
},
{
"vector<vector<float, 3>, 5>", VectorType{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to have a couple of cases with more complex types like vector<map<uuid,timestamp>, 5> for example

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added two more - one with map, and other with frozen tuple.

Copy link
Contributor

@worryg0d worryg0d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looks good to me, but we still need a couple of folks to review this one just to ensure we didn't miss something.

helpers.go Outdated
Elements: fields,
}
} else if strings.HasPrefix(name, VECTOR_TYPE) {
names := splitJavaCompositeTypes(strings.TrimPrefix(name[:len(name)-1], VECTOR_TYPE+"("))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the splitCQLCompositeTypes and splitJavaCompositeTypes, the prefix trimming logic is repeated in multiple cases. It could be moved inside those funcs, which would make the code cleaner and reduce repetition.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


func getCassandraType(name string, logger StdLogger) TypeInfo {
// Parses long Java-style type definition to internal data structures.
func getCassandraLongType(name string, protoVer byte, logger StdLogger) TypeInfo {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a TODO to cover getCassandraLongType with tests?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now added TODO. This is tested indirectly in e.g. TestFunctionMetadata.

Comment on lines 175 to 178
case TypeCustom:
switch info.(type) {
case VectorType:
return marshalVector(info.(VectorType), value)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can simplify this block of code by assigning the type to var so you eliminate the need for the second type assertion:

Suggested change
case TypeCustom:
switch info.(type) {
case VectorType:
return marshalVector(info.(VectorType), value)
}
case TypeCustom:
switch t := info.(type) {
case VectorType:
return marshalVector(t, value)
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the switch statement appears to only handle the VectorType case without any default case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The nested switch looks redundant.
Maybe something like this would be more suitable?

case TypeCustom:
	if vector, ok := info.(VectorType); ok {
		return marshalVector(vector, value)
	}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 284 to 286
case TypeCustom:
switch info.(type) {
case VectorType:
return unmarshalVector(info.(VectorType), data, value)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

marshal.go Outdated
}
return buf.Bytes(), nil
}
return nil, marshalErrorf("can not marshal %T into %s", value, info)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be helpful to mention which types are actually marshalable (e.g., array, slice)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The message is in-line with others. Do we want to make an exception?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is actually a messadges with the provided accepted types like in here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

marshal.go Outdated
}
return nil
}
return unmarshalErrorf("can not unmarshal %s into %T", info, value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Comment on lines +1756 to +1755
if isVectorVariableLengthType(info.SubType) {
writeUnsignedVInt(buf, uint64(len(item)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: These functions could use some documentation imo—particularly explaining how it works, the intended purpose, and what happens if info.SubType is not isVectorVariableLengthType.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When dealing with fix-length types, serializing length of the element is not required. isVectorVariableLengthType returns true for all types that require serialization of their length. Not sure what comment should be appropriate, which does not just read the code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation — that clarifies things. I was curios how data considered "Vector Variable Length" in this case. So if you like it, I suggest adding comment as following:

// isVectorVariableLengthType determines if a type requires explicit length serialization within a vector. 
Variable-length types need their length encoded before the actual data to allow proper deserialization.
Fixed-length types, on the other hand, don't require this kind of length prefix.

Also, if there are no serialization requirements for fixed-length types, should TypeSmallInt and TypeTinyInt be removed from isVectorVariableLengthType?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added your comment. Unfortunately for those types server expects the length also.

Comment on lines 1826 to 1834
switch elemType.Type() {
case TypeVarchar, TypeAscii, TypeBlob, TypeText:
return true
case TypeCounter:
return true
case TypeDuration, TypeDate, TypeTime:
return true
case TypeDecimal, TypeSmallInt, TypeTinyInt, TypeVarint:
return true
case TypeInet:
return true
case TypeList, TypeSet, TypeMap, TypeUDT, TypeTuple:
return true
Copy link
Contributor

@ribaraka ribaraka Apr 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why don't combine it in one return?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean that instead all of this cases could be single "if", or one case for all suitable types?
As I can see, it's structured by similar types for better readability, not sure that it should be merged into one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Readability is the reason.

Copy link
Contributor

@ribaraka ribaraka Apr 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just showing what you said no to 🤣

switch elemType.Type() {
	case TypeVarchar, TypeAscii, TypeBlob, TypeText,
		TypeCounter,
		TypeDuration, TypeDate, TypeTime,
		TypeDecimal, TypeSmallInt, TypeTinyInt, TypeVarint,
		TypeInet,
		TypeList, TypeSet, TypeMap, TypeUDT, TypeTuple:
	return true

Copy link
Member Author

@lukasz-antoniak lukasz-antoniak Apr 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a discussion about { put always on new line in C* code base. Changed as you wish.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't insisting, just sharing a visual.

Comment on lines 175 to 178
case TypeCustom:
switch info.(type) {
case VectorType:
return marshalVector(info.(VectorType), value)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The nested switch looks redundant.
Maybe something like this would be more suitable?

case TypeCustom:
	if vector, ok := info.(VectorType); ok {
		return marshalVector(vector, value)
	}

Comment on lines 1826 to 1834
switch elemType.Type() {
case TypeVarchar, TypeAscii, TypeBlob, TypeText:
return true
case TypeCounter:
return true
case TypeDuration, TypeDate, TypeTime:
return true
case TypeDecimal, TypeSmallInt, TypeTinyInt, TypeVarint:
return true
case TypeInet:
return true
case TypeList, TypeSet, TypeMap, TypeUDT, TypeTuple:
return true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean that instead all of this cases could be single "if", or one case for all suitable types?
As I can see, it's structured by similar types for better readability, not sure that it should be merged into one.

marshal.go Outdated
buf.Write(tmp)
}

func readUnsignedVint(data []byte, start int) (uint64, int, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

readUnsignedVint -> readUnsignedVInt, it would be a bit more consistent.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is already encVint so maybe let us leave it as is?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be inconsistent with writeUnsignedVInt

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

marshal.go Outdated
buf.Write(tmp)
}

func readUnsignedVint(data []byte, start int) (uint64, int, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, maybe it should be provided with unit tests, maybe even with fuzzing, to make sure that everything works properly.
Also, I think that it should be along with some documentation, to help developers dive into it faster.
Wdyt?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The description is in native protocol, so we do not need to duplicate it: https://github.com/apache/cassandra/blob/trunk/doc/native_protocol_v5.spec#L392C23-L399C45. Method signature already indicates what is the purpose. I can add a test.

@tengu-alt
Copy link
Contributor

tengu-alt commented Apr 9, 2025

Overall, looks good to me, except this and this.

marshal.go Outdated
for i := 0; i < info.Dimensions; i++ {
offset := 0
if isVectorVariableLengthType(info.SubType) {
m, p, err := readUnsignedVint(data, 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The start parameter of the readUnsignedVint func seem to be redundunt. Could move inside the func. Or am I missing something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is but I would leave it as is when data will not represent just unsigned Vint, but longer slice. But if that is a problem, I can change it.

Copy link
Contributor

@ribaraka ribaraka Apr 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hardcoding is kind of distracting. We can always add the parameter back when there's an actual requirement for it. I'd probably remove it if there is no use of it.
@joao-r-reis, I’d appreciate your thoughts on this when you have a moment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hardcoding is kind of distracting. We can always add the parameter back when there's an actual requirement for it. I'd probably remove it if there is no use of it. @joao-r-reis, I’d appreciate your thoughts on this when you have a moment.

Agree

Comment on lines +1756 to +1755
if isVectorVariableLengthType(info.SubType) {
writeUnsignedVInt(buf, uint64(len(item)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation — that clarifies things. I was curios how data considered "Vector Variable Length" in this case. So if you like it, I suggest adding comment as following:

// isVectorVariableLengthType determines if a type requires explicit length serialization within a vector. 
Variable-length types need their length encoded before the actual data to allow proper deserialization.
Fixed-length types, on the other hand, don't require this kind of length prefix.

Also, if there are no serialization requirements for fixed-length types, should TypeSmallInt and TypeTinyInt be removed from isVectorVariableLengthType?

Comment on lines 1837 to 1839
case TypeCustom:
switch elemType.(type) {
case VectorType:
vecType := elemType.(VectorType)
return isVectorVariableLengthType(vecType.SubType)
}
return true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we agreed on similar switch statement, let's then keep it consistent 🙏

ctx: #1828 (comment)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Changed to if statement like in other cases.

Copy link
Contributor

@ribaraka ribaraka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

amazing job!

@lukasz-antoniak
Copy link
Member Author

@joao-r-reis: Changed C* versions to run integration tests from 4.0 and 4.1, to 4.1 and 5.0.

Copy link
Contributor

@joao-r-reis joao-r-reis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good stuff

@joao-r-reis
Copy link
Contributor

I'll wait for approvals by @tengu-alt and @OleksiienkoMykyta before merging this

@joao-r-reis
Copy link
Contributor

After they approve please squash the commits and use a commit description according to our guidelines

@joao-r-reis
Copy link
Contributor

@tengu-alt @OleksiienkoMykyta do you have time to review/approve this PR this week?

@OleksiienkoMykyta
Copy link
Contributor

@tengu-alt @OleksiienkoMykyta do you have time to review/approve this PR this week?

Sure, I'll check it this week

Copy link
Contributor

@tengu-alt tengu-alt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, looks good to me!

Copy link
Contributor

@OleksiienkoMykyta OleksiienkoMykyta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the great contribution

@joao-r-reis
Copy link
Contributor

@lukasz-antoniak can you squash and edit commit msg?

@lukasz-antoniak
Copy link
Member Author

Done.

@joao-r-reis
Copy link
Contributor

missing entry on CHANGELOG @lukasz-antoniak

Support marshalling and unmarshalling of vector custom type.

patched by Lukasz Antoniak; reviewed by João Reis, Stanislav Bychkov, Oleksandr Luzhniy, Mykyta Oleksiienko and Bohdan Siryk for CASSGO-11
@joao-r-reis joao-r-reis merged commit d32a392 into apache:trunk Apr 24, 2025
72 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CASSGO-11 Feature Request: Support Vector Type

7 participants