[RFE]: Implement values compression for text and blob types #218

Open

vladzcloudius opened this issue Jul 19, 2024 · 19 comments · May be fixed by #221

@vladzcloudius

Description

Implement an optional compression of text/ascii/blob cells: this way the DB will have to handle smaller buffers internally.

The compression should only be done for values with sizes above a configurable threshold.
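
Below is a minimal sketch, not part of gocql, of what such threshold-gated compression could look like; gzip from the Go standard library stands in for whatever codec would actually be chosen, and the type and field names are purely illustrative.

// Illustrative only: a compressor that skips values below a configurable threshold.
package blobcompress

import (
	"bytes"
	"compress/gzip"
	"io"
)

type ThresholdCompressor struct {
	MinSize int // values shorter than this are stored as-is
}

func (c ThresholdCompressor) Compress(data []byte) ([]byte, error) {
	if len(data) < c.MinSize {
		return data, nil // below the threshold: store uncompressed
	}
	var buf bytes.Buffer
	w := gzip.NewWriter(&buf)
	if _, err := w.Write(data); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func (c ThresholdCompressor) Decompress(data []byte) ([]byte, error) {
	r, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		// Not a gzip stream, so assume the value was stored uncompressed.
		return data, nil
	}
	defer r.Close()
	return io.ReadAll(r)
}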

@dkropachev
Collaborator

dkropachev commented Jul 22, 2024

Column-based blobs compression

General idea

If a client stores big blobs of data, compressing the data that goes into that field will reduce the footprint of select/update operations on both the network and the server.
Even with network compression turned on (ClusterConfig.Compressor) and server-side data compression enabled, the server still decompresses the frame, extracts the data, and compresses it again before storing it to disk. This creates additional CPU and memory load, which this feature would avoid entirely.

Interoperability issues

  1. Data will be readable only by a properly configured gocql driver; cqlsh and other drivers won't be able to read it

Possible implementation of serialization/deserialization

1. Global variable + hack into marshalVarchar/unmarshalVarchar

// marshal.go

type blobCompressor interface {
	Compress([]byte) ([]byte, error)
	Decompress([]byte) ([]byte, error)
}

var MyGlobalCompressor blobCompressor

func unmarshalVarchar(info TypeInfo, data []byte, value interface{}) (err error) {
	if MyGlobalCompressor != nil {
		data, err = MyGlobalCompressor.Decompress(data)
		if err != nil {
			return err
		}
	}
	return unmarshalVarcharRaw(info, data, value)
}

func marshalVarchar(info TypeInfo, data []byte, value interface{}) (err error) {
	if MyGlobalCompressor != nil {
		data, err = MyGlobalCompressor.Compress(data)
		if err != nil {
			return err
		}
	}
	return marshalVarcharRaw(info, data, value)
}

Pros:

  1. Easy to use
  2. Easy to implement

Cons:

  1. As dirty as it gets
  2. No control over cluster/session/column, compression is either globally on or globally off

2. Option in ClusterConfig + hack into NativeType

// marshal.go

type blobCompressor interface {
	Compress([]byte) ([]byte, error)
	Decompress([]byte) ([]byte, error)
}

type ClusterConfig struct {
	...
	BlobCompressor blobCompressor
}

type NativeType struct {
	...
	blobCompressor blobCompressor
}

func getCompressor(info TypeInfo) blobCompressor {
        nt, ok := info.(NativeType)
        if !ok {
                return nil
        }
        return nt.blobCompressor
}

func unmarshalVarchar(info TypeInfo, data []byte, value interface{}) (err error) {
	if c := getCompressor(info); c != nil {
		data, err = c.Decompress(data)
		if err != nil {
			return err
		}
	}
	return unmarshalVarcharRaw(info, data, value)
}

func marshalVarchar(info TypeInfo, data []byte, value interface{}) (err error) {
	if c := getCompressor(info); c != nil {
		data, err = c.Compress(data)
		if err != nil {
			return err
		}
	}
	return marshalVarcharRaw(info, data, value)
}

Pros:

  1. Easy to use
  2. Less dirty than Global variable

Cons:

  1. No control over column, compression is either on or off for all the columns in the given session.
  2. Pollutes NativeType

3. Custom type

type CompressedType struct{
    compressor *Compressor
    val []byte
}

func (ct *CompressedType) MarshalCQL(info TypeInfo) ([]byte, error) {
    return ct.compressor.Compress(ct.val)
}

func (ct *CompressedType) UnmarshalCQL(info TypeInfo, data []byte) (err error) {
   ct.val, err = ct.compressor.Decompress(data)
   return err
}

type Compressor struct{
	...
}

func (c Compressor) Blob() CompressedType

// How to use
...
	err = session.Query(
			`INSERT INTO gocql_test.test_blob_compressor (testuuid, testblob) VALUES (?, ?)`,
			TimeUUID(), compressor.Blob().Set([]byte("my value")),
		).Exec()

Pros:

  1. Maximum control over which column is compressed
  2. No driver pollution, could be implemented as part of lz4 package
  3. Not dirty at all

Cons:

  1. Usage is a bit cumbersome

3. Custom type with global compressor

type CompressedType []byte

func (ct CompressedType) MarshalCQL(info TypeInfo) ([]byte, error) {
    return globalcompressor.Compress(ct)
}

func (ct *CompressedType) UnmarshalCQL(info TypeInfo, data []byte) (err error) {
   *ct, err = globalcompressor.Decompress(data)
   return err
}

type Compressor struct{
	...
}

func (c Compressor) Blob(val []byte) CompressedType

// How to use
...
	err = session.Query(
			`INSERT INTO gocql_test.test_blob_compressor (testuuid, testblob) VALUES (?, ?)`,
			TimeUUID(), compressor.Blob([]byte("my value")),
		).Exec()

Pros:

  1. Maximum control over which column is compressed
  2. No driver pollution, could be implemented as part of lz4 package
  3. Not dirty at all

@dkropachev
Collaborator

@mykaul , @Lorak-mmk , let's move discussion here

@dkropachev
Collaborator

dkropachev commented Jul 22, 2024

@mykaul, this is a continuation of this comment

If a client stores big blobs of data, compressing the data that goes into that field will reduce the footprint of select/update operations on both the network and the server.

  1. Network - you have client<->server network compression.
    Server - the default is already compressing with LZ4.

Both btw use LZ4 (or snappy) by default - which is less suitable for JSON/TEXT blobs (as it has no entropy compression). I think we should use zstd for it. (Another advantage is that I hope one day we'll do ZSTD client<->server as well...)

I saw that network compression is done on the whole frame, which makes me think that when both features are working, the server decompresses the frame, extracts the column data, and then compresses it back when writing to the sstable. Am I correct on that?

  1. You definitely need control for it. You need to have a minimum length where it is even reasonable to do it and a percentage where it makes sense to keep it uncompressed.
  2. If we do implement it, we'd want it for multiple drivers, not just gocql.
  3. It's unclear to me how we determine which BLOBS to compress and which not to. JPEG blobs are not an ideal candidate, for example.

The best case is if we leave it to the user to decide (the custom type options). Alternatively, we could collect some stats on the compression rate and conclude from them; until stats are available, data is compressed, and if the compression rate turns out not to be good, write uncompressed.
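
As a side note on codec choice: a zstd-backed compressor is straightforward to sketch, assuming a binding such as github.com/klauspost/compress/zstd (the names below are illustrative, this is not an API the driver ships):

import "github.com/klauspost/compress/zstd"

type zstdCompressor struct {
	enc *zstd.Encoder
	dec *zstd.Decoder
}

func newZstdCompressor() (*zstdCompressor, error) {
	enc, err := zstd.NewWriter(nil) // nil writer: only EncodeAll is used
	if err != nil {
		return nil, err
	}
	dec, err := zstd.NewReader(nil) // nil reader: only DecodeAll is used
	if err != nil {
		return nil, err
	}
	return &zstdCompressor{enc: enc, dec: dec}, nil
}

func (z *zstdCompressor) Compress(data []byte) ([]byte, error) {
	return z.enc.EncodeAll(data, nil), nil
}

func (z *zstdCompressor) Decompress(data []byte) ([]byte, error) {
	return z.dec.DecodeAll(data, nil)
}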

@mykaul

mykaul commented Jul 22, 2024

@mykaul, this is continueation of comment

If a client stores big blobs of data, compressing the data that goes into that field will reduce the footprint of select/update operations on both the network and the server.

  1. Network - you have client<->server network compression.
    Server - the default is already compressing with LZ4.

Both btw use LZ4 (or snappy) by default - which is less suitable for JSON/TEXT blobs (as it has no entropy compression). I think we should use zstd for it. (Another advantage is that I hope one day we'll do ZSTD client<->server as well...)

I saw that network compression is done on the whole frame, which makes me think that when both features are working, the server decompresses the frame, extracts the column data, and then compresses it back when writing to the sstable. Am I correct on that?

Yes

  1. You definitely need control for it. You need to have a minimum length where it is even reasonable to do it and a percentage where it makes sense to keep it uncompressed.
  2. If we do implement it, we'd want it for multiple drivers, not just gocql.
  3. It's unclear to me how we determine which BLOBS to compress and which not to. JPEG blobs are not an ideal candidate, for example.

The best case is if we leave it to the user to decide (the custom type options). Alternatively, we could collect some stats on the compression rate and conclude from them; until stats are available, data is compressed, and if the compression rate turns out not to be good, write uncompressed.

But it can change from workload to workload. What we should do is be able, at the end of compressing a specific BLOB, to determine whether it passed some threshold or not. If you compressed 1000 bytes to 950 bytes, it's worthless. Don't spend the cycles.
See https://docs.ceph.com/en/latest/rados/configuration/bluestore-config-ref/#confval-bluestore_compression_required_ratio (and other parameters) where I'm taking the idea from.
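
A small sketch of that check (the 87.5% figure is just illustrative, not taken from Ceph or from any PR):

// Keep the compressed form only if it beats a required ratio; otherwise
// store the original bytes and skip the decompression cost on reads.
const requiredRatio = 0.875 // illustrative: compressed must be <= 87.5% of original

func maybeCompress(raw []byte, compress func([]byte) ([]byte, error)) ([]byte, bool, error) {
	compressed, err := compress(raw)
	if err != nil {
		return nil, false, err
	}
	if float64(len(compressed)) > float64(len(raw))*requiredRatio {
		return raw, false, nil // e.g. 1000 -> 950 bytes: not worth it
	}
	return compressed, true, nil
}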

@Lorak-mmk

Column-based blobs compression

General idea

If a client stores big blobs of data, compressing the data that goes into that field will reduce the footprint of select/update operations on both the network and the server. Even with network compression turned on (ClusterConfig.Compressor) and server-side data compression enabled, the server still decompresses the frame, extracts the data, and compresses it again before storing it to disk, which creates additional CPU and memory load that this feature would avoid entirely.

Do we have any benchmarks that show that this additional overhead of recompression is significant enough to warrant such a feature?

Interoperability issues

1. Data will be readable only by a properly configured `gocql` driver; `cqlsh` and other drivers won't be able to read it

There is also another drawback: the risk of data corruption. The solution relies on prepending some prefix to the data, and assumes that this prefix will never occur in real data.
This assumption may at some point stop being true, and then it is difficult to change the prefix. It will also create the possibility of a DoS attack (if the data is user-provided, like a message in some IM) if a user prepends the prefix to the data.

Those are quite big drawbacks; that's why I asked the previous question, because the issue does not explain why the problem is severe enough that we are willing to tolerate such a drawback.

Possible implementation of serialization/deserialization

1. Global variable + hack into marshalVarchar/unmarshalVarchar

// marshal.go

type blobCompressor interface {
	Compress([]byte) ([]byte, error)
	Decompress([]byte) ([]byte, error)
}

var MyGlobalCompressor blobCompressor

func unmarshalVarchar(info TypeInfo, data []byte, value interface{}) (err error) {
	if MyGlobalCompressor != nil {
		data, err = MyGlobalCompressor.Decompress(data)
		if err != nil {
			return err
		}
	}
	return unmarshalVarcharRaw(info, data, value)
}

func marshalVarchar(info TypeInfo, data []byte, value interface{}) (err error) {
	if MyGlobalCompressor != nil {
		data, err = MyGlobalCompressor.Compress(data)
		if err != nil {
			return err
		}
	}
	return marshalVarcharRaw(info, data, value)
}

Pros:

1. Easy to use

2. Easy to implement

Cons:

1. As dirty as it gets

2. No control over cluster/session/column, compression is either globally on or globally off

2. Option in ClusterConfig + hack into NativeType

// marshal.go

type blobCompressor interface {
	Compress([]byte) ([]byte, error)
	Decompress([]byte) ([]byte, error)
}

type ClusterConfig struct {
	...
	BlobCompressor blobCompressor
}

type NativeType struct {
	...
	blobCompressor blobCompressor
}

func getCompressor(info TypeInfo) blobCompressor {
        nt, ok := info.(NativeType)
        if !ok {
                return nil
        }
        return nt.blobCompressor
}

func unmarshalVarchar(info TypeInfo, data []byte, value interface{}) (err error) {
	if c := getCompressor(info); c != nil {
		data, err = c.Decompress(data)
		if err != nil {
			return err
		}
	}
	return unmarshalVarcharRaw(info, data, value)
}

func marshalVarchar(info TypeInfo, data []byte, value interface{}) (err error) {
	if c := getCompressor(info); c != nil {
		data, err = c.Compress(data)
		if err != nil {
			return err
		}
	}
	return marshalVarcharRaw(info, data, value)
}

Pros:

1. Easy to use

2. Less dirty than `Global variable`

Cons:

1. No control over column, compression is either on or off for all the columns in the given session.

2. Pollutes `NativeType`

3. Custom type

type CompressedType struct{
    compressor *Compressor
    val []byte
}

func (ct *CompressedType) MarshalCQL(info TypeInfo) ([]byte, error) {
    return ct.compressor.Compress(ct.val)
}

func (ct *CompressedType) UnmarshalCQL(info TypeInfo, data []byte) (err error) {
   ct.val, err = ct.compressor.Decompress(data)
   return err
}

type Compressor struct{
	...
}

func (c Compressor) Blob() CompressedType

// How to use
...
	err = session.Query(
			`INSERT INTO gocql_test.test_blob_compressor (testuuid, testblob) VALUES (?, ?)`,
			TimeUUID(), compressor.Blob().Set([]byte("my value")),
		).Exec()

Pros:

1. Maximum control over which column is compressed

2. No driver pollution, could be implemented as part of `lz4` package

3. Not dirty at all

Cons:

1. Usage is a bit cumbersome

3. Custom type with global compressor

I assume this was meant to be number 4

type CompressedType []byte

func (ct CompressedType) MarshalCQL(info TypeInfo) ([]byte, error) {
    return globalcompressor.Compress(ct)
}

func (ct *CompressedType) UnmarshalCQL(info TypeInfo, data []byte) (err error) {
   *ct, err = globalcompressor.Decompress(data)
   return err
}

type Compressor struct{
	...
}

func (c Compressor) Blob(val []byte) CompressedType

// How to use
...
	err = session.Query(
			`INSERT INTO gocql_test.test_blob_compressor (testuuid, testblob) VALUES (?, ?)`,
			TimeUUID(), compressor.Blob([]byte("my value")),
		).Exec()

Pros:

1. Maximum control over which column is compressed

2. No driver pollution, could be implemented as part of `lz4` package

3. Not dirty at all

If we decide to implement this, I'd be for option 3 - because of no global variables.

@vladzcloudius
Author

vladzcloudius commented Jul 30, 2024

Do we have any benchmarks that show that this additional overhead of recompression is significant enough to warrant such a feature?

@karol-kokoszka scylla struggles when working with large cells, and this is supposed to be common ScyllaDB knowledge. When it has to deal with such cells it puts a lot of stress on the seastar memory allocator, and this feature is supposed to reduce that stress.

There is also another drawback: the risk of data corruption. The solution relies on prepending some prefix to data, and assumes that this prefix will never occur in real data.

I believe that modern compression algorithms are robust enough to avoid the possibility of such a situation.

Even if the header matches a compressed archive, the subsequent attempt to decompress a blob that is not actually a compressed archive is going to fail. It definitely will for ASCII input.

And if a user sends binary blobs as data and wants to prevent a potential collision, they can simply disable the compression.

It will also create the possibility of DoS attack (if the data is user provided, like a message in some IM) if user prepends the prefix to the data.

In use cases that allow such situations one should either always compress or never compress, which the driver should allow configuring.

And look what I found: https://java-driver.docs.scylladb.com/stable/manual/core/compression/ (and the corresponding https://docs.datastax.com/en/developer/java-driver/4.0/manual/core/compression/index.html)

It turns out that the Scylla and Datastax Java drivers already allow similar things.

@Lorak-mmk

Do we have any benchmarks that show that this additional overhead of recompression is significant enough to warrant such a feature?

@karol-kokoszka scylla struggles working with large cells and this is supposed to be a common ScyllaDB knowledge. When it has to deal with these cells it puts a lot of stress on the seastar memory allocator and this feature is supposed to reduce such stress.

Didn't know about this. I assume it's tribal knowledge scattered around various issues and there is no single place where I can learn about it?

There is also another drawback: the risk of data corruption. The solution relies on prepending some prefix to data, and assumes that this prefix will never occur in real data.

I believe that modern compression algorithms are robust enough to avoid a possibility of such a situation.

Even if the header matches the compressed archive the following attempt of decompressing of a blob which is actually not a compressed archive is going to fail. It definitely will for an ASCII input.

And if a user sends binary blobs as data and wants to prevent a potential collision they can simply disable such a compression.

It will also create the possibility of DoS attack (if the data is user provided, like a message in some IM) if user prepends the prefix to the data.

In use cases that allow such situations one should always compress or not compress, which the driver should allow configuring.

I think we are talking about different things. Did you see the PR implementing this feature? #221

It uses a prefix (by default lz4:) to tell whether the value fetched from the DB is compressed or not. It also has a size limit, so it only compresses blobs over a certain size.
Now imagine that Discord or some other messenger enables such a feature, and some user sends a message lz4:blabla. It will not be compressed by the driver, because it's too short.
But when selecting it, the driver will think it's compressed and try to decompress it, which will of course fail, potentially causing failures for other users.

We could avoid using such mechanisms (prefix, length limit) and always compress, but that would severely limit the usability of such a feature, because you could only use it with a fresh table, not with existing data.
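
To make that failure mode concrete, here is a rough sketch of the prefix scheme described above (the actual PR #221 may differ in details; blobCompressor is the interface from the proposal, the helper names are made up, and the bytes package is assumed to be imported):

var prefix = []byte("lz4:")

func marshalWithPrefix(c blobCompressor, limit int, raw []byte) ([]byte, error) {
	if len(raw) < limit {
		return raw, nil // short values are stored verbatim, without the prefix
	}
	compressed, err := c.Compress(raw)
	if err != nil {
		return nil, err
	}
	return append(append([]byte(nil), prefix...), compressed...), nil
}

func unmarshalWithPrefix(c blobCompressor, data []byte) ([]byte, error) {
	if !bytes.HasPrefix(data, prefix) {
		return data, nil
	}
	// Any stored value that merely starts with "lz4:" is treated as compressed
	// here, which is exactly the failure mode described above.
	return c.Decompress(bytes.TrimPrefix(data, prefix))
}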

And look what I found: https://java-driver.docs.scylladb.com/stable/manual/core/compression/ (and the corresponding https://docs.datastax.com/en/developer/java-driver/4.0/manual/core/compression/index.html)

It turns out that a Scylla and Datastax Java Drivers already allow similar things.

It's not a similar thing; you linked to docs about CQL compression (which I'm pretty sure goCQL already supports).

@dkropachev
Collaborator

I think we are talking about different things. Did you see the PR implementing this feature? #221

It is a draft to quickly test the feature out, and it is not good as a point of reference.

It uses a prefix (by default lz4:) to tell if the value fetched from DB is compressed or not. It also has a limit, so it only compresses blobs over certain size. Now imagine that Discord or some other messenger enables such a feature , and some user sends a message lz4:blabla. It will not be compressed by the driver, because it's too short. But when selecting it, the driver will think it's compressed and try to decompress it, which will of course fail, potentially causing failures for other users.

We could avoid using such mechanisms (prefix, length limit) and always compress, but that would severly limit usability of such a feature, because you could only use it with fresh table, not with existing data.

It could be mitigated.
But in general, it is better to support both options.

And look what I found: https://java-driver.docs.scylladb.com/stable/manual/core/compression/ (and the corresponding https://docs.datastax.com/en/developer/java-driver/4.0/manual/core/compression/index.html)
It turns out that a Scylla and Datastax Java Drivers already allow similar things.

It's not a similar thing, you linked to docs about CQL compression (which I'm pretty sure goCQL already supports).

+1

@Lorak-mmk

It uses a prefix (by default lz4:) to tell if the value fetched from DB is compressed or not. It also has a limit, so it only compresses blobs over certain size. Now imagine that Discord or some other messenger enables such a feature , and some user sends a message lz4:blabla. It will not be compressed by the driver, because it's too short. But when selecting it, the driver will think it's compressed and try to decompress it, which will of course fail, potentially causing failures for other users.
We could avoid using such mechanisms (prefix, length limit) and always compress, but that would severly limit usability of such a feature, because you could only use it with fresh table, not with existing data.

It could be mittigated. But in general, it is better to support both options.

How could it be mitigated?

@dkropachev
Collaborator

It uses a prefix (by default lz4:) to tell if the value fetched from DB is compressed or not. It also has a limit, so it only compresses blobs over certain size. Now imagine that Discord or some other messenger enables such a feature , and some user sends a message lz4:blabla. It will not be compressed by the driver, because it's too short. But when selecting it, the driver will think it's compressed and try to decompress it, which will of course fail, potentially causing failures for other users.
We could avoid using such mechanisms (prefix, length limit) and always compress, but that would severly limit usability of such a feature, because you could only use it with fresh table, not with existing data.

It could be mittigated. But in general, it is better to support both options.

How could it be mitigated?

There are two cases to address:

  1. Pre-existing data in the table that is not compressed and starts with the prefix.
    This can only be addressed by users themselves; we will document that they should pick the prefix wisely and make sure that existing data in the table does not start with it.

  2. Column data being serialized that was not compressed (because it did not hit the compression or size limit) but starts with the prefix.
    Prefixing both compressed and uncompressed data solves this case (see the sketch below).
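
A sketch of what that second mitigation could look like; the tag values are illustrative and blobCompressor is the interface from the proposal above:

// Tag every value written by the driver, compressed or not, so a read can
// never misclassify driver-written data. Pre-existing rows remain case 1.
var (
	compressedTag   = []byte("lz4:")
	uncompressedTag = []byte("raw:")
)

func marshalTagged(c blobCompressor, limit int, raw []byte) ([]byte, error) {
	if len(raw) < limit {
		return append(append([]byte(nil), uncompressedTag...), raw...), nil
	}
	compressed, err := c.Compress(raw)
	if err != nil {
		return nil, err
	}
	return append(append([]byte(nil), compressedTag...), compressed...), nil
}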

@vladzcloudius
Author

Do we have any benchmarks that show that this additional overhead of recompression is significant enough to warrant such a feature?

@karol-kokoszka scylla struggles working with large cells and this is supposed to be a common ScyllaDB knowledge. When it has to deal with these cells it puts a lot of stress on the seastar memory allocator and this feature is supposed to reduce such stress.

Didn't know about this. I assume it's a tribal knowledge scattered around various issues and there is no place I can learn about this?

https://opensource.docs.scylladb.com/stable/troubleshooting/large-rows-large-cells-tables.html

There is also another drawback: the risk of data corruption. The solution relies on prepending some prefix to data, and assumes that this prefix will never occur in real data.

I believe that modern compression algorithms are robust enough to avoid a possibility of such a situation.
Even if the header matches the compressed archive the following attempt of decompressing of a blob which is actually not a compressed archive is going to fail. It definitely will for an ASCII input.
And if a user sends binary blobs as data and wants to prevent a potential collision they can simply disable such a compression.

It will also create the possibility of DoS attack (if the data is user provided, like a message in some IM) if user prepends the prefix to the data.

In use cases that allow such situations one should always compress or not compress, which the driver should allow configuring.

I think we are talking about different things. Did you see the PR implementing this feature? #221

It uses a prefix (by default lz4:) to tell if the value fetched from DB is compressed or not. It also has a limit, so it only compresses blobs over certain size. Now imagine that Discord or some other messenger enables such a feature, and some user sends a message lz4:blabla. It will not be compressed by the driver, because it's too short. But when selecting it, the driver will think it's compressed and try to decompress it, which will of course fail, potentially causing failures for other users.

You are right. I missed that part.
When we proposed this feature I assumed that you don't need to prepend the compressed chunk with any prefix: every compression library already encodes a corresponding header, and an attempt to decompress a buffer that is not compressed is going to fail fast.

We could avoid using such mechanisms (prefix, length limit) and always compress, but that would severly limit usability of such a feature, because you could only use it with fresh table, not with existing data.

We don't have to; see my comment above. We should be able both to avoid adding any custom prefix and to identify whether the blob we received is compressed or not.
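
For reference, a sketch of what relying on the codec's own header could look like for LZ4: the LZ4 frame format starts with the magic number 0x184D2204, so a read path could check for it before attempting decompression. This is only a heuristic, since raw user data can still begin with those four bytes (bytes is assumed to be imported):

// The LZ4 frame magic number, stored little-endian on the wire.
var lz4FrameMagic = []byte{0x04, 0x22, 0x4D, 0x18}

// looksLikeLZ4Frame reports whether data could plausibly be an LZ4 frame.
// A decompression attempt is still needed to know for sure.
func looksLikeLZ4Frame(data []byte) bool {
	return len(data) >= 4 && bytes.Equal(data[:4], lz4FrameMagic)
}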

And look what I found: https://java-driver.docs.scylladb.com/stable/manual/core/compression/ (and the corresponding https://docs.datastax.com/en/developer/java-driver/4.0/manual/core/compression/index.html)
It turns out that a Scylla and Datastax Java Drivers already allow similar things.

It's not a similar thing, you linked to docs about CQL compression (which I'm pretty sure goCQL already supports).

Oh, I assumed that this is the same thing (values compression). My bad.

@Lorak-mmk

https://opensource.docs.scylladb.com/stable/troubleshooting/large-rows-large-cells-tables.html

Thanks!

You are right. I missed that part. When we proposed this feature I assumed that you don't need to prepend the compressed chunk with any prefix. Every compression library encodes a corresponding header already. And an attempt to uncompress a not compressed buffer is going to fail fast.

So you propose to deserialize as follows (pseudocode):

try:
   value = decompress(buffer) // try to decompress
except deserialization_error:
   value = buffer // if it failed assume the value is not compressed

and to serialize all blobs, regardless of length?
I think this could possibly work, as long as there are no compressed blobs stored in the table before enabling the feature, and that's a pretty big "if" if there is user-provided data.
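
In Go terms the read path would be roughly the following sketch, assuming the blobCompressor interface from the proposal and that a failed decompression reliably identifies uncompressed data:

// Best-effort read path: try to decompress, fall back to the raw bytes.
func unmarshalBestEffort(c blobCompressor, data []byte) []byte {
	decompressed, err := c.Decompress(data)
	if err != nil {
		return data // decompression failed: treat the value as uncompressed
	}
	return decompressed
}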

@dkropachev
Collaborator

dkropachev commented Jul 31, 2024

https://opensource.docs.scylladb.com/stable/troubleshooting/large-rows-large-cells-tables.html

Thanks!

You are right. I missed that part. When we proposed this feature I assumed that you don't need to prepend the compressed chunk with any prefix. Every compression library encodes a corresponding header already. And an attempt to uncompress a not compressed buffer is going to fail fast.

So you propose to deserialize like follows (pseudocode):

try:
   value = decompress(buffer) // try to decompress
except deserialization_error:
   value = buffer // if it failed assume the value is not compressed

and to serialize all blobs, regardless of length? I think this could possibly work, as long as there are no compressed blobs stored in the table before enabling the feature - and that's a pretty big if if there is user-provided data.

Doesn't it have the same problem as you described in #218 (comment)?
If uncompressed data somehow has the prefix that the algorithm expects, decompression will try to decode it, opening a door for a DoS attack.

@Lorak-mmk

Right. Not DoS, but incorrect data would be returned, because the code would treat it as raw uncompressed data. It's probably worse :/

@dkropachev
Collaborator

dkropachev commented Jul 31, 2024

Right, Not DoS, but incorrect data would be returned, because code would treat this as raw uncompressed data. It's probably worse :/

If you forge the prefix, the header, and the underlying structures properly, you can cause excessive memory and CPU consumption, which can lead to DoS.

@mykaul

mykaul commented Jul 31, 2024

Right, Not DoS, but incorrect data would be returned, because code would treat this as raw uncompressed data. It's probably worse :/

If you forge prefix and header properly and underlying sturctures you can cause excessive memory and cpu consumption which can lead to DoS.

You can limit both.
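
For example, one way to bound the memory side, sketched with gzip as a placeholder codec, an arbitrary cap, and the bytes, compress/gzip, errors and io packages imported (not an actual driver setting):

// Cap how much output we accept from the decoder so a forged header cannot
// make the driver allocate unbounded memory.
const maxDecompressed = 16 << 20 // 16 MiB, illustrative

func decompressCapped(data []byte) ([]byte, error) {
	r, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer r.Close()
	out, err := io.ReadAll(io.LimitReader(r, maxDecompressed+1))
	if err != nil {
		return nil, err
	}
	if len(out) > maxDecompressed {
		return nil, errors.New("decompressed value exceeds the configured limit")
	}
	return out, nil
}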

@Lorak-mmk

Lorak-mmk commented Jul 31, 2024

If the feature requires careful consideration from the library user and some amount of supporting code on the user's side because of the aforementioned problems, is there a point in implementing it in the driver instead of in application code? The user will need modifications and safeguards around it anyway. Method 3, the custom type, is something that can easily be implemented by the user with a very limited amount of code.

@dkropachev
Collaborator

If the feature requires careful considerations from the library user and some amount of supporting code in user code because of aforementioned problems, is there a point of implementing it in the driver instead of in application code? User will need modifications and safe guards around it anyway. Method 3, custom type, is something that can easily be implemented by the user, with very limited amount of code.

  1. It allows users to avoid the same implementation mistakes we discussed.
  2. It eases feature implementation for the user.
  3. It gets properly tested in our pipelines.

@mykaul

mykaul commented Aug 11, 2024

If the feature requires careful considerations from the library user and some amount of supporting code in user code because of aforementioned problems, is there a point of implementing it in the driver instead of in application code? User will need modifications and safe guards around it anyway. Method 3, custom type, is something that can easily be implemented by the user, with very limited amount of code.

  1. It allows user to avoid same implementation mistakes we discussed.

We could add an example compressing a BLOB for all our driver examples.

  2. To ease feature implementation for a user.

See above.

  3. Have it properly tested in our pipelines.

See above.
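
Such an example could be as small as the following sketch, assuming gocql's Marshaler/Unmarshaler interfaces and using gzip as a placeholder codec (the import path may differ for the ScyllaDB fork):

package example

import (
	"bytes"
	"compress/gzip"
	"io"

	"github.com/gocql/gocql"
)

// CompressedBlob is an application-defined type that compresses itself on
// write and decompresses itself on read, without any driver changes.
type CompressedBlob []byte

func (b CompressedBlob) MarshalCQL(info gocql.TypeInfo) ([]byte, error) {
	var buf bytes.Buffer
	w := gzip.NewWriter(&buf)
	if _, err := w.Write(b); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func (b *CompressedBlob) UnmarshalCQL(info gocql.TypeInfo, data []byte) error {
	r, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return err
	}
	defer r.Close()
	out, err := io.ReadAll(r)
	if err != nil {
		return err
	}
	*b = CompressedBlob(out)
	return nil
}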
