Description
After speaking with @mkoskinen, he made a valid point that there may be a use case where some kind of object storage (or more specifically, an implementation of `BackupClientInterface`) is so basic/simple that it doesn't support functionality such as resuming. A way around this would be to use a `BackupClientInterface` that does support resuming (even flat file storage) and then, after each substream for that object/key/file is complete, upload it to S3/GCS/whatever as a whole.
This can also solve other problems. For example, we currently don't compress the `.json` file (using something like `gz`) while streaming because we can't find a resume point within a compressed object/key/file. This approach would let you back up the stream to an initial storage and then, after it's finished, compress it and send it to S3/GCS/whatever.
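As a rough illustration of that second step (a minimal sketch, assuming Akka Streams and a hypothetical local flat-file layout under `/backups`; the paths and function name are not part of the existing codebase):

```scala
import java.nio.file.Paths

import akka.actor.ActorSystem
import akka.stream.IOResult
import akka.stream.scaladsl.{Compression, FileIO}

import scala.concurrent.Future

object CompressAfterBackup {
  // Once the flat-file backup for a key has finished, re-read it as a whole,
  // gzip it, and write the compressed result out. The same pattern could hand
  // the compressed bytes to whatever sink uploads to S3/GCS instead of a file.
  def compressCompletedBackup(key: String)(implicit system: ActorSystem): Future[IOResult] = {
    val plain      = Paths.get(s"/backups/$key.json")    // assumed location of the finished backup
    val compressed = Paths.get(s"/backups/$key.json.gz") // assumed destination for the compressed copy

    FileIO
      .fromPath(plain)
      .via(Compression.gzip)
      .runWith(FileIO.toPath(compressed))
  }
}
```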
On first impressions, the implementation could be adding a single method to the `BackupClientInterface` that returns an `Option[Sink]`; if it's defined, this sink gets executed after a backup is complete. One consideration is whether the sink should run asynchronously or synchronously (ideally as a parameter of the method itself), i.e.
`def afterBackupSink(async: Boolean): Option[Sink]`
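A minimal sketch of how that hook might sit on the interface (assuming Akka Streams; the `ByteString` element type, the default of `None`, and the doc wording are all assumptions, not the existing API):

```scala
import akka.stream.scaladsl.Sink
import akka.util.ByteString

import scala.concurrent.Future

trait BackupClientInterface {
  // ... existing streaming backup methods elided ...

  /** Hypothetical post-backup hook. If defined, the returned sink is run against the
    * completed object/key/file once its backup substream finishes (e.g. to compress it
    * and/or upload it to S3/GCS as a whole). The `async` flag would indicate whether the
    * caller should await the sink's materialized Future before moving on to the next
    * substream, or let it run in the background.
    */
  def afterBackupSink(async: Boolean): Option[Sink[ByteString, Future[_]]] = None
}
```

Defaulting the method to `None` would keep the change backwards compatible: simple implementations ignore it, while implementations that need the "finish locally, then ship as a whole" behaviour override it.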