-
Notifications
You must be signed in to change notification settings - Fork 59
stream ACS snapshot dumps to S3 #3570
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Itai Segall <itai.segall@digitalasset.com>
Signed-off-by: Itai Segall <itai.segall@digitalasset.com>
Signed-off-by: Itai Segall <itai.segall@digitalasset.com>
Signed-off-by: Itai Segall <itai.segall@digitalasset.com>
Signed-off-by: Itai Segall <itai.segall@digitalasset.com>
Signed-off-by: Itai Segall <itai.segall@digitalasset.com>
Signed-off-by: Itai Segall <itai.segall@digitalasset.com>
Signed-off-by: Itai Segall <itai.segall@digitalasset.com>
Signed-off-by: Itai Segall <itai.segall@digitalasset.com>
Signed-off-by: Itai Segall <itai.segall@digitalasset.com>
Signed-off-by: Itai Segall <itai.segall@digitalasset.com>
Signed-off-by: Itai Segall <itai.segall@digitalasset.com>
Signed-off-by: Itai Segall <itai.segall@digitalasset.com>
|
ping @ray-roestenburg-da @rautenrieth-da can I have a review pls? |
Ah, sorry, misremembered when I opened this, thought it was much earlier..... |
| s3BucketConnection, | ||
| loggerFactory, | ||
| ).getSource | ||
| .runWith(TestSink.probe[WithKillSwitch[(Long, CantonTimestamp)]]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice
| } | ||
|
|
||
| def dumpAcsSnapshot(migrationId: Long, timestamp: CantonTimestamp): Future[Unit] = { | ||
| /** * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| /** * | |
| /** |
| ) | ||
| val base = | ||
| getAcsSnapshotTimestampAfter(startMigrationId, startAfterTimestamp) | ||
| .via(SingleAcsSnapshotBulkStorage(config, acsSnapshotStore, s3Connection, loggerFactory)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!
| ): Flow[(Long, CantonTimestamp), (Long, CantonTimestamp), NotUsed] = | ||
| Flow[(Long, CantonTimestamp)].flatMapConcat { | ||
| case (migrationId: Long, timestamp: CantonTimestamp) => | ||
| new SingleAcsSnapshotBulkStorage( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style nitpick, you could do SingleAcsSnapshotBulkStorage.getSource and just pass the args to that method, you don't really need to create a new instance of SingleAcsSnapshotBulkStorage and then call it's one method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it has other private methods that access these arguments, so moving them to be arguments of getSource would require passing these around to other private methods as well. Not sure what you gain here...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I think that's more straightforward, and slightly more functional style, just methods calling methods and if needed passing along arguments. From perspective of the user of the function, the user just calls the function with the args, the user does not need to create an instance just to call one method, so from usage perspective it's simpler, just call the function on object SingleAcsSnapshotBulkStorage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm... still not sure I'm convinced :)
This is hidden in the object's apply, so the user is unaware of the fact that an object is created behind the scenes. The user indeed just calls a method on the object, so that's the UX you propose. Isn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes I'm proposing Object.method, no apply, just a method on an object, it's very easy to find. It's not super important 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK, no strong feelings here. The only difference is that a user needs to type also .method after Object when using this, but if that seems more intuitive for you, I'm fine with that.
| acsSnapshotStore, | ||
| s3Connection, | ||
| loggerFactory, | ||
| ).getSource |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You get the source (getSource) which return Source[String, ..], then you fold over it and just emit migration timestamp, this is not complete right? (maybe make a note of that) or am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I missed the TODOs in getSource. I think it would make sense for the write to s3 to be a flow that emits (migrationid, timestamp) that was written, but you're probably planning this for later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean to move the two lines below into the source itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guess that makes sense, will do
| val s3BucketConnection = getS3BucketConnectionWithInjectedErrors(loggerFactory) | ||
| for { | ||
| _ <- SingleAcsSnapshotBulkStorage | ||
| .asSource( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this test, maybe it's nice to separate the code into a source that gives you acs snapshots (from the store) and a flow to write the s3 stuff, and then connect them in the code. (you could also unit test just the flow for instance without having an actual source that gives you real acs snapshots from the db)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As things are right now, wrapping the write to s3 as a flow is 2 lines of code, so I don't see great value in adding a class for that. Having said that, there's this TODO where I do plan on doing basically that:
Lines 94 to 97 in b1eb83a
| // TODO(#3429): For now, we accumulate the full object in memory, then write it as a whole. | |
| // Consider streaming it to S3 instead. Need to make sure that it then handles crashes correctly, | |
| // i.e. that until we tell S3 that we're done writing, if we stop, then S3 throws away the | |
| // partially written object. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can add a couple of methods on an object (don't need a class) to create the separate sources and flows and then use those somewhere compose them source.via(flow). Not a huge deal.
I'm used to doing something like:
object Bulk {
def acsFromScan(...): Source[.., ..]
def acsToS3(...): Flow[..,..,..]
}
And somewhere you compose the whole stream acsFromScan().via(acsToS3)...
then you can test both separately, and use where you like. just FYI. in a test you can do Source(Vector(acs1,...)).via(acsToS3) .. etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK, will do as part of that TODO
ray-roestenburg-da
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! definitely going in the right direction, made some comments
Signed-off-by: Itai Segall <itai.segall@digitalasset.com>
Signed-off-by: Itai Segall <itai.segall@digitalasset.com>
…am-2 Signed-off-by: Itai Segall <itai.segall@digitalasset.com>
Signed-off-by: Itai Segall <itai.segall@digitalasset.com>
Signed-off-by: Itai Segall <itai.segall@digitalasset.com>
Signed-off-by: Itai Segall <itai.segall@digitalasset.com>
Currently dumps all snapshots. Computing which snapshots to actually dump, and which to skip, is tracked as a followup step in the tracking issue.
Fixes #3617
Pull Request Checklist
Cluster Testing
/cluster_teston this PR to request it, and ping someone with access to the DA-internal system to approve it./hdm_teston this PR to request it, and ping someone with access to the DA-internal system to approve it.PR Guidelines
Fixes #n, and mention issues worked on using#nMerge Guidelines