Description
Current infra for recordings
```mermaid
flowchart TB
    clickhouse_replay_summary_table[(Session replay summary table)]
    received_snapshot_event[clients send snapshot data]
    kafka_recordings_topic>Kafka recordings topic]
    kafka_clickhouse_replay_summary_topic>Kafka clickhouse replay summary topic]

    subgraph recordings ingestion workload
        recordings_consumer-->kafka_clickhouse_replay_summary_topic
        kafka_clickhouse_replay_summary_topic-->clickhouse_replay_summary_table
    end

    subgraph recordings blob ingestion workload
        recordings_blob_consumer-->|1. statefully batch|disk
        disk-->recordings_blob_consumer
        recordings_blob_consumer-->|2. periodic flush|blob_storage
    end

    received_snapshot_event --> capture
    capture -->|compress and chunk to fit into kafka messages| kafka_recordings_topic
    kafka_recordings_topic --> recordings_consumer
    kafka_recordings_topic --> recordings_blob_consumer
```
Let's assume the first change is to teach the consumers to listen to more than one topic so we can cut over without needing to orchestrate multiple releases
I'm also going to assume that we can't avoid some messages being written to both topics during a cutover
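As a rough sketch of what "listen to more than one topic" could look like, assuming a kafkajs-style consumer (kafkajs ≥ 2.0 for the `topics` array); the broker address, topic names, and group id are placeholders, not the real configuration:

```typescript
import { Kafka, EachMessagePayload } from 'kafkajs'

// Placeholder names throughout: swap in the real brokers/topics/group id.
const kafka = new Kafka({ clientId: 'recordings-consumer', brokers: ['kafka:9092'] })
const consumer = kafka.consumer({ groupId: 'session-recordings' })

async function handleSnapshotMessage({ topic, partition, message }: EachMessagePayload): Promise<void> {
  // Same handling regardless of which topic the message arrived on.
  console.log(`processing ${topic}[${partition}] offset=${message.offset}`)
}

export async function run(): Promise<void> {
  await consumer.connect()
  // Subscribe to both the old and the new topic so the cutover doesn't need
  // a coordinated release: whichever topic capture writes to, we still consume it.
  await consumer.subscribe({ topics: ['session_recording_events', 'session_recording_events_v2'] })
  await consumer.run({ eachMessage: handleSnapshotMessage })
}
```

The handler stays the same; only the subscription changes, which is what makes a cutover without orchestrated releases plausible.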
We have two consumers:
- recordings consumer
- blob ingestion consumer
recordings consumer
- writes individual events to the session_recording_events table
  - these are relatively immune to duplication
    - because CH will de-duplicate based on team_id, toHour(timestamp), session_id, timestamp, uuid
- also writes summarised data to the session_replay_events table, so that we can eventually stop writing to the session_recording_events table
  - this path is less able to handle duplication because of the aggregation (see the sketch after this list)
  - CH will ignore writes it thinks are duplicates (we can play with https://clickhouse.com/docs/en/operations/settings/merge-tree-settings#replicated-deduplication-window to tune this if needed)
  - we can tolerate some duplicates: counts will be a little off, but if we entirely miss a session then the user loses data
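To make the "counts will be off" point concrete, here's a toy version of the aggregation; the field names are invented, not the real schema:

```typescript
// Invented field names, purely to illustrate why the aggregated table is more
// duplicate-sensitive than the raw events table.
interface SnapshotEvent {
  sessionId: string
  timestamp: number
  clickCount: number
}

interface ReplaySummary {
  sessionId: string
  firstTimestamp: number
  lastTimestamp: number
  clickCount: number
}

function summarise(sessionId: string, events: SnapshotEvent[]): ReplaySummary {
  return events.reduce(
    (acc, e) => ({
      sessionId,
      firstTimestamp: Math.min(acc.firstTimestamp, e.timestamp),
      lastTimestamp: Math.max(acc.lastTimestamp, e.timestamp),
      // Additive fields are the problem: consuming the same event twice inflates
      // the count, and once it is baked into the aggregate row there is no
      // per-event uuid left to de-duplicate on.
      clickCount: acc.clickCount + e.clickCount,
    }),
    { sessionId, firstTimestamp: Infinity, lastTimestamp: -Infinity, clickCount: 0 }
  )
}
```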
Neither consumer cares much about Kafka specifics like the exact offset value, although the blob consumer does rely on offsets always increasing (more on that below).
blob ingestion consumer
- reads messages from Kafka
- does not auto-commit
- buffers them on disk (and a little in memory)
- periodically flushes to S3
  - and on an S3 flush might commit an offset to Kafka
- uses the fact that the offset will always increment to track progress (rough sketch of this pattern below)
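A rough sketch of that batch-on-disk / flush-to-S3 shape, not the real implementation: the file layout, the S3 helper, and the commit callback are stand-ins, and choosing a genuinely safe offset to commit is more involved than this:

```typescript
import { appendFile } from 'fs/promises'

interface SessionBuffer {
  topic: string
  partition: number
  lastOffset: string // highest Kafka offset buffered for this session so far
  filePath: string   // placeholder path, not the real layout
}

const buffers = new Map<string, SessionBuffer>()

// 1. statefully batch: append each snapshot chunk to a per-session file on disk.
export async function bufferMessage(
  topic: string,
  partition: number,
  offset: string,
  sessionId: string,
  payload: Buffer
): Promise<void> {
  const buffer = buffers.get(sessionId) ?? {
    topic,
    partition,
    lastOffset: offset,
    filePath: `/tmp/recordings/${sessionId}.jsonl`,
  }
  buffer.lastOffset = offset
  buffers.set(sessionId, buffer)
  await appendFile(buffer.filePath, payload)
}

// 2. periodic flush: push each session's file to S3 and (maybe) commit an offset
// back to Kafka so the data we've persisted isn't re-read after a restart.
export async function periodicFlush(
  uploadToS3: (filePath: string) => Promise<void>,
  commitOffset: (topic: string, partition: number, offset: string) => Promise<void>
): Promise<void> {
  for (const [sessionId, buffer] of buffers) {
    await uploadToS3(buffer.filePath)
    await commitOffset(buffer.topic, buffer.partition, buffer.lastOffset)
    buffers.delete(sessionId)
  }
}
```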
The blob ingestion consumer is the one that will "break", because (I guess) the Kafka offset will reset to 0 on the new topic.
"Break" here means that we would write to S3 again for any session that has messages on both topics, even though we had already written that session's data to S3.
If we wanted to be super resilient we'd make a change to track sessions by topic and not accept messages on the new topic if they were for sessions that started on the old topic (something like the sketch below). Let that "drain" and then tidy up the code once all active sessions are on the new topic.
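A minimal sketch of that guard, assuming we can key on session id and topic (the topic names are placeholders); in practice the map would need to be persisted, or at least evicted once the old-topic sessions have drained:

```typescript
// Placeholder topic names.
const OLD_TOPIC = 'session_recording_events'
const NEW_TOPIC = 'session_recording_events_v2'

// sessionId -> topic the session was first seen on.
const sessionTopic = new Map<string, string>()

export function shouldProcess(sessionId: string, topic: string): boolean {
  const firstSeenOn = sessionTopic.get(sessionId) ?? topic
  sessionTopic.set(sessionId, firstSeenOn)
  // A session that started on the old topic keeps being handled there; its
  // duplicate messages on the new topic are dropped until the session drains.
  return !(firstSeenOn === OLD_TOPIC && topic === NEW_TOPIC)
}
```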
A quick google suggests that you can set partition offsets but that feels more confusing 😅
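For reference, the "set partition offsets" route would look roughly like this with a kafkajs admin client (group, topic, partition, and offset values are placeholders, and the consumer group has to be stopped while its offsets are changed), which is part of why it feels more confusing:

```typescript
import { Kafka } from 'kafkajs'

const admin = new Kafka({ clientId: 'ops', brokers: ['kafka:9092'] }).admin()

export async function seedOffsets(): Promise<void> {
  await admin.connect()
  // Manually position the consumer group on the new topic.
  await admin.setOffsets({
    groupId: 'session-recordings-blob',
    topic: 'session_recording_events_v2',
    partitions: [{ partition: 0, offset: '0' }],
  })
  await admin.disconnect()
}
```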
TBH, right now we're double processing anyway as we work through all the differences between the theory and practice of the new consumer. So, if we can cut over relatively quickly then we're not adding much to the risk.
Without wanting to speak for @benjackwhite too much, I don't think we know kafka well enough to do this all by ourselves 😅
So, I'm thinking:

- 1: set up a separate capture deployment that only handles session recording traffic and writes to a different MSK cluster?
- 2 and 3 (at the same time): teach the two consumers to use multiple Kafka clusters/topics; they read from both at the same time (rough sketch below)
- 4: test that in dev
- 5: roll that out to EU
- 6: roll that out to prod
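A sketch of what steps 2 and 3 might look like, assuming a kafkajs-style client: one consumer per MSK cluster, both feeding the same handler, so the workload doesn't care which cluster a message came from. Broker lists, topic names, and the group id are placeholders.

```typescript
import { Kafka, Consumer, EachMessagePayload } from 'kafkajs'

// Placeholder cluster/topic configuration.
const clusters = [
  { name: 'old-msk', brokers: ['old-broker:9092'], topic: 'session_recording_events' },
  { name: 'new-msk', brokers: ['new-broker:9092'], topic: 'session_recording_events_v2' },
]

async function handleMessage({ topic, partition, message }: EachMessagePayload): Promise<void> {
  // The existing processing logic goes here, unchanged.
  console.log(`processing ${topic}[${partition}] offset=${message.offset}`)
}

export async function runAll(): Promise<Consumer[]> {
  const consumers: Consumer[] = []
  for (const cluster of clusters) {
    const kafka = new Kafka({ clientId: `recordings-${cluster.name}`, brokers: cluster.brokers })
    const consumer = kafka.consumer({ groupId: 'session-recordings' })
    await consumer.connect()
    await consumer.subscribe({ topics: [cluster.topic] })
    // Each consumer commits offsets against its own cluster; the handler is shared.
    await consumer.run({ eachMessage: handleMessage })
    consumers.push(consumer)
  }
  return consumers
}
```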