Is it possible to expose more ClickHouse Kafka connector metrics, such as p99 latency of ingesting to ClickHouse Cloud, number of errors/retriable errors, etc.? #441

Open
@georgeli-roblox

Is your feature request related to a problem? Please describe.

@Paultagoras Thanks for adding the ClickHouse connector metrics for #209.

# HELP clickhouse_kafka_connect_total ClickHouseKafkaConnector metric ReceivedRecords
# TYPE clickhouse_kafka_connect_total counter
clickhouse_kafka_connect{attribute="ReceivedRecords",sinktask="33",} 0.0
clickhouse_kafka_connect{attribute="RecordProcessingTime",sinktask="33",} 0.0
clickhouse_kafka_connect{attribute="TaskProcessingTime",sinktask="33",} 0.0
clickhouse_kafka_connect{attribute="ReceivedRecords",sinktask="23",} 362.0
clickhouse_kafka_connect{attribute="RecordProcessingTime",sinktask="23",} 8774846.0
clickhouse_kafka_connect{attribute="TaskProcessingTime",sinktask="23",} 7.5087340575E10
...
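
For anyone who wants to group the samples above by task while investigating, here is a minimal sketch in Java. It assumes the line shape shown in the output above (`name{attribute="...",sinktask="...",} value`). `MetricLineParser` is a hypothetical helper written for this issue, not part of the connector or of any Prometheus client library.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: groups the exported samples by sinktask id. */
final class MetricLineParser {
    // Matches lines such as:
    // clickhouse_kafka_connect{attribute="ReceivedRecords",sinktask="23",} 362.0
    private static final Pattern SAMPLE = Pattern.compile(
        "^(\\w+)\\{attribute=\"([^\"]+)\",sinktask=\"([^\"]+)\",?\\}\\s+(\\S+)$");

    /** Returns sinktask id -> (attribute name -> value). */
    static Map<String, Map<String, Double>> parse(List<String> lines) {
        Map<String, Map<String, Double>> byTask = new HashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and HELP/TYPE comments
            }
            Matcher m = SAMPLE.matcher(line);
            if (!m.matches()) {
                continue; // ignore anything that is not in the assumed shape
            }
            byTask.computeIfAbsent(m.group(3), k -> new HashMap<>())
                  .put(m.group(2), Double.parseDouble(m.group(4)));
        }
        return byTask;
    }
}
```

This only regroups the existing values. It does not say what the values mean or which connector a given sinktask belongs to, which is what the questions below are about.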

See also the graph below.

Some questions:

  1. Is there more detailed documentation on what exactly TaskProcessingTime, RecordProcessingTime, and ReceivedRecords measure? They seem intuitive, but otherwise we can take a look at the code.
  2. These metrics carry the sinktask tag. Is it possible to link these sinktask numbers to the connector name? One Kafka Connect instance can host multiple ClickHouse connectors for different tables.
  3. I am trying to find a way to easily alert on, identify, and troubleshoot whether an issue is on the ClickHouse Cloud side or the Roblox internal Kafka side. Do you think adding additional metrics could help? For example, the number of errors/retries, or a ClickHouse ingest latency histogram (p99, p95, p50).

Thanks,
George

Describe the solution you'd like

  1. More detailed documentation on the TaskProcessingTime, RecordProcessingTime, and ReceivedRecords metrics.
  2. More tags on the metrics besides the sinktask number, e.g. a tag that maps to the name of the ClickHouse connector for a topic.
  3. New metrics that show the health of the ingestion to ClickHouse Cloud.
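
To make item 3 concrete, here is a rough Java sketch of the kind of bookkeeping that could back such metrics: error and retriable-error counts plus latency percentiles. `IngestStats` and all of its methods are hypothetical names invented for illustration. This is not the connector's code, and a real implementation would more likely use a bounded histogram than keep every sample in memory.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical per-task ingest statistics; not part of the connector. */
final class IngestStats {
    private final List<Long> latenciesMs = new ArrayList<>();
    private long errors;
    private long retriableErrors;

    /** Records how long one insert into ClickHouse took. */
    synchronized void recordLatency(long millis) {
        latenciesMs.add(millis);
    }

    /** Counts a failed insert; retriable failures are also counted separately. */
    synchronized void recordError(boolean retriable) {
        errors++;
        if (retriable) {
            retriableErrors++;
        }
    }

    /** Nearest-rank percentile over all recorded samples, p in (0, 100]. */
    synchronized long percentile(double p) {
        if (latenciesMs.isEmpty()) {
            throw new IllegalStateException("no latency samples recorded");
        }
        List<Long> sorted = new ArrayList<>(latenciesMs);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.size() / 100.0);
        return sorted.get(Math.max(rank, 1) - 1);
    }

    synchronized long errors() { return errors; }

    synchronized long retriableErrors() { return retriableErrors; }
}
```

With values like `percentile(99)`, `errors()`, and `retriableErrors()` exported per connector, an alert could separate slow or failing inserts on the ClickHouse Cloud side from a lack of incoming records on the Kafka side.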

Additional context

Screenshot 2024-09-16 at 9 23 08 PM

Labels: enhancement (New feature or request)