Improvement - schema reader replay logs #1306

Draft
muralibasani wants to merge 1 commit into main from mbasani-schema-reader-consume

@muralibasani muralibasani commented Jun 8, 2026

Contributor

About this change - What it does

  • Bug fix: set_not_ready() now resets the replay-timing state (start_time, last_check, _replay_start_logged). Without this, after a master rebalance the next "Schema replay completed in X seconds" log reported time since the original process start (potentially hours), and "Starting schema replay" never re-fired. Resets happen inside _ready_lock for symmetry with the existing _ready write.

  • Refactor: replaced the dual-purpose startup_previous_processed_offset (sentinel + rate-calc state) with a dedicated _replay_start_logged bool. Same intent, clearer code.

  • Log accuracy: reworded "Starting schema replay: N messages to process" → "reading up to offset N", and "Schema replay completed in X seconds (processed N messages)" → "read up to offset N". On compacted topics (which _schemas always is), offsets are not record counts: the old wording wrongly suggested that 6.9M records had been processed when only ~100 remained in the topic after compaction.

  • Removed misleading metric: dropped recs/s from the DEBUG "Replay progress" line. The calculation (offset - prev_offset) / dt measures offset-position delta per second, which on compacted topics correlates with neither records nor bytes. This produced absurd numbers like "26M recs/s" because a single consumed record could jump the position by millions of offsets. The progress percentage already conveys the only meaningful signal.

  • Also removes a latent divide-by-zero in the recs/s calculation that could have triggered if _is_ready ran twice within the same monotonic-clock tick (possible right after set_not_ready).
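
To make the points above concrete, here is a minimal sketch of the reset and of the removed rate calculation. It is not the code in schema_reader.py: the class ReplayState and the helpers mark_ready, replay_progress_percent, and offset_rate are hypothetical names chosen for illustration, and only set_not_ready, _ready, _ready_lock, start_time, last_check, and _replay_start_logged come from the description above.

```python
import threading
import time


class ReplayState:
    """Hypothetical stand-in for the reader's replay-timing state."""

    def __init__(self) -> None:
        self._ready_lock = threading.Lock()
        self._ready = False
        self.start_time = time.monotonic()
        self.last_check = self.start_time
        self._replay_start_logged = False

    def set_not_ready(self) -> None:
        # Reset the timing state under the same lock that guards _ready,
        # so the next replay is timed from now, not from process start.
        with self._ready_lock:
            self._ready = False
            now = time.monotonic()
            self.start_time = now
            self.last_check = now
            self._replay_start_logged = False

    def mark_ready(self) -> float:
        # Hypothetical helper: flips _ready and returns the replay duration.
        with self._ready_lock:
            self._ready = True
            return time.monotonic() - self.start_time


def replay_progress_percent(current_offset: int, end_offset: int) -> float:
    # Progress only compares positions, so it stays meaningful on a
    # compacted topic even though offsets are not record counts.
    if end_offset <= 0:
        return 100.0
    return min(100.0, 100.0 * current_offset / end_offset)


def offset_rate(offset: int, prev_offset: int, dt: float) -> float:
    # The removed metric: offset-position delta per second.
    # Raises ZeroDivisionError when dt == 0 (same clock tick).
    return (offset - prev_offset) / dt
```

With these definitions, a single consumed record that moves the position from 0 to 6,900,000 within 0.25 seconds gives offset_rate(6_900_000, 0, 0.25) == 27_600_000.0, which shows how the old line could report tens of millions of "recs/s" on a topic holding roughly 100 records.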

References: #xxxxx

Why this way

@muralibasani muralibasani requested a review from a team as a code owner June 8, 2026 08:40
@muralibasani muralibasani requested a review from juha-aiven June 8, 2026 08:43
@muralibasani muralibasani marked this pull request as draft June 8, 2026 09:08
@muralibasani muralibasani removed the request for review from juha-aiven June 8, 2026 09:11
@muralibasani muralibasani changed the title from "Improvement - change consume schemas topic from consume subscribe() to assign()" to "Improvement - consume subscribe" Jun 10, 2026
github-actions Bot commented Jun 10, 2026


Coverage report

File | Statements | Missing | Coverage | Coverage (new stmts) | Lines missing
src/karapace/core/schema_reader.py
Project Total

This report was generated by python-coverage-comment-action

@muralibasani muralibasani changed the title from "Improvement - consume subscribe" to "Improvement - replay logs in schema reader" Jun 10, 2026
@muralibasani muralibasani force-pushed the mbasani-schema-reader-consume branch from badeca7 to 2fdc0ed Compare June 10, 2026 12:41
@muralibasani muralibasani changed the title from "Improvement - replay logs in schema reader" to "Improvement - schema reader logs and consume logic" Jun 10, 2026
@muralibasani muralibasani changed the title from "Improvement - schema reader logs and consume logic" to "Improvement - schema reader logs" Jun 10, 2026
@muralibasani muralibasani changed the title from "Improvement - schema reader logs" to "Improvement - schema reader replay logs" Jun 10, 2026
@muralibasani muralibasani force-pushed the mbasani-schema-reader-consume branch from fb7904c to cefd21c Compare June 10, 2026 14:42