
Fix TensorBoard callback step counter never updating #22357

Draft

pctablet505 wants to merge 1 commit into keras-team:master from pctablet505:fix/20143-tensorboard-step-counter

Conversation

@pctablet505 (Collaborator)

Fixes: #20143

Problem

The TensorBoard callback was passing a plain Python int (self._global_train_batch) to writer.as_default(step) and record_if(should_record). Because writer.as_default in TF captures the step value at the moment of the call (not a reference), the step seen by TensorBoard never advanced past 0. Batch-level summaries were either always written (step 0 % N == 0) or the step axis in TensorBoard was frozen at 0, making batch-level curves useless.

Root Cause

_push_writer is called once at on_train_begin with the scalar 0. After that, self._global_train_batch is incremented on every batch, but the summary context still holds the original captured integer — not a live reference.
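The capture-by-value behavior can be illustrated without TensorFlow at all. In this sketch the class and attribute names are illustrative, not Keras internals: `CapturingWriter` models `writer.as_default(step=<int>)` snapshotting the step once, while `LazyWriter` models a step that is re-read on every summary write.

```python
class CapturingWriter:
    """Models writer.as_default(step=<int>): the step is snapshotted once."""
    def __init__(self, step):
        self.step = step

class LazyWriter:
    """Models a step source that is read each time a summary is written."""
    def __init__(self, read_step):
        self.read_step = read_step

counter = {"global_train_batch": 0}
captured = CapturingWriter(counter["global_train_batch"])  # snapshots 0
live = LazyWriter(lambda: counter["global_train_batch"])   # reads lazily

seen = []
for _ in range(3):
    counter["global_train_batch"] += 1  # what each batch begin does
    seen.append((captured.step, live.read_step()))
# captured.step never advances past 0; the lazy reader tracks 1, 2, 3
```

This is exactly the failure mode: the summary context holds the snapshotted `0` while the real counter moves on.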

Fix

Introduce a tf.Variable (_train_step_var / _test_step_var) at on_train_begin / on_test_begin and pass the variable to _push_writer. On every on_train_batch_begin / on_test_batch_begin, assign the current counter into that variable. Because writer.as_default accepts a variable and reads it lazily, TensorBoard now sees the correct step for every batch.
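A minimal sketch of the fixed flow, assuming TensorFlow is installed; the log directory and variable name here are placeholders, not the actual callback code. `tf.summary` reads a `tf.Variable` step at write time, so assignments made after entering `as_default` are visible.

```python
import tensorflow as tf

step_var = tf.Variable(0, dtype=tf.int64)  # plays the role of _train_step_var
writer = tf.summary.create_file_writer("/tmp/tb_demo")

# as_default holds the variable itself, so later assigns are visible
# to every subsequent summary write.
with writer.as_default(step=step_var):
    for batch in range(1, 4):
        step_var.assign(batch)  # done in on_train_batch_begin in the fix
        tf.summary.scalar("batch_loss", 1.0 / batch)  # recorded at step 1, 2, 3
```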

Files Changed

  • keras/src/callbacks/tensorboard.py — replace scalar step with tf.Variable, assign on each batch begin
    This pull request updates the TensorBoard callback in Keras to improve the way step variables are managed during training and testing. The main change is the introduction of TensorFlow tf.Variable objects to track step counts, enabling more robust and flexible summary writing, especially in graph execution mode.

Step variable management improvements:

  • Introduced tf.Variable objects (_train_step_var and _test_step_var) to track training and testing steps, replacing direct use of integer counters for better TensorFlow compatibility.
  • Updated _push_writer and related logic to use the new step variables instead of raw integers, ensuring summaries are written correctly in both eager and graph execution modes.

Synchronization of step variables:

  • Synchronized the new step variables with the global batch counters at the start of each train/test batch, keeping them consistent throughout the training/testing process.
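The synchronization might look roughly like this hypothetical sketch — method and attribute names mirror the PR description, but this is not the actual Keras source:

```python
import tensorflow as tf

class StepSyncSketch:
    """Hypothetical sketch of the step synchronization; not the real
    TensorBoard callback, just the shape of the bookkeeping."""

    def on_train_begin(self, logs=None):
        self._global_train_batch = 0
        self._train_step_var = tf.Variable(0, dtype=tf.int64)
        # The real callback would pass self._train_step_var to _push_writer here.

    def on_train_batch_begin(self, batch, logs=None):
        self._global_train_batch += 1
        # Keep the summary-step variable in lockstep with the counter.
        self._train_step_var.assign(self._global_train_batch)

cb = StepSyncSketch()
cb.on_train_begin()
for b in range(5):
    cb.on_train_batch_begin(b)
# the variable now reads 5, matching the global batch counter
```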

The _push_writer method captured the step argument by value (always 0)
from on_train_begin/on_test_begin. This caused two issues:
1. should_record() always returned True (0 % N == 0), defeating the
   update_freq gating mechanism
2. writer.as_default(step=0) set the default step for custom user
   summaries to 0 permanently, so all custom summaries overwrote
   each other at step 0

Use tf.Variable for the step counter instead of a plain integer. The
variable is updated on each batch begin, and since writer.as_default()
reads the variable's current value, both the recording gate and custom
summary default step now reflect the actual batch number.
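The update_freq gating failure described above can be reproduced in plain Python (no TensorFlow needed). The `update_freq` value and the counter plumbing are illustrative; the point is the difference between a gate that snapshots the step and one that reads it live:

```python
update_freq = 5

def make_gate(get_step):
    # Mirrors record_if(should_record): record only every update_freq steps.
    def should_record():
        return get_step() % update_freq == 0
    return should_record

# Buggy variant: the step is snapshotted once at "training begin".
frozen_step = 0
buggy_gate = make_gate(lambda: frozen_step)

# Fixed variant: the gate reads a live counter, like a tf.Variable.
state = {"step": 0}
fixed_gate = make_gate(lambda: state["step"])

buggy_hits, fixed_hits = [], []
for batch in range(1, 11):
    state["step"] = batch
    if buggy_gate():
        buggy_hits.append(batch)  # fires every batch: 0 % 5 == 0 is always True
    if fixed_gate():
        fixed_hits.append(batch)  # fires only at batches 5 and 10
```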

Fixes keras-team#20143
@gemini-code-assist (Contributor)

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a significant bug in the Keras TensorBoard callback where batch-level summaries failed to update due to the step counter being captured as a static integer. By introducing TensorFlow tf.Variables to dynamically track training and testing steps, this change ensures that TensorBoard accurately reflects the progress of batch-level metrics, providing correct and useful visualizations for model development.

Highlights

  • TensorBoard Step Counter Fix: Resolved a critical issue where the TensorBoard callback's step counter was not updating correctly, causing batch-level summaries to display frozen or inaccurate data.
  • Introduction of tf.Variable for Step Tracking: Replaced static Python integer counters with dynamic tf.Variable objects (_train_step_var, _test_step_var) to accurately track training and testing steps, leveraging TensorFlow's lazy evaluation for correct summary writing.
  • Synchronized Step Variable Updates: Implemented logic to assign the current global batch counts to the new tf.Variables at the beginning of each training and testing batch, ensuring that TensorBoard always receives the most up-to-date step value.


Changelog
  • keras/src/callbacks/tensorboard.py
    • Updated the _push_writer method to accept a tf.Variable for step tracking instead of a plain integer.
    • Initialized _train_step_var and _test_step_var as tf.Variables in the on_train_begin and on_test_begin methods, respectively.
    • Added assignment operations in on_train_batch_begin and on_test_batch_begin to update the tf.Variables with the current global batch counts.

@gemini-code-assist (bot) left a comment
Code Review

This pull request correctly addresses the issue of the TensorBoard step counter not updating for batch-level summaries by replacing the Python int with a tf.Variable. This ensures that the step value is passed by reference and updated correctly. The changes are logical and well-implemented. I have added a couple of minor suggestions to improve code maintainability by de-duplicating tensorflow imports.

Referenced code (keras/src/callbacks/tensorboard.py):

    def on_train_begin(self, logs=None):
        import tensorflow as tf

Severity: medium

This import tensorflow as tf statement is also present in on_test_begin. To avoid duplication and improve maintainability, consider importing tensorflow once at the top of the file using the Keras-idiomatic lazy loader:

    # At the top of keras/src/callbacks/tensorboard.py
    from keras.src.utils.module_utils import tensorflow as tf

This would allow you to remove the local imports from both on_train_begin and on_test_begin.


Referenced code (keras/src/callbacks/tensorboard.py):

    def on_test_begin(self, logs=None):
        self._push_writer(self._val_writer, self._global_test_batch)
        import tensorflow as tf
Severity: medium

This import tensorflow as tf is a duplicate of the one in on_train_begin. As suggested in the other comment, this can be de-duplicated by moving the import to the top of the file for better code maintainability.


codecov-commenter commented Mar 5, 2026

Codecov Report

❌ Patch coverage is 83.33333% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.95%. Comparing base (95e74a9) to head (4f115a5).

Files with missing lines | Patch % | Lines
keras/src/callbacks/tensorboard.py | 83.33% | 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #22357      +/-   ##
==========================================
- Coverage   82.95%   82.95%   -0.01%     
==========================================
  Files         595      595              
  Lines       66040    66048       +8     
  Branches    10305    10307       +2     
==========================================
+ Hits        54785    54791       +6     
  Misses       8639     8639              
- Partials     2616     2618       +2     
Flag Coverage Δ
keras 82.78% <83.33%> (-0.01%) ⬇️
keras-jax 60.84% <83.33%> (+<0.01%) ⬆️
keras-numpy 55.02% <8.33%> (-0.01%) ⬇️
keras-openvino 49.10% <8.33%> (-0.01%) ⬇️
keras-tensorflow 62.06% <83.33%> (+<0.01%) ⬆️
keras-torch 60.87% <83.33%> (+<0.01%) ⬆️
