add Tracker protocol and training tracking guide. #1741

Open
janfb wants to merge 4 commits into main from refactor-tracker-interface

janfb commented Jan 23, 2026

Context

We want a simple, extensible way to log training metrics from sbi that works with
TensorBoard by default while letting users plug in their own experiment trackers.
The previous summary_writer argument was TensorBoard-specific and hard to extend.

  • Offer a minimal protocol for logging metrics that supports external tools.
  • Keep default behavior unchanged (TensorBoard logs with a standard log directory).
  • Make it easy to add third-party trackers without changing trainer internals; everything happens on the user side.

What changed

  • Introduced a Tracker protocol (sbi/sbi_types.py) with metric/figure/param logging and a log_dir property.
  • Added TensorBoardTracker in sbi/utils/tracking.py as the default implementation.
  • Trainers now accept tracker= and log through the protocol; summary_writer= is a deprecated alias that wraps TensorBoard and warns if used.
  • plot_summary now reads log_dir from the tracker and errors with a clear message if a non-TensorBoard tracker is used.
  • Added an experiment tracking how-to guide with a runnable TensorBoard example and adapter patterns for W&B, MLflow, and Trackio.
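
For readers who want a feel for the shape of this change, here is a minimal sketch, not the code in this PR. Only `add_figure`, `log_dir`, `tracker=`, and `summary_writer=` are named above; `log_metric`, `log_params`, `WriterTracker`, and `resolve_tracker` are hypothetical names chosen for illustration. The wrapper assumes a writer-like object exposing `add_scalar`, `add_text`, `add_figure`, and `log_dir`, as TensorBoard's `SummaryWriter` does.

```python
import warnings
from typing import Any, Dict, Optional, Protocol, runtime_checkable


@runtime_checkable
class Tracker(Protocol):
    """Illustrative protocol; only add_figure and log_dir are named in the PR."""

    @property
    def log_dir(self) -> Optional[str]: ...

    def log_metric(self, name: str, value: float, step: Optional[int] = None) -> None: ...

    def log_params(self, params: Dict[str, Any]) -> None: ...

    def add_figure(self, name: str, figure: Any, step: Optional[int] = None) -> None: ...


class WriterTracker:
    """Hypothetical stand-in for the default tracker: a thin adapter around a
    SummaryWriter-like object."""

    def __init__(self, writer: Any) -> None:
        self._writer = writer

    @property
    def log_dir(self) -> Optional[str]:
        return getattr(self._writer, "log_dir", None)

    def log_metric(self, name: str, value: float, step: Optional[int] = None) -> None:
        self._writer.add_scalar(name, value, step)

    def log_params(self, params: Dict[str, Any]) -> None:
        # Assumption: parameters are stored as text entries; the real tracker may differ.
        for key, value in params.items():
            self._writer.add_text(f"params/{key}", str(value))

    def add_figure(self, name: str, figure: Any, step: Optional[int] = None) -> None:
        self._writer.add_figure(name, figure, step)


def resolve_tracker(tracker: Any = None, summary_writer: Any = None) -> Optional[Tracker]:
    """Map the deprecated summary_writer= argument onto tracker=, with a warning."""
    if summary_writer is not None:
        warnings.warn(
            "`summary_writer` is deprecated; pass `tracker=` instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        if tracker is None:
            tracker = WriterTracker(summary_writer)
    return tracker
```

Because the protocol is structural, `isinstance(obj, Tracker)` only checks that the members exist; it does not validate signatures.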

Naming choices

  • Tracker: short and tool-agnostic; aligns better with MLflow/W&B conventions
    than “writer”.
  • add_figure: mirrors TensorBoard’s SummaryWriter.add_figure, keeping the
    default implementation a thin adapter.

Notes

  • External trackers are intentionally not bundled; users can implement small adapters
    that satisfy the Tracker protocol.
  • summary_writer remains for backward compatibility but is deprecated.
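
As a rough illustration of what such a user-side adapter could look like, the sketch below writes records to a JSON Lines file instead of calling any external service. It is an assumption-laden example: `JsonlTracker`, `log_metric`, and `log_params` are hypothetical names (the PR text only names `add_figure` and `log_dir`), and figures are assumed to be matplotlib-like objects with a `savefig` method.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional


class JsonlTracker:
    """Hypothetical user-side tracker that appends records to metrics.jsonl."""

    def __init__(self, log_dir: str) -> None:
        self._dir = Path(log_dir)
        self._dir.mkdir(parents=True, exist_ok=True)
        self._file = self._dir / "metrics.jsonl"

    @property
    def log_dir(self) -> str:
        return str(self._dir)

    def _write(self, record: Dict[str, Any]) -> None:
        # Append one JSON object per line so the log can be streamed or tailed.
        with self._file.open("a", encoding="utf-8") as handle:
            handle.write(json.dumps(record) + "\n")

    def log_metric(self, name: str, value: float, step: Optional[int] = None) -> None:
        self._write({"kind": "metric", "name": name, "value": float(value), "step": step})

    def log_params(self, params: Dict[str, Any]) -> None:
        # Values are stringified so arbitrary objects stay JSON-serializable.
        self._write({"kind": "params", "params": {k: str(v) for k, v in params.items()}})

    def add_figure(self, name: str, figure: Any, step: Optional[int] = None) -> None:
        # Assumes a matplotlib-like figure; slashes in the tag are flattened.
        suffix = "" if step is None else f"_{step}"
        path = self._dir / f"{name.replace('/', '_')}{suffix}.png"
        figure.savefig(path)
        self._write({"kind": "figure", "name": name, "path": str(path), "step": step})
```

No inheritance from the protocol is needed; an object with matching members is enough. An adapter for W&B or MLflow would follow the same shape, forwarding each method to that tool's client.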

janfb added 2 commits January 23, 2026 09:40
- Tracker protocol with TensorBoard implementation and deprecated summary_writer alias for backward compatibility.
- Update trainers and plot_summary to log via the tracker, and document the new tracking workflow with runnable TensorBoard examples plus adapter patterns for other tools.

codecov bot commented Jan 23, 2026

❌ 1 Tests Failed:

| Tests completed | Failed | Passed | Skipped |
|-----------------|--------|--------|---------|
| 5816            | 1      | 5815   | 146     |
View the full list of 1 ❄️ flaky test(s)
tests/torchutils_test.py::TorchUtilsTest::test_searchsorted

Flake rate in main: 42.53% (Passed 50 times, Failed 37 times)

Stack Traces | 0.005s run time
.venv/lib/python3.10....../site-packages/xdist/remote.py:289: in pytest_runtest_logreport
    self.sendevent("testreport", data=data)
.venv/lib/python3.10....../site-packages/xdist/remote.py:126: in sendevent
    self.channel.send((name, kwargs))
.venv/lib/python3.10....................................................../site-packages/execnet/gateway_base.py:912: in send
    self.gateway._send(Message.CHANNEL_DATA, self.id, dumps_internal(item))
.venv/lib/python3.10....................................................../site-packages/execnet/gateway_base.py:1629: in dumps_internal
    return _Serializer().save(obj)  # type: ignore[return-value]
.venv/lib/python3.10....................................................../site-packages/execnet/gateway_base.py:1647: in save
    self._save(obj)
.venv/lib/python3.10....................................................../site-packages/execnet/gateway_base.py:1667: in _save
    dispatch(self, obj)
.venv/lib/python3.10....................................................../site-packages/execnet/gateway_base.py:1744: in save_tuple
    self._save(item)
.venv/lib/python3.10....................................................../site-packages/execnet/gateway_base.py:1667: in _save
    dispatch(self, obj)
.venv/lib/python3.10....................................................../site-packages/execnet/gateway_base.py:1740: in save_dict
    self._write_setitem(key, value)
.venv/lib/python3.10....................................................../site-packages/execnet/gateway_base.py:1734: in _write_setitem
    self._save(value)
.venv/lib/python3.10....................................................../site-packages/execnet/gateway_base.py:1667: in _save
    dispatch(self, obj)
.venv/lib/python3.10....................................................../site-packages/execnet/gateway_base.py:1740: in save_dict
    self._write_setitem(key, value)
.venv/lib/python3.10....................................................../site-packages/execnet/gateway_base.py:1734: in _write_setitem
    self._save(value)
.venv/lib/python3.10....................................................../site-packages/execnet/gateway_base.py:1667: in _save
    dispatch(self, obj)
.venv/lib/python3.10....................................................../site-packages/execnet/gateway_base.py:1740: in save_dict
    self._write_setitem(key, value)
.venv/lib/python3.10....................................................../site-packages/execnet/gateway_base.py:1734: in _write_setitem
    self._save(value)
.venv/lib/python3.10....................................................../site-packages/execnet/gateway_base.py:1667: in _save
    dispatch(self, obj)
.venv/lib/python3.10....................................................../site-packages/execnet/gateway_base.py:1740: in save_dict
    self._write_setitem(key, value)
.venv/lib/python3.10....................................................../site-packages/execnet/gateway_base.py:1734: in _write_setitem
    self._save(value)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <execnet.gateway_base._Serializer object at 0x7ff4b46cd420>
obj = tensor([0.0000, 0.1111, 0.2222, 0.3333, 0.4444, 0.5556, 0.6667, 0.7778, 0.8889])

    def _save(self, obj: object) -> None:
        tp = type(obj)
        try:
            dispatch = self._dispatch[tp]
        except KeyError:
            methodname = "save_" + tp.__name__
            meth: Callable[[_Serializer, object], None] | None = getattr(
                self.__class__, methodname, None
            )
            if meth is None:
>               raise DumpError(f"can't serialize {tp}") from None
E               execnet.gateway_base.DumpError: can't serialize <class 'torch.Tensor'>

.venv/lib/python3.10....................................................../site-packages/execnet/gateway_base.py:1665: DumpError

@manuelgloeckler manuelgloeckler left a comment

That looks great!

It's also nice that one can now easily switch to, e.g., wandb.

2 participants