
logger.set_level and verbose don't play well together #6394

@jcrist

Description

Internally cuml makes use of a single global logger with a single global log level. This lets internal code make logging calls like

/* A C++ log */
CUML_LOG_DEBUG("some debug log message")

or

# A python log
logger.debug("some debug log message")

Users may change the global log level by calling logger.set_level(logger.level_enum.info).
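For reference, the global knob is exercised from Python roughly like this (a minimal sketch, using only the calls and enum values that appear elsewhere in this issue):

from cuml.internals import logger

logger.set_level(logger.level_enum.info)   # change the single global level
logger.get_level()                         # -> level_enum.info
logger.debug("hidden")                     # suppressed, since the global level is info
logger.set_level(logger.level_enum.trace)  # chattiest level, as seen after tsne.fit below
logger.debug("now visible")                # emitted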

Seems simple enough at first glance. But then enter verbose.

Estimators and certain functions also take a verbose parameter that is intended to mirror sklearn's verbose parameter (where a higher integer means "log more"). In most estimators this is implemented by translating the verbose kwarg to a level_enum and then calling set_level (usually on the C++ side), which mutates the global logger level.
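A rough sketch of that pattern (illustrative only, not cuml's actual code; the translation here is a stand-in for the real verbose-to-level_enum mapping):

from cuml.internals import logger

def _fit(X, verbose=False):
    # stand-in translation: in the real code verbose=6 ends up as level_enum.trace
    # (as the session below shows), and verbose=False corresponds to info
    level = logger.level_enum.trace if verbose else logger.level_enum.info
    logger.set_level(level)        # the *global* level is mutated here (often from C++)
    logger.debug("fitting ...")    # emitted only if the level set above permits it
    # ...and nothing restores the previous level when fit() returns

The effect is easy to see in an interactive session: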

In [1]: import cuml

In [2]: X, _ = cuml.datasets.make_blobs()

In [3]: tsne = cuml.TSNE(verbose=6)  # very verbose

In [4]: cuml.internals.logger.get_level()  # current log level
Out[4]: <level_enum.warn: 3>

In [5]: tsne.fit(X)
[2025-03-04 21:59:37.189] [CUML] [debug] Data size = (100, 2) with dim = 2 perplexity = 30.000000
[2025-03-04 21:59:37.201] [CUML] [debug] Getting distances.
[2025-03-04 21:59:37.410] [CUML] [debug] Now normalizing distances so exp(D) doesn't explode.
[2025-03-04 21:59:37.411] [CUML] [debug] Searching for optimal perplexity via bisection search.
[2025-03-04 21:59:37.605] [CUML] [debug] [t-SNE] KL divergence: 0.09647911041975021
Out[5]: TSNE()

In [6]: cuml.internals.logger.get_level()  # global log level is mutated after fit!
Out[6]: <level_enum.trace: 0>

Other estimators ignore the verbose parameter entirely, and keep logging at whatever the global level was set to beforehand.

It's also unclear when the global setting should be respected at all: if every estimator and many functions take a verbose parameter (controlling local verbosity), and the default verbose=False means "info" rather than "respect the global setting", then wouldn't the local setting always override the global one?

The situation is inconsistent at best, and the combination of global mutation and translation between verbose and level_enum has already led to bugs (see #6393).


In summary:

  1. cuml uses a global logger to handle all logging. This logger has a global level setting.
  2. cuml also wants to support a verbose parameter à la sklearn, where log levels are handled locally, per estimator.
  3. The default for verbose is not "respect the global setting", it's False (which corresponds to INFO). So even if verbose were handled consistently, it's unclear when the global setting is actually supposed to apply.
  4. Some estimators (e.g. PCA) ignore the verbose setting completely.
  5. Others implement it by mutating the global level without resetting it (e.g. TSNE, UMAP, ...).
  6. We have 2 log level descriptors (verbose and level_enum), and failure to translate between them has led to subtle bugs (see #6393); the sketch below illustrates the inversion involved.
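To make point 6 concrete, the two descriptors run in opposite directions, so any translation has to invert the scale. A plausible, purely illustrative helper (not the mapping cuml actually uses):

from cuml.internals import logger

# verbose    (sklearn-style): higher number -> more output, False -> info
# level_enum (logger levels): lower number  -> more output (trace=0, ..., warn=3)
def verbose_to_level(verbose):
    # hypothetical mapping, for illustration only
    if verbose is False:
        return logger.level_enum.info
    return logger.level_enum.trace if int(verbose) >= 6 else logger.level_enum.info

Forgetting this inversion, or skipping the translation entirely, doesn't fail loudly; it just logs at the wrong level.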
