Skip to content

Fix logging configuration to prevent OpenTelemetry handler conflicts #8269

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

dmavrommatis
Copy link

Fix logging configuration to prevent OpenTelemetry handler conflicts

This PR modifies how DSPy configures its loggers to prevent conflicts with OpenTelemetry logging handlers and avoid long blocking operations during logger configuration.

Problem

The current implementation uses logging.config.dictConfig() which triggers a full handler cleanup through _clearExistingHandlers(). This causes issues when OpenTelemetry logging handlers are present in the environment because:

  1. The cleanup process attempts to flush and shutdown all existing handlers
  2. OpenTelemetry handlers' flush operation can block for a long time (30+ seconds)
  3. This affects any application using both DSPy and OpenTelemetry for logging

Here's an example that demonstrates the issue run with opentelemetry-instrument:

import logging
import logging.config

logger = logging.getLogger(__name__)

def main():
    import time
    logger.info("Starting main")
    starttime = time.time()
    logging.config.dictConfig({"version":1})
    logger.info(f"logging config change took {time.time()-starttime}")

if __name__ == "__main__":
    main()

This simple code takes 30+ seconds to run and often needs to be interrupted. Here's the stack trace showing where it blocks:

Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in run_module_as_main
return run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in run_code
exec(code, run_globals)
File "/root/test_run.py", line 16, in <module>
main()
File "/root/test_run.py", line 11, in main
logging.config.dictConfig({"version":1})
File "/usr/lib/python3.10/logging/config.py", line 811, in dictConfig
dictConfigClass(config).configure()
File "/usr/lib/python3.10/logging/config.py", line 538, in configure
clearExistingHandlers()
File "/usr/lib/python3.10/logging/config.py", line 275, in clearExistingHandlers
logging.shutdown(logging.handlerList[:])
File "/usr/lib/python3.10/logging/init.py", line 2182, in shutdown
h.flush()
File "/usr/local/lib/python3.10/dist-packages/opentelemetry/sdk/logs/_internal/init.py", line 571, in flush
self.logger_provider.force_flush()
File "/usr/local/lib/python3.10/dist-packages/opentelemetry/sdk/logs/_internal/init.py", line 715, in force_flush
return self.multi_log_record_processor.force_flush(timeout_millis)
File "/usr/local/lib/python3.10/dist-packages/opentelemetry/sdk/logs/_internal/init.py", line 350, in force_flush
if not lp.force_flush((deadline_ns - current_ts) // 1000000):
File "/usr/local/lib/python3.10/dist-packages/opentelemetry/sdk/logs/_internal/export/init.py", line 378, in force_flush
ret = flush_request.event.wait(timeout_millis / 1e3)
File "/usr/lib/python3.10/threading.py", line 607, in wait
signaled = self.cond.wait(timeout)
File "/usr/lib/python3.10/threading.py", line 324, in wait
gotit = waiter.acquire(True, timeout)
KeyboardInterrupt

Solution

This PR replaces dictConfig with direct logger configuration using Python's logging API. The new implementation:

  1. Creates a named handler specific to DSPy
  2. Only removes previous DSPy handlers (if any) while preserving other handlers (including OpenTelemetry)
  3. Configures the logger with the same settings but without triggering global handler cleanup

Key changes:

  • Use logging.StreamHandler and logging.Formatter directly
  • Give the DSPy handler a specific name for identification
  • Selectively remove only DSPy handlers when reconfiguring
  • Preserve all other logging handlers in the environment

This approach allows DSPy to:

  • Coexist with OpenTelemetry and other logging systems
  • Avoid long blocking operations during configuration
  • Maintain the same logging functionality and format
  • Be a better citizen in the Python logging ecosystem

Testing

The modified code runs instantly without delays, even in environments with OpenTelemetry handlers present. The logging output remains unchanged, maintaining backward compatibility while fixing the performance issue.

Signed-off-by: dmavrommatis <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant