Description
I've been trying to set up a processing container with unstructured_ingest.v2 and have been fighting with the logger for a while. I don't work in Python very often, so it took some ramp-up to remind myself how loggers work.
Since I'm trying to deploy this processing container in prod with proper observability, I've been trying to get it to log structured (JSON) output. For the longest time, log lines from unstructured were not coming through properly: they had the plain default formatting instead of JSON.
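For reference, here is a minimal stdlib-only sketch of the kind of JSON setup described above. `JsonFormatter` and `configure_json_logging` are illustrative names, not part of unstructured_ingest; the keys mirror the log line further down. The handler goes on the root logger so that library loggers inherit it through propagation.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line (illustrative)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            # Default formatTime output looks like "2025-01-29 14:24:51,415".
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)


def configure_json_logging(level: int = logging.INFO) -> logging.Handler:
    """Attach a JSON handler to the root logger so child loggers inherit it."""
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.addHandler(handler)
    root.setLevel(level)
    return handler
```

Anything that later strips handlers from the root logger also strips this one, which is exactly the failure mode described below.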
I was extremely confused for a while until I dug into the package's config, where I found the following function and its comment:
def remove_root_handlers(logger: Logger) -> None:
    # NOTE(robinson): in some environments such as Google Colab, there is a root handler
    # that doesn't not mask secrets, meaning sensitive info such as api keys appear in logs.
    # Removing these when they exist prevents this behavior
    if logger.root.hasHandlers():
        for handler in logger.root.handlers:
            logger.root.removeHandler(handler)
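To show why application-level output disappears, here is a small stdlib-only reproduction of the same pattern. The `demo` helper, the in-memory stream, and the assumption that the package runs this removal after the app has configured logging are all illustrative; the removal function itself mirrors the snippet above.

```python
import io
import logging


def remove_root_handlers(logger: logging.Logger) -> None:
    # Mirrors the quoted snippet: detach handlers from the root logger.
    if logger.root.hasHandlers():
        for handler in logger.root.handlers:
            logger.root.removeHandler(handler)


def demo() -> str:
    root = logging.getLogger()
    for existing in root.handlers[:]:
        root.removeHandler(existing)  # start from a clean root so the demo is deterministic

    stream = io.StringIO()
    app_handler = logging.StreamHandler(stream)  # stands in for an app's JSON handler
    root.addHandler(app_handler)
    root.setLevel(logging.INFO)

    lib_logger = logging.getLogger("unstructured_ingest.v2")
    lib_logger.info("before removal")  # propagates up to app_handler on the root
    remove_root_handlers(lib_logger)   # simulates the package running this at setup
    # No handler is left anywhere; logging.lastResort only emits WARNING and above,
    # so this INFO record is silently dropped.
    lib_logger.info("after removal")
    return stream.getvalue()
```

Calling `demo()` returns only `"before removal\n"`: once the root handler is gone, the INFO record has nowhere to go.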
I understand this is meant to keep secrets out of logs, but I believe this is the wrong way to go about it. It turns out that commenting out the root-handler removal lets me log through my own handler while the secret obfuscation logic still works:
{"timestamp": "2025-01-29 14:24:51,415", "level": "INFO", "logger": "unstructured_ingest.v2", "message": "Created download with configs: {\"download_dir\":null}, connection configs: {\"access_config\":\"**********\"}"}
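One possible direction that avoids removing handlers altogether (a sketch under assumptions, not how unstructured_ingest currently masks secrets): redact known secret values with a `logging.Filter` attached to whichever handlers are in use. `SecretMaskingFilter` is a hypothetical name, and the sketch assumes the secret values are known when the filter is created.

```python
import logging
import re
from typing import Iterable


class SecretMaskingFilter(logging.Filter):
    """Hypothetical filter that redacts known secret values from log messages."""

    def __init__(self, secrets: Iterable[str], mask: str = "**********") -> None:
        super().__init__()
        # Longest secrets first so one that contains another is masked as a whole.
        ordered = sorted((s for s in secrets if s), key=len, reverse=True)
        self._pattern = re.compile("|".join(map(re.escape, ordered))) if ordered else None
        self._mask = mask

    def filter(self, record: logging.LogRecord) -> bool:
        if self._pattern is not None:
            # Render the final message once, mask it, and clear args so it isn't re-formatted.
            record.msg = self._pattern.sub(lambda _match: self._mask, record.getMessage())
            record.args = None
        return True  # never drop records, only rewrite them
```

Attaching it with `handler.addFilter(...)` rather than `logger.addFilter(...)` matters: filters on a logger are not applied to records propagated up from child loggers, while a handler's filters see every record that handler emits, including ones from a Colab-style root handler.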
I haven't tested this in Google Colab, and I haven't dug much further into why a Colab root handler would end up logging these values unmasked. But this was very frustrating while I was getting started with the library, and I'd love to see what could be done to make this less of a hassle for future users.