
Commit 1b43102

fix: remove root handlers when they exist (#3128)
### Summary

In some environments, such as Google Colab, loggers have a root handler that does not mask sensitive values. As a result, secrets such as API keys appeared in the logs. This PR removes root handlers when they exist to ensure sensitive values are handled properly.

### Testing

Run the following in a Colab notebook. You should see two log outputs, one with the API key masked and one with it exposed.

```
!pip install unstructured
```

```python
import logging
import json

from unstructured.ingest.interfaces import (
    ChunkingConfig,
    EmbeddingConfig,
    PartitionConfig,
    ProcessorConfig,
    ReadConfig,
)

partition_config = PartitionConfig(
    partition_by_api=True,
    api_key="super secret",
)

from unstructured.ingest.logger import ingest_log_streaming_init

ingest_log_streaming_init(logging.INFO)

logger = logging.getLogger("unstructured.ingest")
logger.setLevel(logging.INFO)

logger.info(
    f"Running partition node to extract content from json files. "
    f"Config: {partition_config.to_json()}, "
)
```

Now replace the first cell with the following and rerun the Python code. Only the masked logging output should remain.

```
!git clone https://github.com/Unstructured-IO/unstructured.git && cd unstructured && git checkout fix/rm-log-dupes && pip install -e .
```
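The duplicated, unmasked line comes from standard `logging` propagation: a record logged on a child logger is also passed to every handler on the root logger, and those handlers apply their own formatter. The sketch below reproduces this with the standard library only. `MaskingFormatter`, the `demo.ingest` logger name, and the `emit` helper are hypothetical stand-ins for illustration; they are not the redaction code in `unstructured`.

```python
import io
import logging
import re
from typing import Tuple


class MaskingFormatter(logging.Formatter):
    """Hypothetical redacting formatter; not the one shipped in unstructured."""

    def format(self, record: logging.LogRecord) -> str:
        text = super().format(record)
        # Replace the value of any "api_key": "..." pair with a placeholder.
        return re.sub(r'("api_key":\s*")[^"]*(")', r"\1*******\2", text)


def emit(remove_root: bool) -> Tuple[str, str]:
    """Log one message; return (root handler output, masking handler output)."""
    root = logging.getLogger()
    logger = logging.getLogger("demo.ingest")  # hypothetical logger name

    root_stream, ingest_stream = io.StringIO(), io.StringIO()
    # Plays the role of a handler the notebook environment installed on root.
    root_handler = logging.StreamHandler(root_stream)
    ingest_handler = logging.StreamHandler(ingest_stream)
    ingest_handler.setFormatter(MaskingFormatter("%(message)s"))

    saved_root_handlers = list(root.handlers)
    root.addHandler(root_handler)
    logger.addHandler(ingest_handler)
    logger.setLevel(logging.INFO)
    try:
        if remove_root:
            # Iterate over a copy, since removeHandler() mutates root.handlers.
            for handler in list(root.handlers):
                root.removeHandler(handler)
        logger.info('Config: {"api_key": "super secret"}')
    finally:
        # Restore global logging state so the demo has no side effects.
        logger.removeHandler(ingest_handler)
        root.removeHandler(root_handler)
        for handler in saved_root_handlers:
            if handler not in root.handlers:
                root.addHandler(handler)
    return root_stream.getvalue(), ingest_stream.getvalue()


if __name__ == "__main__":
    print(emit(remove_root=False))
    print(emit(remove_root=True))
```

With `remove_root=False` the root handler receives the raw message while the masking handler receives the redacted one. With `remove_root=True` only the redacted output is produced. Setting `propagate = False` on the child logger would be another way to stop records reaching root handlers, but that is not the approach this commit takes.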
1 parent 54c1e4e commit 1b43102

3 files changed, +14 -2 lines changed

Diff for: CHANGELOG.md

+2 -1
@@ -1,4 +1,4 @@
-## 0.14.4-dev5
+## 0.14.4-dev6
 
 ### Enhancements
 
@@ -12,6 +12,7 @@
 
 ### Fixes
 
+* **Remove root handlers in ingest logger**. Removes root handlers in ingest loggers to ensure secrets aren't accidentally exposed in Colab notebooks.
 * **Fix V2 S3 Destination Connector authentication** Fixes bugs with S3 Destination Connector where the connection config was neither registered nor properly deserialized.
 * **Clarified dependence on particular version of `python-docx`** Pinned `python-docx` version to ensure a particular method `unstructured` uses is included.
 * **Ingest preserves original file extension** Ingest V2 introduced a change that dropped the original extension for upgraded connectors. This reverts that change.

Diff for: unstructured/__version__.py

+1 -1
@@ -1 +1 @@
-__version__ = "0.14.4-dev5"  # pragma: no cover
+__version__ = "0.14.4-dev6"  # pragma: no cover

Diff for: unstructured/ingest/logger.py

+11
@@ -94,6 +94,15 @@ def format(self, record):
         return redact_jsons(s)
 
 
+def remove_root_handlers(logger: logging.Logger) -> None:
+    # NOTE(robinson) - in some environments such as Google Colab, there is a root handler
+    # that doesn't not mask secrets, meaning sensitive info such as api keys appear in logs.
+    # Removing these when they exist prevents this behavior
+    if logger.root.hasHandlers():
+        for handler in logger.root.handlers:
+            logger.root.removeHandler(handler)
+
+
 def ingest_log_streaming_init(level: int) -> None:
     handler = logging.StreamHandler()
     handler.name = "ingest_log_handler"
@@ -104,6 +113,7 @@ def ingest_log_streaming_init(level: int) -> None:
     if "ingest_log_handler" not in [h.name for h in logger.handlers]:
         logger.addHandler(handler)
 
+    remove_root_handlers(logger)
     logger.setLevel(level)
 
 
@@ -116,4 +126,5 @@ def make_default_logger(level: int) -> logging.Logger:
     handler.setFormatter(formatter)
     logger.addHandler(handler)
     logger.setLevel(level)
+    remove_root_handlers(logger)
     return logger
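
One caveat about the loop in `remove_root_handlers`: `Logger.removeHandler()` removes the handler from the same `handlers` list the `for` loop is iterating over, so if more than one root handler is attached, some are skipped. When only one root handler is present, the loop as committed removes it. The sketch below compares the loop shape from the diff with a variant that iterates over a copy. The names `remove_root_handlers_in_place`, `remove_root_handlers_from_copy`, and `handlers_left` are hypothetical and exist only for this illustration.

```python
import logging
from typing import Callable


def remove_root_handlers_in_place(logger: logging.Logger) -> None:
    """Same loop shape as in the diff: iterates the live handler list."""
    if logger.root.hasHandlers():
        for handler in logger.root.handlers:
            logger.root.removeHandler(handler)


def remove_root_handlers_from_copy(logger: logging.Logger) -> None:
    """Hypothetical variant: iterate over a snapshot so no handler is skipped."""
    for handler in list(logger.root.handlers):
        logger.root.removeHandler(handler)


def handlers_left(remove: Callable[[logging.Logger], None], count: int = 3) -> int:
    """Attach `count` root handlers, run `remove`, and report how many remain."""
    root = logging.getLogger()
    saved = list(root.handlers)
    for handler in saved:  # start from a clean root logger
        root.removeHandler(handler)
    for _ in range(count):
        root.addHandler(logging.NullHandler())
    try:
        remove(logging.getLogger("demo.ingest"))  # hypothetical logger name
        return len(root.handlers)
    finally:
        # Restore the original root handlers so the demo has no side effects.
        for handler in list(root.handlers):
            root.removeHandler(handler)
        for handler in saved:
            root.addHandler(handler)


if __name__ == "__main__":
    print(handlers_left(remove_root_handlers_in_place))
    print(handlers_left(remove_root_handlers_from_copy))
```

With three root handlers attached, the in-place loop leaves one behind while the copy-based loop leaves none.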
