-
Notifications
You must be signed in to change notification settings - Fork 3.3k
Add token refresh for UC Lineage ingestion using system tables #15698
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Add token refresh for UC Lineage ingestion using system tables #15698
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
Could you add some unit tests?
-
For the future improvement, it would be nice if we can separate out the token provider part. For example:
class SqlConnectionTokenProvider(Protocol):
def get_token(self) -> str: ...
class StaticTokenProvider:
def __init__(self, token: str):
self._token = token
def get_token(self) -> str:
return self._token
class AzureTokenProvider:
...|
|
||
| # Refresh token if it expires within 5 minutes | ||
| remaining_time = self._token_expiration - datetime.now(timezone.utc) | ||
| return remaining_time < timedelta(minutes=10) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to refresh within 10 mins before expiration? Line 414 says that it would be within 5 mins.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have used authenticate function, so removed this part of the code
| def _refresh_workspace_client(self): | ||
| """Refresh the workspace client with new credentials.""" | ||
| if self._azure_auth: | ||
| logger.info("Refreshing workspace client with new Azure credentials") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to be debug level logging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| if self._azure_auth: | ||
| logger.info("Refreshing workspace client with new Azure credentials") | ||
| # Create a new workspace client with fresh credentials | ||
| self._workspace_client = WorkspaceClient( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the brief investigation, recreating client is redundant. authenticate() will refresh token if it's expired. Please, check this.
authenticate() -> _header_factory() (config.py) -> OAuthCredentialsProvider.call() (credentials_provider.py) -> refreshed_headers() (credentials_provider.py) -> inner.token() (credentials_provider.py) -> token() (oath.py) -> _blocking_token() (oath.py) -> refresh() (oath.py)
Please, check token() and _blcoking_token() functions: https://github.com/databricks/databricks-sdk-py/blob/main/databricks/sdk/oauth.py#L280-L333
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| product_version=nice_version_name(), | ||
| ) | ||
|
|
||
| def _refresh_sql_connection_token(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if it's safe with multi threads.
|

If using system tables to query lineage, the sql connection fails.
Reason - sql connection requires PAT to connect to databricks and run queries. As the token is valid for only 60mins, and some of the ingestions run for long duration, the sql connection expires and the process fails with authentication error.
Fix - To use azure credentials to generate tokens. Have updated the code to generate PAT using azure credentials. The validity of PAT is 60 mins here too, but we can refresh the PAT before expiring so that the process continues without any halt