-
Notifications
You must be signed in to change notification settings - Fork 287
feat(athena): apply lakeformation tags on database #2719
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: devel
Are you sure you want to change the base?
Conversation
✅ Deploy Preview for dlt-hub-docs canceled.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hagelborn LGTM! and thanks for the tests. I think we could avoid distributing stubs with dlt package. pls. see review.
also we did some changes to lockfile recently. so it conflicts with your PR. you'll need to fix it.
thanks again
from threading import Lock | ||
|
||
import boto3 | ||
from mypy_boto3_lakeformation import LakeFormationClient |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
those are needed to be present only when type checking. so if you could import them conditionally:
if TYPE_CHECKING:
...
else:
LakeFormationClient = Any
then we do not need to dstribute stubs via PyPI. they are not needed at runtime (I hope)
@@ -431,6 +521,11 @@ def _get_table_update_sql( | |||
LOCATION '{location}';""") | |||
return sql | |||
|
|||
def complete_load(self, load_id: str) -> None: | |||
super().complete_load(load_id) | |||
if self.config.lakeformation_config is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: in that case if manage_lf_tags
fails, it won't be retried because the load package is completed above. Is adding those tags "best effort" operation or should be considered as part of "transaction" which is loading of the whole package?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a best effort approach is most reasonable here, as retrying probably wouldn't be able to achieve anything. It would only fail if lakeformation permissions don't exist.
That said maybe catching any exception during manage_lf_tags
would make sense so the pipeline ends successfully?
pyproject.toml
Outdated
@@ -83,6 +83,7 @@ pipdeptree = {version = ">=2.9.0,<2.10", optional = true} | |||
# pip is used by pipdeptree but not listed in its dependencies | |||
pip = {version = ">=23.0.0", optional = true} | |||
pyathena = {version = ">=2.9.6", optional = true} | |||
boto3-stubs = {extras = ["lakeformation"], version = "^1.38.28", optional = true} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we use stubs only for the type checking you should move them to dev dependency group and remove from the extras
|
||
# These tests require Lakeformation permissions to be able to run. | ||
# The tests have only been run as lakeformation admin so far, since the fixtures require | ||
# permissions to create new tags, and grant lakeformation permissions to the current role |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah we'll need to change the permissions of our test user or use a separate one here. we'll figure that out
90b0088
to
438fb44
Compare
438fb44
to
d0d0ce0
Compare
@rudolfix Thanks for your comments! I've updated and put the boto stubs in the dev group instead and made the type checking optional. I rebased my branch on |
Description
AWS lakeformation can be used to manage permissions to data. This introduced an optional lakeformation config for the athena destination which can be used to apply a set of lakeformation tags to the glue database that the pipeline is loading to. All tables in the database will inherit the same tags.
The implementation is heavily inspired by the dbt-athena adapter, and could be extended to allow resource specific lf tags.
The test requires running with iam principals that are lake formation admin, and that lakeformation has been activated in the aws account.
Related Issues
Additional Context