Skip to content

feat(athena): apply lakeformation tags on database #2719

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: devel
Choose a base branch
from

Conversation

hagelborn
Copy link
Contributor

Description

AWS lakeformation can be used to manage permissions to data. This introduced an optional lakeformation config for the athena destination which can be used to apply a set of lakeformation tags to the glue database that the pipeline is loading to. All tables in the database will inherit the same tags.

The implementation is heavily inspired by the dbt-athena adapter, and could be extended to allow resource specific lf tags.

The test requires running with iam principals that are lake formation admin, and that lakeformation has been activated in the aws account.

Related Issues

Additional Context

Copy link

netlify bot commented Jun 5, 2025

Deploy Preview for dlt-hub-docs canceled.

Name Link
🔨 Latest commit d0d0ce0
🔍 Latest deploy log https://app.netlify.com/projects/dlt-hub-docs/deploys/6845b4c6a987af000833052c

Copy link
Collaborator

@rudolfix rudolfix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hagelborn LGTM! and thanks for the tests. I think we could avoid distributing stubs with dlt package. pls. see review.

also we did some changes to lockfile recently. so it conflicts with your PR. you'll need to fix it.

thanks again

from threading import Lock

import boto3
from mypy_boto3_lakeformation import LakeFormationClient
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

those are needed to be present only when type checking. so if you could import them conditionally:

if TYPE_CHECKING:
    ...
else:
     LakeFormationClient = Any

then we do not need to dstribute stubs via PyPI. they are not needed at runtime (I hope)

@@ -431,6 +521,11 @@ def _get_table_update_sql(
LOCATION '{location}';""")
return sql

def complete_load(self, load_id: str) -> None:
super().complete_load(load_id)
if self.config.lakeformation_config is not None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: in that case if manage_lf_tags fails, it won't be retried because the load package is completed above. Is adding those tags "best effort" operation or should be considered as part of "transaction" which is loading of the whole package?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a best effort approach is most reasonable here, as retrying probably wouldn't be able to achieve anything. It would only fail if lakeformation permissions don't exist.

That said maybe catching any exception during manage_lf_tags would make sense so the pipeline ends successfully?

pyproject.toml Outdated
@@ -83,6 +83,7 @@ pipdeptree = {version = ">=2.9.0,<2.10", optional = true}
# pip is used by pipdeptree but not listed in its dependencies
pip = {version = ">=23.0.0", optional = true}
pyathena = {version = ">=2.9.6", optional = true}
boto3-stubs = {extras = ["lakeformation"], version = "^1.38.28", optional = true}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we use stubs only for the type checking you should move them to dev dependency group and remove from the extras


# These tests require Lakeformation permissions to be able to run.
# The tests have only been run as lakeformation admin so far, since the fixtures require
# permissions to create new tags, and grant lakeformation permissions to the current role
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah we'll need to change the permissions of our test user or use a separate one here. we'll figure that out

@rudolfix rudolfix self-assigned this Jun 6, 2025
@hagelborn hagelborn force-pushed the feat/lakeformation-config branch from 90b0088 to 438fb44 Compare June 8, 2025 13:54
@hagelborn hagelborn force-pushed the feat/lakeformation-config branch from 438fb44 to d0d0ce0 Compare June 8, 2025 16:05
@hagelborn
Copy link
Contributor Author

@rudolfix Thanks for your comments! I've updated and put the boto stubs in the dev group instead and made the type checking optional.

I rebased my branch on devel and relocked poetry.lock. When running the tests with make test-common suddenly some unrelated tests failed. The same tests fail for me on the devel branch. Is something broken upstream? I can see the tests are failing in CI too. Let me know if there's anything else I can do!

@hagelborn hagelborn requested a review from rudolfix June 8, 2025 16:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for lakeformation tags
2 participants