implement sigv4 signing for s3 downloads #21956

Status: Open. Wants to merge 1 commit into base: main.
3 changes: 3 additions & 0 deletions docs/notes/2.26.x.md
@@ -103,6 +103,9 @@ For the `tfsec` linter, the deprecation of support for leading `v`s in the `vers

The S3 backend now creates new AWS credentials when the `AWS_` environment variables change. This allows credentials to be updated without restarting the Pants daemon.

The S3 backend now uses Signature Version 4 for signing requests, allowing use of KMS encrypted objects in S3.


### Plugin API changes


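The release note above says the S3 backend now signs requests with Signature Version 4. The PR delegates the actual signing to botocore's `SigV4Auth`, but as background, the SigV4 signing-key derivation that signer performs can be sketched in pure stdlib Python. This is an illustrative sketch, not code from the PR; the function name is invented, and the secret key is AWS's documented example credential, not a real one.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def derive_sigv4_signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    """Derive a SigV4 signing key: an HMAC-SHA256 chain over date, region, and service."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


# AWS's documented example secret key (not a real credential).
key = derive_sigv4_signing_key(
    "wJalrXUtnFEMI/K7MDENG+bPxRfiCYEXAMPLEKEY", "20120215", "us-east-1", "iam"
)
print(key.hex())
```

Because the region is an input to the key derivation, SigV4 (unlike the older HmacV1 scheme) cannot sign a request without knowing one, which is why the PR adds a region fallback.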
10 changes: 8 additions & 2 deletions src/python/pants/backend/url_handlers/s3/integration_test.py
@@ -58,6 +58,9 @@ def __init__(self):
def set_config_variable(self, key, value):
self.config_vars.update({key: value})

def get_config_variable(self, key):
return self.config_vars.get(key) or "us-east-1"

def get_credentials(self):
if self.creds:
return self.creds
@@ -91,14 +94,17 @@ def load_credentials(self):
Credentials=FakeCredentials.create,
)

def fake_auth_ctor(creds):
def fake_auth_ctor(creds, service_name, region_name):
assert service_name == "s3"
assert region_name in ["us-east-1", "us-west-2"]

def add_auth(request):
assert request.url == expected_url
request.headers["AUTH"] = "TOKEN"

return SimpleNamespace(add_auth=add_auth)

botocore.auth = SimpleNamespace(HmacV1Auth=fake_auth_ctor)
botocore.auth = SimpleNamespace(SigV4Auth=fake_auth_ctor)

monkeypatch.setitem(sys.modules, "botocore", botocore)

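The test above swaps a fake `botocore` module into `sys.modules` so the `SigV4Auth` constructor can be intercepted without network or AWS access. A minimal, self-contained sketch of the same pattern follows; the names here are illustrative, not the test's actual helpers, and the real test installs the fake via pytest's `monkeypatch.setitem` so it is undone automatically.

```python
import sys
from types import SimpleNamespace


def fake_sigv4_ctor(creds, service_name, region_name):
    # Stand-in for botocore.auth.SigV4Auth: capture the signing inputs and
    # return an object exposing the same add_auth interface.
    def add_auth(request):
        request.headers["AUTH"] = "TOKEN"

    return SimpleNamespace(add_auth=add_auth, service=service_name, region=region_name)


# Register a fake top-level module; `import botocore` will now resolve to it,
# since the import system returns whatever object sits in sys.modules.
sys.modules["botocore"] = SimpleNamespace(
    auth=SimpleNamespace(SigV4Auth=fake_sigv4_ctor)
)

import botocore  # resolves to the fake registered above

signer = botocore.auth.SigV4Auth(creds=None, service_name="s3", region_name="us-east-1")
request = SimpleNamespace(headers={})
signer.add_auth(request)
print(request.headers["AUTH"])  # TOKEN
```

Patching `sys.modules` this way keeps the test hermetic: the code under test can `from botocore import auth` normally, yet no real botocore needs to be installed.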
46 changes: 29 additions & 17 deletions src/python/pants/backend/url_handlers/s3/register.py
@@ -12,7 +12,7 @@
from pants.engine.env_vars import EnvironmentVars, EnvironmentVarsRequest
from pants.engine.environment import ChosenLocalEnvironmentName, EnvironmentName
from pants.engine.fs import Digest, NativeDownloadFile
from pants.engine.internals.native_engine import FileDigest
from pants.engine.internals.native_engine import EMPTY_FILE_DIGEST, FileDigest
from pants.engine.internals.selectors import Get
from pants.engine.rules import collect_rules, rule
from pants.engine.unions import UnionRule
@@ -21,13 +21,13 @@

CONTENT_TYPE = "binary/octet-stream"


logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class AWSCredentials:
creds: Any
default_region: str | None


@rule
@@ -92,8 +92,9 @@ async def access_aws_credentials(
)

creds = credentials.create_credential_resolver(session).load_credentials()
default_region = session.get_config_variable("region")

return AWSCredentials(creds)
return AWSCredentials(creds=creds, default_region=default_region)


@dataclass(frozen=True)
@@ -111,36 +112,47 @@ async def download_from_s3(
) -> Digest:
from botocore import auth, compat, exceptions # pants: no-infer-dep

# NB: The URL for auth is expected to be in path-style
path_style_url = "https://s3"
virtual_hosted_url = f"https://{request.bucket}.s3.amazonaws.com/{request.key}"
if request.region:
path_style_url += f".{request.region}"
path_style_url += f".amazonaws.com/{request.bucket}/{request.key}"
virtual_hosted_url = (
f"https://{request.bucket}.s3.{request.region}.amazonaws.com/{request.key}"
)

if request.query:
path_style_url += f"?{request.query}"
virtual_hosted_url += f"?{request.query}"

headers = compat.HTTPHeaders()
http_request = SimpleNamespace(
url=path_style_url,
url=virtual_hosted_url,
headers=headers,
method="GET",
auth_path=None,
data=None,
params={},
context={},
body={},
)

# NB: The added Auth header doesn't need to be valid when accessing a public bucket. When
# hand-testing, you MUST test against a private bucket to ensure it works for private buckets too.
signer = auth.HmacV1Auth(aws_credentials.creds)
# Add the X-Amz-Content-SHA256 header, as botocore's SigV4 signer does; see
# https://github.com/boto/botocore/blob/547b20801770c8ea4255ee9c3b809fea6b9f6bc4/botocore/auth.py#L52C1-L54C2
headers.add_header(
"X-Amz-Content-SHA256",
EMPTY_FILE_DIGEST.fingerprint,
)

# SigV4 requires a region to sign the request. If we don't know where the bucket
# is, default to the region from the credentials and fall back to us-east-1.
signing_region = request.region or aws_credentials.default_region or "us-east-1"

signer = auth.SigV4Auth(aws_credentials.creds, "s3", signing_region)
Contributor Author:
Is it worth supporting the old codepath under a flag (HmacV1Auth)? Not sure how risky you view this change as.

Contributor (@tdyas, Feb 28, 2025):
Good idea. If, on the remote chance, a user does see an issue, they can just configure the old behavior. Even if S3 itself might be perfectly fine with this change, I can imagine a user relying on some S3 API-compatible service which we have never heard of and hitting an issue. It may never happen, but I can't discount the possibility. Feature flags are cheap insurance.

You can set a removal_version and removal_hint on the transition option so that we maintainers know to remove the option at an appropriate point in the future (or at least reevaluate its necessity; maybe document that in the removal_hint).

Contributor Author:
Is there a particular options subsystem I should add to? I don't see one for URL handlers/S3. Or I could make a new one.

Contributor:
Yeah, I don't see one either. Maybe add a new one then?

Contributor:
Maybe aws-s3-download-handler, or a better name?

try:
signer.add_auth(http_request)
except exceptions.NoCredentialsError:
pass # The user can still access public S3 buckets without credentials

virtual_hosted_url = f"https://{request.bucket}.s3"
if request.region:
virtual_hosted_url += f".{request.region}"
virtual_hosted_url += f".amazonaws.com/{request.key}"
if request.query:
virtual_hosted_url += f"?{request.query}"

return await Get(
Digest,
NativeDownloadFile(
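The register.py hunk adds an `X-Amz-Content-SHA256` header set to `EMPTY_FILE_DIGEST.fingerprint`. For a body-less GET request, that value is simply the SHA-256 digest of an empty payload, which the stdlib reproduces directly:

```python
import hashlib

# SHA-256 of an empty payload: the value SigV4 expects in the
# X-Amz-Content-SHA256 header when the request has no body.
EMPTY_PAYLOAD_SHA256 = hashlib.sha256(b"").hexdigest()
print(EMPTY_PAYLOAD_SHA256)
# e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
```

Reusing the engine's `EMPTY_FILE_DIGEST.fingerprint` avoids hashing at call time, since the empty-input digest is a fixed constant.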