Description
Is your feature request related to a problem? Please describe.
Yes. In high-throughput DynamoDB workloads (e.g., AWS Lambda processing Kinesis streams), every call to wr.dynamodb.read_items() triggers a DescribeTable API call to fetch the key schema.
This generates a huge number of CloudTrail events, which in turn drives up CloudTrail and GuardDuty costs.
When the table schema is already known (partition key + optional sort key), these extra API calls are unnecessary and expensive.
Describe the solution you'd like
I would like read_items() to support explicitly passing partition and sort key names.
If these parameters are provided, Wrangler should skip the DescribeTable call and use the provided schema directly.
The default behavior (auto-discover schema via DescribeTable) can remain for backwards compatibility.
Example API shape:
df = wr.dynamodb.read_items(
    table_name="my-table",
    partition_key="pk",
    sort_key="sk",
    partition_values=["pv1", "pv2"],
    sort_values=["sv1", "sv2"],
)
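To make the requested behavior concrete, here is a minimal sketch of how the key schema could be resolved. It is not Wrangler's actual implementation: `resolve_key_schema` and `StubClient` are hypothetical names used only for illustration, and the stub stands in for a boto3 DynamoDB client so the call count can be observed without AWS access.

```python
from typing import Any, List, Optional


def resolve_key_schema(
    client: Any,
    table_name: str,
    partition_key: Optional[str] = None,
    sort_key: Optional[str] = None,
) -> List[str]:
    """Return the key attribute names, partition key first (hypothetical helper)."""
    if partition_key is not None:
        # Schema supplied by the caller: no DescribeTable call is made.
        return [partition_key] if sort_key is None else [partition_key, sort_key]
    if sort_key is not None:
        raise ValueError("sort_key requires partition_key")
    # Default path: auto-discover the schema via DescribeTable.
    key_schema = client.describe_table(TableName=table_name)["Table"]["KeySchema"]
    # Put the HASH (partition) key before the RANGE (sort) key.
    ordered = sorted(key_schema, key=lambda k: k["KeyType"] != "HASH")
    return [k["AttributeName"] for k in ordered]


class StubClient:
    """Stand-in for a boto3 DynamoDB client that counts DescribeTable calls."""

    def __init__(self) -> None:
        self.calls = 0

    def describe_table(self, TableName: str) -> dict:
        self.calls += 1
        return {
            "Table": {
                "KeySchema": [
                    {"AttributeName": "sk", "KeyType": "RANGE"},
                    {"AttributeName": "pk", "KeyType": "HASH"},
                ]
            }
        }
```

With explicit key names the stub's counter stays at zero. Omitting them falls back to a single DescribeTable call, which preserves the existing default.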
Describe alternatives you've considered
- Monkey-patching read_items to skip DescribeTable. This works but is fragile and version-specific.
- Replacing Wrangler with raw boto3 calls and custom DataFrame conversion. This avoids the issue but loses Wrangler’s ergonomics.
- Adding caching around DescribeTable to reduce the frequency, but this still adds complexity and doesn’t eliminate unnecessary API calls.
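For reference, the caching alternative could look roughly like the sketch below. `KeySchemaCache` and `CountingStubClient` are hypothetical names, and the stub replaces a real boto3 DynamoDB client. The cache lives in process memory, so each new Lambda container still pays for one DescribeTable call per table, which is why this approach reduces the calls but does not eliminate them.

```python
from typing import Any, Dict, List


class KeySchemaCache:
    """Per-process cache of DescribeTable key schemas (hypothetical helper)."""

    def __init__(self, client: Any) -> None:
        self._client = client
        self._cache: Dict[str, List[dict]] = {}

    def get(self, table_name: str) -> List[dict]:
        # Only call DescribeTable the first time a table is seen in this process.
        if table_name not in self._cache:
            response = self._client.describe_table(TableName=table_name)
            self._cache[table_name] = response["Table"]["KeySchema"]
        return self._cache[table_name]


class CountingStubClient:
    """Stand-in for a boto3 DynamoDB client that counts DescribeTable calls."""

    def __init__(self) -> None:
        self.calls = 0

    def describe_table(self, TableName: str) -> dict:
        self.calls += 1
        return {
            "Table": {
                "KeySchema": [{"AttributeName": "pk", "KeyType": "HASH"}]
            }
        }
```

Repeated lookups for the same table within one container hit the cache, but a fresh container starts with an empty cache and calls DescribeTable again.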
Additional context
This feature would make read_items safer and more cost-efficient in serverless, high-volume architectures, where cold starts and container churn can otherwise generate thousands of unnecessary DescribeTable calls.
It would also give developers more control over performance and cost tradeoffs, while keeping backward compatibility for existing users.