Update Psycopg3 example to include non-admin and connection refresh logic by danielfrankcom · Pull Request #113 · aws-samples/aurora-dsql-samples

danielfrankcom · 2025-05-02T23:49:00Z

This PR updates the Psycopg3 example. It includes the following changes:

Synchronize the example with the publicly available docs here
Clean up and improve the documentation/comments
Fix dependencies
Add non-admin connection example
Clarify token creation logic

By submitting this pull request, I confirm that my contribution is made under
the terms of the MIT-0 license.

trstephen-amazon

Add token refresh logic

Could you point this out to me?

danielfrankcom · 2025-05-03T00:56:00Z

The token refresh logic is separated into a function which provides a connection, and a comment was added to make it clear the token is per-connection. There isn't really a built-in mechanism for credential management for Psycopg. The PR description should probably say something like "clarified token creation".
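A minimal sketch of that shape, assuming a function that builds a fresh token on every call and then opens a connection with it. The `generate_token` and `connect` parameters are hypothetical stand-ins (for the AWS SDK token call and `psycopg.connect`), injected here so the sketch runs without either library; the signatures in the actual example may differ.

```python
from typing import Any, Callable


def create_connection(
    cluster_user: str,
    cluster_endpoint: str,
    region: str,
    generate_token: Callable[[str, str, str], str],  # hypothetical stand-in
    connect: Callable[..., Any],  # hypothetical stand-in for the driver call
) -> Any:
    """Open one connection using a token generated just for it."""
    # The token is per-connection: generate a new one on every call
    # instead of caching and reusing an earlier one.
    password = generate_token(cluster_user, cluster_endpoint, region)
    return connect(
        host=cluster_endpoint,
        user=cluster_user,
        password=password,
    )
```

Because the token is created inside the function, calling it again is all that is needed to get a connection with a new token.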

trstephen-amazon · 2025-05-03T01:10:31Z

Token refresh

Thanks for the clarification! SGTM

danielfrankcom · 2025-05-03T01:23:11Z

Actions are fixed now, should be ready for final review.

I've had to add a FORCE_IPV4 environment variable to work around an issue with the action definition. It seems like the GitHub runner can resolve IPv6 addresses but can't reach out to the IPv6 internet. Psycopg prefers IPv6 so it's not able to reach the cluster.

I've modified the example to manually resolve an IPv4 address and use that for the connection when FORCE_IPV4 is set. Ideally we can update the workflow to disable IPv6 in a future change.

trstephen-amazon · 2025-05-03T04:32:59Z

+    if os.environ.get("FORCE_IPV4", False):
+        try:
+            # Get the IPv4 address for the host
+            conn_params["hostaddr"] = socket.gethostbyname(cluster_endpoint)
+        except socket.gaierror:
+            # If DNS resolution fails, continue without hostaddr
+            pass


This is a good find! It's exactly the type of behavior we want to highlight for anyone who wants to use psycopg3 with dsql.

Let's dig a bit deeper here. I see a few issues with psycopg and ipv6: maybe we've discovered another case.

danielfrankcom · 2025-05-05

I did some more testing this morning and haven't been able to reproduce the issue. I've reverted the PR to exclude the FORCE_IPV4 workaround, since based on my investigation below it shouldn't be needed.

I looked into the Psycopg3 source code a bit, and found it uses approximately the following algorithm:

1. Determine the connection attempts for the provided conninfo: ref
   a. Check the conninfo. If it contains hostaddr then return that address. If the host parameter is an IP address, return that address.
   b. Otherwise, do a DNS lookup and get all IPs associated with the domain. Return all addresses from the lookup.

2. For each address returned: ref
   a. Attempt to open a connection to the address/port described by the connection attempt.
   b. If the connection succeeds, save the connection and break from the loop.
   c. If the connection fails, record the exception and continue with other connection attempts.

3. If no working connection was found, rethrow the last exception.

4. Return the working connection if we reach this point.
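The loop above can be sketched roughly as follows. This is a simplified model of the described behavior, not Psycopg's actual code: `connect_first_working` and its `connect` parameter are hypothetical names, and address resolution is assumed to have already produced the `attempts` list.

```python
from typing import Any, Callable, Iterable, Optional


def connect_first_working(
    attempts: Iterable[str],
    connect: Callable[[str], Any],
) -> Any:
    """Try each address in order and return the first working connection."""
    last_error: Optional[Exception] = None
    for address in attempts:
        try:
            # A successful attempt ends the loop immediately.
            return connect(address)
        except Exception as exc:
            # Only the most recent failure is remembered; earlier
            # failures are overwritten.
            last_error = exc
    if last_error is None:
        raise ValueError("no connection attempts were provided")
    # No attempt worked: re-raise the last exception that was recorded.
    raise last_error
```

In this model the position of a working address in the list does not matter, since every address is tried until one succeeds.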

Based on this algorithm, it shouldn't matter whether the host supports IPv6, or whether the host is in a broken state with partial IPv6 support. If an IPv4 address is returned from the DNS lookup then it should be attempted in the loop, and a successful connection should result regardless of whether the IPv6 connection comes before or after it in the connection attempt list.

Unfortunately with the level of information we get from the error and stacktrace here, it is not possible to tell what the other connection attempts in the list were, or if there were any at all. There is a debug log here which would provide more information in the event of a failure, but that level of logging isn't enabled for our runs at the moment.

Given I can't reproduce the issue any more, I'm inclined to think it was caused by either a DNS issue preventing IPv4 hosts from being returned, or a genuine connectivity issue on the host running the workflow. It's also possible it's an intermittent problem with a low % chance to occur, but it seemed pretty consistent when I was testing last week. For whatever reason, now the same code that was failing before is passing in the workflows.

Perhaps the best course of action would be to enable debug logging for this driver, and set up some kind of regular run of the workflow to see if we can catch it again. As of the moment, I don't think we have enough information to create a local reproducer for this, since we don't know the underlying cause without more visibility into the other connection attempts that were made here.
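A minimal sketch of enabling that logging with the standard library, assuming the driver logs under the `psycopg` logger name. The helper name `enable_psycopg_debug_logging` and the format string are illustrative choices, not part of the example.

```python
import logging


def enable_psycopg_debug_logging() -> logging.Logger:
    """Send DEBUG-level records from the "psycopg" logger to stderr."""
    # Install a root handler (if none exists yet) with a compact format.
    logging.basicConfig(format="%(levelname)s - %(name)s - %(message)s")
    logger = logging.getLogger("psycopg")
    # Lower the threshold for this logger only, leaving other libraries
    # at their default level.
    logger.setLevel(logging.DEBUG)
    return logger
```

Calling this once before connecting should be enough for failed connection attempts to be logged individually.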

It would be good to regularly run these workflows regardless of this issue, to ensure the tests continue to work over time and with newer dependency versions.

danielfrankcom · 2025-05-05

While trying to enable Psycopg3 debug logs to help find this issue in the future, I accidentally triggered it again, and I think I now understand what is going on.

There are workflow runs here and here which clearly show the issue.

DEBUG - psycopg - connection attempt failed: host: '***' port: '5432', hostaddr '54.158.101.233': connection failed: connection to server at "54.158.101.233", port 5432 failed: root certificate file "./root.pem" does not exist
	Either provide the file, use the system's trusted roots with sslrootcert=system, or change sslmode to disable server certificate verification.
DEBUG - psycopg - connection attempt failed: host: '***' port: '5432', hostaddr '2600:1f18:692c:303:31c4:3b43:ed6d:d04d': connection is bad: connection to server at "2600:1f18:692c:303:31c4:3b43:ed6d:d04d", port 5432 failed: Network is unreachable
	Is the server running on that host and accepting TCP/IP connections?

Though the example code was the same during my testing, it seems the workflow wasn't exactly the same. Last week when it was failing, the root.pem file was not being downloaded yet, as I encountered this issue before I reached any problem with the certificates and wanted to move one step at a time. My perception was that this was a network connectivity problem, and would occur before the certificate was verified as there is no way to verify a certificate if we can't communicate with the cluster.

As it turns out, this assumption was incorrect: the connection logic takes the certificate into account on each connection attempt, even though the resulting failure isn't shown unless debug logging is enabled. When I tested again this morning the root.pem certificate was being downloaded, so I didn't encounter this issue even after reverting the FORCE_IPV4 changes to the example.

What is happening in this scenario is the following:

1. Psycopg does a DNS lookup which returns an IPv4 and an IPv6 address.

2. The loop iterates through the addresses.

3. The IPv4 connection is established, but the certificate verification fails. In one run it fails because the certificate isn't downloaded and the file doesn't exist. In another run it fails because I changed the sslrootcert parameter to system, and the certificate wasn't in the system trust store.

4. The loop marks the IPv4 address as failed and stores its exception. The loop continues to the IPv6 address.

5. The IPv6 connection fails as the host does not have functioning dual stack networking.

6. The IPv6 exception overwrites the stored exception.

7. All connection attempts have failed, and Psycopg shows the most recent exception to the user. This exception is the IPv6 one, which hides the actual issue, which is with the certificates.

This problem doesn't occur with what's on the current main branch, since it is using sslmode=require instead of sslmode=verify-full.

This behavior would likely occur for any user of this example who (a) does not have functioning IPv6 on their host, and (b) does not properly set up the root certificate before running the example. Though the README.md file describes the steps for setting up the root certificate, it seems very likely people will miss or skip them, and encounter this confusing error message which implies they have an IPv6 problem.

To prevent this issue I think we should do the following:

Add a try/except around the connect method call, and check if the certificate file exists. If it doesn't, print an additional message to help diagnose the problem.

Raise a ticket with the Psycopg3 team about this issue. It seems the intention of the looping connection logic was to prevent duplicate failure messages, but in this case the logic is suppressing the actual problem. The resulting message is pretty confusing, and actually has nothing to do with the cause of the connection failure. Within our team alone we seem to have run into this twice, and it is likely other users of Psycopg3 have experienced the same. I will first check whether there is an existing issue about this.

danielfrankcom · 2025-05-05T20:27:26Z

Latest push fixes the license headers on source files which I missed before.

danielfrankcom · 2025-05-05T23:13:27Z

Latest push adds an explicit check for the SSL certificate, which should work around the IPv6 error for now.

If the user forgets to download the SSL certificate as mentioned in the README.md file, they will receive an error message which indicates the actual problem, rather than the confusing IPv6 error message they would otherwise see.

Traceback (most recent call last):
  File "/home/runner/work/aurora-dsql-samples/aurora-dsql-samples/python/psycopg/src/example.py", line 127, in <module>
    main()
  File "/home/runner/work/aurora-dsql-samples/aurora-dsql-samples/python/psycopg/src/example.py", line 117, in main
    conn = create_connection(cluster_user, cluster_endpoint, region)
  File "/home/runner/work/aurora-dsql-samples/aurora-dsql-samples/python/psycopg/src/example.py", line 43, in create_connection
    raise FileNotFoundError(f"SSL certificate file not found: {ssl_cert_path}")
FileNotFoundError: SSL certificate file not found: ./root.pem

vs.

DEBUG - psycopg - connection attempt failed: host: '***' port: '5432', hostaddr '2600:1f18:692c:303:31c4:3b43:ed6d:d04d': connection is bad: connection to server at "2600:1f18:692c:303:31c4:3b43:ed6d:d04d", port 5432 failed: Network is unreachable
	Is the server running on that host and accepting TCP/IP connections?
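A minimal sketch of such a check, assuming it runs before the connection is attempted. The helper name `require_ssl_cert` is hypothetical; only the error message is taken from the traceback above.

```python
import os


def require_ssl_cert(ssl_cert_path: str) -> str:
    """Fail fast with a clear error if the root certificate is missing."""
    # Checking up front means the user sees the certificate problem
    # directly, rather than whichever connection error happened last.
    if not os.path.isfile(ssl_cert_path):
        raise FileNotFoundError(f"SSL certificate file not found: {ssl_cert_path}")
    return ssl_cert_path
```

The returned path could then be passed on as the `sslrootcert` connection parameter.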

trstephen-amazon

🚢

danielfrankcom force-pushed the chore/update-psycopg3-samples branch 5 times, most recently from a618818 to 7d8b67d Compare May 3, 2025 00:38

trstephen-amazon reviewed May 3, 2025

Comment thread .github/workflows/python-psycopg3-integ-tests.yml

Comment thread python/psycopg/README.md Outdated

danielfrankcom force-pushed the chore/update-psycopg3-samples branch from 7d8b67d to 9bf4f5a Compare May 3, 2025 00:49

trstephen-amazon mentioned this pull request May 3, 2025

Update pgJDBC example to include non-admin and connection refresh logic #112

Merged

danielfrankcom force-pushed the chore/update-psycopg3-samples branch 2 times, most recently from 3a0cf88 to 949ec91 Compare May 3, 2025 01:20

danielfrankcom force-pushed the chore/update-psycopg3-samples branch from 949ec91 to bb0347a Compare May 3, 2025 01:24

trstephen-amazon reviewed May 3, 2025

danielfrankcom force-pushed the chore/update-psycopg3-samples branch 2 times, most recently from 1b5d019 to 73266d0 Compare May 5, 2025 20:27

danielfrankcom mentioned this pull request May 5, 2025

Connection errors suppressed by subsequent connection attempts psycopg/psycopg#1069

Closed

trstephen-amazon approved these changes May 6, 2025

Comment thread python/psycopg/README.md Outdated

Comment thread python/psycopg/src/example.py

danielfrankcom added 3 commits May 5, 2025 17:42

Update Psycopg3 example to include non-admin and connection refresh l…

3378daf

…ogic

Update test workflow to match new example

fecb33c

Add explicit check for SSL certificate

00807b9

danielfrankcom force-pushed the chore/update-psycopg3-samples branch from 4479917 to 00807b9 Compare May 6, 2025 00:42

danielfrankcom merged commit db87703 into main May 6, 2025
3 checks passed

danielfrankcom pushed a commit to marcbowes/aurora-dsql-samples that referenced this pull request May 9, 2025

Update crypto dependency (aws-samples#113)

f7467d2

danielfrankcom deleted the chore/update-psycopg3-samples branch May 20, 2025 19:16

danielfrankcom merged 3 commits into main from chore/update-psycopg3-samples
