Stop limiting num-connections based on num-known-IPs (Improve S3-Express performance) (#407)

graebm · web-flow · commit 59569e317a4e · 2024-02-28T16:29:57.000-08:00
**Issue:** Disappointing S3-Express performance.

**Description of changes:** Stop limiting num-connections based on num-known-IPs.

**Diagnosing the issue:** We found that num-connections was never getting very high, because [num-connections scales based on the num-known-IPs](https://github.com/awslabs/aws-c-s3/blob/593c2ab24608d3e78708d51657be22f6ab99cb50/source/s3_client.c#L179). S3-Express endpoints have very few IPs, so their num-connections never scaled very high.

The algorithm was adding 10 connections per known-IP. On a 100Gb/s machine, this maxed out at 250 connections once 25 IPs were known. But S3-Express endpoints only have 4 unique IPs, so they never got higher than 40 connections.

This algorithm was written back when S3 returned 1 IP per DNS query. The intention was to throttle connections until more IPs were known, in order to spread load among S3's server fleet. However, as of Aug 2023 [S3 provides multiple IPs per DNS query](https://aws.amazon.com/about-aws/whats-new/2023/08/amazon-s3-multivalue-answer-response-dns-queries/). So now we can scale up to max connections after the first DNS query and still spread load.

We also believed that spreading load was key to good performance. But I found that spreading the load didn't have much impact on performance (at least now, in 2024, on the 100Gb/s machine I was using). Tests where I hard-coded a single IP and hit it with max-connections didn't differ much from tests where the load was spread among 8 IPs or 100 IPs.

I want to get this change out quickly and help S3-Express, so I picked magic numbers where the num-connections math ends up with the same result as the old algorithm.

Normal S3 performance is mildly improved (max-connections is reached immediately, instead of scaling up over 30sec as it finds more IPs). S3 Express performance is MUCH improved.
**Future Work:** Improve this algorithm further:
- expect higher throughput on connections to S3 Express
- expect lower throughput on connections transferring small objects
- dynamic scaling without a bunch of magic numbers ??? (sounds cool, but I don't have any ideas how this would work yet)
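
To make the numbers in the diagnosis concrete, here is a minimal standalone sketch of the old and new connection math. It is not the library code: the function names `old_max_connections`, `new_max_connections`, and `ceil_to_u32` are hypothetical, and the `max_active_connections_override` cap is left out. The constants (4 Gbps per VIP, 10 connections per VIP, `100.0 / 250` Gbps per connection, minimum of 10) are taken from the diff below.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: round up and clamp into uint32_t range.
 * The real code uses ceil() with aws_max_double()/aws_min_double(). */
uint32_t ceil_to_u32(double x) {
    if (!(x > 0.0)) {
        return 0; /* also covers NaN */
    }
    if (x >= (double)UINT32_MAX) {
        return UINT32_MAX;
    }
    uint32_t truncated = (uint32_t)x;
    return ((double)truncated < x) ? truncated + 1 : truncated;
}

/* Sketch of the OLD behavior: 10 connections per known IP,
 * capped at the ideal VIP count (target throughput / 4 Gbps per VIP). */
uint32_t old_max_connections(double throughput_target_gbps, size_t num_known_ips) {
    const double throughput_per_vip_gbps = 4.0;
    const uint32_t connections_per_vip = 10;

    uint32_t num_vips = ceil_to_u32(throughput_target_gbps / throughput_per_vip_gbps);
    if (num_known_ips < (size_t)num_vips) {
        num_vips = (uint32_t)num_known_ips;
    }
    if (num_vips == 0) {
        num_vips = 1; /* always allow one VIP worth of connections */
    }
    return num_vips * connections_per_vip;
}

/* Sketch of the NEW behavior: depends only on the throughput target,
 * with a floor of 10 connections. */
uint32_t new_max_connections(double throughput_target_gbps) {
    const double throughput_per_connection_gbps = 100.0 / 250;
    const uint32_t min_connections = 10;

    uint32_t n = ceil_to_u32(throughput_target_gbps / throughput_per_connection_gbps);
    return (n < min_connections) ? min_connections : n;
}
```

With a 100 Gbps target, `old_max_connections(100.0, 4)` gives the 40 connections that S3-Express endpoints were stuck at, while `old_max_connections(100.0, 25)` and `new_max_connections(100.0)` both give 250.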
diff --git a/README.md b/README.md
@@ -5,7 +5,7 @@ The AWS-C-S3 library is an asynchronous AWS S3 client focused on maximizing thro
 ### Key features:
 - **Automatic Request Splitting**: Improves throughput by automatically splitting the request into part-sized chunks and performing parallel uploads/downloads of these chunks over multiple connections. There's a cap on the throughput of single S3 connection, the only way to go faster is multiple parallel connections.
 - **Automatic Retries**: Increases resilience by retrying individual failed chunks of a file transfer, eliminating the need to restart transfers from scratch after an intermittent error.
-- **DNS Load Balancing**: DNS resolver continuously harvests Amazon S3 IP addresses. When load is spread across the S3 fleet, overall throughput is better than if all connections were hammering the same IP simultaneously.
+- **DNS Load Balancing**: DNS resolver continuously harvests Amazon S3 IP addresses. When load is spread across the S3 fleet, overall throughput is more reliable than if all connections are going to a single IP.
 - **Advanced Network Management**: The client incorporates automatic request parallelization, effective timeouts and retries, and efficient connection reuse. This approach helps to maximize throughput and network utilization, and to avoid network overloads.
 - **Thread Pools and Async I/O**: Avoids bottlenecks associated with single-thread processing.
 - **Parallel Reads**: When uploading a large file from disk, reads from multiple parts of the file in parallel. This is faster than reading the file sequentially from beginning to end.
diff --git a/include/aws/s3/private/s3_client_impl.h b/include/aws/s3/private/s3_client_impl.h
@@ -242,8 +242,8 @@ struct aws_s3_client {
     /* Throughput target in Gbps that we are trying to reach. */
     const double throughput_target_gbps;
 
-    /* The calculated ideal number of VIP's based on throughput target and throughput per vip. */
-    const uint32_t ideal_vip_count;
+    /* The calculated ideal number of HTTP connections, based on throughput target and throughput per connection. */
+    const uint32_t ideal_connection_count;
 
     /**
      * For multi-part upload, content-md5 will be calculated if the AWS_MR_CONTENT_MD5_ENABLED is specified
@@ -484,10 +484,7 @@ struct aws_s3_endpoint *aws_s3_endpoint_acquire(struct aws_s3_endpoint *endpoint
 void aws_s3_endpoint_release(struct aws_s3_endpoint *endpoint);
 
 AWS_S3_API
-extern const uint32_t g_max_num_connections_per_vip;
-
-AWS_S3_API
-extern const uint32_t g_num_conns_per_vip_meta_request_look_up[];
+extern const uint32_t g_min_num_connections;
 
 AWS_S3_API
 extern const size_t g_expect_timeout_offset_ms;
diff --git a/source/s3_client.c b/source/s3_client.c
@@ -51,21 +51,22 @@ struct aws_s3_meta_request_work {
 
 static const enum aws_log_level s_log_level_client_stats = AWS_LL_INFO;
 
+/* max-requests-in-flight = ideal-num-connections * s_max_requests_multiplier */
 static const uint32_t s_max_requests_multiplier = 4;
 
-/* TODO Provide analysis on origins of this value. */
-static const double s_throughput_per_vip_gbps = 4.0;
-
-/* Preferred amount of active connections per meta request type. */
-const uint32_t g_num_conns_per_vip_meta_request_look_up[AWS_S3_META_REQUEST_TYPE_MAX] = {
-    10, /* AWS_S3_META_REQUEST_TYPE_DEFAULT */
-    10, /* AWS_S3_META_REQUEST_TYPE_GET_OBJECT */
-    10, /* AWS_S3_META_REQUEST_TYPE_PUT_OBJECT */
-    10  /* AWS_S3_META_REQUEST_TYPE_COPY_OBJECT */
-};
+/* This is used to determine the ideal number of HTTP connections. Algorithm is roughly:
+ * num-connections-max = throughput-target-gbps / s_throughput_per_connection_gbps
+ *
+ * Magic value based on: match results of the previous algorithm,
+ * where throughput-target-gbps of 100 resulted in 250 connections.
+ *
+ * TODO: Improve this algorithm (expect higher throughput for S3 Express,
+ * expect lower throughput for small objects, etc)
+ */
+static const double s_throughput_per_connection_gbps = 100.0 / 250;
 
-/* Should be max of s_num_conns_per_vip_meta_request_look_up */
-const uint32_t g_max_num_connections_per_vip = 10;
+/* After throughput math, clamp the min/max number of connections */
+const uint32_t g_min_num_connections = 10; /* Magic value based on: 10 was old behavior */
 
 /**
  * Default part size is 8 MiB to reach the best performance from the experiments we had.
@@ -151,32 +152,9 @@ uint32_t aws_s3_client_get_max_active_connections(
     struct aws_s3_client *client,
     struct aws_s3_meta_request *meta_request) {
     AWS_PRECONDITION(client);
+    (void)meta_request;
 
-    uint32_t num_connections_per_vip = g_max_num_connections_per_vip;
-    uint32_t num_vips = client->ideal_vip_count;
-
-    if (meta_request != NULL) {
-        num_connections_per_vip = g_num_conns_per_vip_meta_request_look_up[meta_request->type];
-
-        struct aws_s3_endpoint *endpoint = meta_request->endpoint;
-        AWS_ASSERT(endpoint != NULL);
-
-        AWS_ASSERT(client->vtable->get_host_address_count);
-        size_t num_known_vips = client->vtable->get_host_address_count(
-            client->client_bootstrap->host_resolver, endpoint->host_name, AWS_GET_HOST_ADDRESS_COUNT_RECORD_TYPE_A);
-
-        /* If the number of known vips is less than our ideal VIP count, clamp it. */
-        if (num_known_vips < (size_t)num_vips) {
-            num_vips = (uint32_t)num_known_vips;
-        }
-    }
-
-    /* We always want to allow for at least one VIP worth of connections. */
-    if (num_vips == 0) {
-        num_vips = 1;
-    }
-
-    uint32_t max_active_connections = num_vips * num_connections_per_vip;
+    uint32_t max_active_connections = client->ideal_connection_count;
 
     if (client->max_active_connections_override > 0 &&
         client->max_active_connections_override < max_active_connections) {
@@ -530,7 +508,7 @@ struct aws_s3_client *aws_s3_client_new(
     }
     /* Setup cannot fail after this point. */
 
-    if (client_config->throughput_target_gbps != 0.0) {
+    if (client_config->throughput_target_gbps > 0.0) {
         *((double *)&client->throughput_target_gbps) = client_config->throughput_target_gbps;
     } else {
         *((double *)&client->throughput_target_gbps) = s_default_throughput_target_gbps;
@@ -539,10 +517,14 @@ struct aws_s3_client *aws_s3_client_new(
     *((enum aws_s3_meta_request_compute_content_md5 *)&client->compute_content_md5) =
         client_config->compute_content_md5;
 
-    /* Determine how many vips are ideal by dividing target-throughput by throughput-per-vip. */
+    /* Determine how many connections are ideal by dividing target-throughput by throughput-per-connection. */
     {
-        double ideal_vip_count_double = client->throughput_target_gbps / s_throughput_per_vip_gbps;
-        *((uint32_t *)&client->ideal_vip_count) = (uint32_t)ceil(ideal_vip_count_double);
+        double ideal_connection_count_double = client->throughput_target_gbps / s_throughput_per_connection_gbps;
+        /* round up and clamp */
+        ideal_connection_count_double = ceil(ideal_connection_count_double);
+        ideal_connection_count_double = aws_max_double(g_min_num_connections, ideal_connection_count_double);
+        ideal_connection_count_double = aws_min_double(UINT32_MAX, ideal_connection_count_double);
+        *(uint32_t *)&client->ideal_connection_count = (uint32_t)ideal_connection_count_double;
     }
 
     client->cached_signing_config = aws_cached_signing_config_new(client, client_config->signing_config);
@@ -1687,7 +1669,7 @@ static bool s_s3_client_should_update_meta_request(
     size_t num_known_vips = client->vtable->get_host_address_count(
         client->client_bootstrap->host_resolver, endpoint->host_name, AWS_GET_HOST_ADDRESS_COUNT_RECORD_TYPE_A);
     if (num_known_vips == 0 && (client->threaded_data.num_requests_being_prepared +
-                                client->threaded_data.request_queue_size) >= g_max_num_connections_per_vip) {
+                                client->threaded_data.request_queue_size) >= g_min_num_connections) {
         return false;
     }
 
diff --git a/tests/s3_data_plane_tests.c b/tests/s3_data_plane_tests.c
@@ -259,70 +259,46 @@ static int s_test_s3_client_get_max_active_connections(struct aws_allocator *all
 
     struct aws_s3_client *mock_client = aws_s3_tester_mock_client_new(&tester);
     *((uint32_t *)&mock_client->max_active_connections_override) = 0;
-    *((uint32_t *)&mock_client->ideal_vip_count) = 10;
+    *((uint32_t *)&mock_client->ideal_connection_count) = 100;
     mock_client->client_bootstrap = &mock_client_bootstrap;
     mock_client->vtable->get_host_address_count = s_test_get_max_active_connections_host_address_count;
 
     struct aws_s3_meta_request *mock_meta_requests[AWS_S3_META_REQUEST_TYPE_MAX];
 
     for (size_t i = 0; i < AWS_S3_META_REQUEST_TYPE_MAX; ++i) {
-        /* Verify that g_max_num_connections_per_vip and g_num_conns_per_vip_meta_request_look_up are set up
-         * correctly.*/
-        ASSERT_TRUE(g_max_num_connections_per_vip >= g_num_conns_per_vip_meta_request_look_up[i]);
-
         /* Setup test data. */
         mock_meta_requests[i] = aws_s3_tester_mock_meta_request_new(&tester);
         mock_meta_requests[i]->type = i;
         mock_meta_requests[i]->endpoint = aws_s3_tester_mock_endpoint_new(&tester);
     }
 
-    /* With host count at 0, we should allow for one VIP worth of max-active-connections. */
-    {
-        s_test_max_active_connections_host_count = 0;
-
-        ASSERT_TRUE(
-            aws_s3_client_get_max_active_connections(mock_client, NULL) ==
-            mock_client->ideal_vip_count * g_max_num_connections_per_vip);
-
-        for (size_t i = 0; i < AWS_S3_META_REQUEST_TYPE_MAX; ++i) {
-            ASSERT_TRUE(
-                aws_s3_client_get_max_active_connections(mock_client, mock_meta_requests[i]) ==
-                g_num_conns_per_vip_meta_request_look_up[i]);
-        }
-    }
-
     s_test_max_active_connections_host_count = 2;
 
     /* Behavior should not be affected by max_active_connections_override since it is 0, and should just be in relation
-     * to ideal-vip-count and host-count. */
+     * to ideal-connection-count. */
     {
-        ASSERT_TRUE(
-            aws_s3_client_get_max_active_connections(mock_client, NULL) ==
-            mock_client->ideal_vip_count * g_max_num_connections_per_vip);
+        ASSERT_TRUE(aws_s3_client_get_max_active_connections(mock_client, NULL) == mock_client->ideal_connection_count);
 
         for (size_t i = 0; i < AWS_S3_META_REQUEST_TYPE_MAX; ++i) {
             ASSERT_TRUE(
                 aws_s3_client_get_max_active_connections(mock_client, mock_meta_requests[i]) ==
-                s_test_max_active_connections_host_count * g_num_conns_per_vip_meta_request_look_up[i]);
+                mock_client->ideal_connection_count);
         }
     }
 
     /* Max active connections override should now cap the calculated amount of active connections. */
     {
         *((uint32_t *)&mock_client->max_active_connections_override) = 3;
 
-        ASSERT_TRUE(
-            mock_client->max_active_connections_override <
-            mock_client->ideal_vip_count * g_max_num_connections_per_vip);
+        /* Assert that override is low enough to have effect */
+        ASSERT_TRUE(mock_client->max_active_connections_override < mock_client->ideal_connection_count);
 
         ASSERT_TRUE(
             aws_s3_client_get_max_active_connections(mock_client, NULL) ==
             mock_client->max_active_connections_override);
 
         for (size_t i = 0; i < AWS_S3_META_REQUEST_TYPE_MAX; ++i) {
-            ASSERT_TRUE(
-                mock_client->max_active_connections_override <
-                s_test_max_active_connections_host_count * g_num_conns_per_vip_meta_request_look_up[i]);
+            ASSERT_TRUE(mock_client->max_active_connections_override < mock_client->ideal_connection_count);
 
             ASSERT_TRUE(
                 aws_s3_client_get_max_active_connections(mock_client, mock_meta_requests[i]) ==
@@ -334,22 +310,17 @@ static int s_test_s3_client_get_max_active_connections(struct aws_allocator *all
     {
         *((uint32_t *)&mock_client->max_active_connections_override) = 100000;
 
-        ASSERT_TRUE(
-            mock_client->max_active_connections_override >
-            mock_client->ideal_vip_count * g_max_num_connections_per_vip);
+        /* Assert that override is NOT low enough to have effect */
+        ASSERT_TRUE(mock_client->max_active_connections_override > mock_client->ideal_connection_count);
 
-        ASSERT_TRUE(
-            aws_s3_client_get_max_active_connections(mock_client, NULL) ==
-            mock_client->ideal_vip_count * g_max_num_connections_per_vip);
+        ASSERT_TRUE(aws_s3_client_get_max_active_connections(mock_client, NULL) == mock_client->ideal_connection_count);
 
         for (size_t i = 0; i < AWS_S3_META_REQUEST_TYPE_MAX; ++i) {
-            ASSERT_TRUE(
-                mock_client->max_active_connections_override >
-                s_test_max_active_connections_host_count * g_num_conns_per_vip_meta_request_look_up[i]);
+            ASSERT_TRUE(mock_client->max_active_connections_override > mock_client->ideal_connection_count);
 
             ASSERT_TRUE(
                 aws_s3_client_get_max_active_connections(mock_client, mock_meta_requests[i]) ==
-                s_test_max_active_connections_host_count * g_num_conns_per_vip_meta_request_look_up[i]);
+                mock_client->ideal_connection_count);
         }
     }
 
@@ -822,12 +793,12 @@ static int s_test_s3_update_meta_requests_trigger_prepare(struct aws_allocator *
     struct aws_client_bootstrap mock_bootstrap;
     AWS_ZERO_STRUCT(mock_bootstrap);
 
-    const uint32_t ideal_vip_count = 10;
+    const uint32_t ideal_connection_count = 100;
 
     struct aws_s3_client *mock_client = aws_s3_tester_mock_client_new(&tester);
     mock_client->client_bootstrap = &mock_bootstrap;
     mock_client->vtable->get_host_address_count = s_test_s3_update_meta_request_trigger_prepare_get_host_address_count;
-    *((uint32_t *)&mock_client->ideal_vip_count) = ideal_vip_count;
+    *((uint32_t *)&mock_client->ideal_connection_count) = ideal_connection_count;
     aws_linked_list_init(&mock_client->threaded_data.request_queue);
     aws_linked_list_init(&mock_client->threaded_data.meta_requests);
 
@@ -872,27 +843,20 @@ static int s_test_s3_update_meta_requests_trigger_prepare(struct aws_allocator *
         &mock_meta_request_with_work->client_process_work_threaded_data.node);
     aws_s3_meta_request_acquire(mock_meta_request_with_work);
 
-    /* With no known addresses, the amount of requests that can be prepared should only be enough for one VIP. */
+    /* With no known addresses, the amount of requests that can be prepared should be lower. */
     {
         s_test_s3_update_meta_request_trigger_prepare_host_address_count = 0;
         aws_s3_client_update_meta_requests_threaded(mock_client);
 
         ASSERT_SUCCESS(s_validate_prepared_requests(
-            mock_client, g_max_num_connections_per_vip, mock_meta_request_with_work, mock_meta_request_without_work));
+            mock_client, g_min_num_connections, mock_meta_request_with_work, mock_meta_request_without_work));
     }
 
-    /* When the number of known addresses is greater than or equal to the ideal vip count, the max number of requests
-     * should be reached. */
+    /* When the number of known addresses is 1+, the max number of requests should be reached. */
     {
         const uint32_t max_requests_prepare = aws_s3_client_get_max_requests_prepare(mock_client);
 
-        s_test_s3_update_meta_request_trigger_prepare_host_address_count = (size_t)(ideal_vip_count);
-        aws_s3_client_update_meta_requests_threaded(mock_client);
-
-        ASSERT_SUCCESS(s_validate_prepared_requests(
-            mock_client, max_requests_prepare, mock_meta_request_with_work, mock_meta_request_without_work));
-
-        s_test_s3_update_meta_request_trigger_prepare_host_address_count = (size_t)(ideal_vip_count + 1);
+        s_test_s3_update_meta_request_trigger_prepare_host_address_count = 1;
         aws_s3_client_update_meta_requests_threaded(mock_client);
 
         ASSERT_SUCCESS(s_validate_prepared_requests(
@@ -980,7 +944,7 @@ static int s_test_s3_client_update_connections_finish_result(struct aws_allocato
     mock_client->vtable->create_connection_for_request =
         s_s3_test_meta_request_has_finish_result_client_create_connection_for_request;
 
-    *((uint32_t *)&mock_client->ideal_vip_count) = 1;
+    *((uint32_t *)&mock_client->ideal_connection_count) = 1;
 
     aws_linked_list_init(&mock_client->threaded_data.request_queue);