Skip to content

mysql-connect_retries_on_failure not effective to failures like connection refused #4597

Open
@xykhappy

Description

@xykhappy

For the hostgroup with a single backend, if the backend server goes down, the ProxySQL will return the error "Max connect timeout reached while reaching hostgroup" after the "mysql-connect_timeout_server_max" duration elapses if the backend is marked as "SHUNNED". In contrast, if establishing a direct connection without ProxySQL, the client will get a "connection refused" error or error code 111 immediately, which makes more sense.

After debugging the latest binary, it appears that ProxySQL keeps attempting to get a good connection at intervals controlled by "mysql-connect_retries_delay" until "mysql-connect_timeout_server_max" is reached according to the method MySQL_Session::handler_again___status_CONNECTING_SERVER.

bool MySQL_Session::handler_again___status_CONNECTING_SERVER(int *_rc) {

And the retry is not even controlled by "mysql-connect_retries_on_failure" (as it should be) because the process always falls into this condition in the scenario I mentioned above.
if (mybe->server_myds->myconn==NULL) {

If debug further, we will find the method MyHGC::get_random_MySrvC returns NULL all the time during the retry for this single "SHUNNED" backend.
MySrvC *MyHGC::get_random_MySrvC(char * gtid_uuid, uint64_t gtid_trxid, int max_lag_ms, MySQL_Session *sess) {

According to this method, the ProxySQL should attempt to bring the "SHUNNED" backend online but it fails all the time because the "mysrvc->time_last_detected_error" is always in the future related to "mysql-monitor_ping_interval".

					// if Monitor is enabled and mysql-monitor_ping_interval is
					// set too high, ProxySQL will unshun hosts that are not
					// available. For this reason time_last_detected_error will
					// be tuned in the future
					if (mysql_thread___monitor_enabled) {
						int a = mysql_thread___shun_recovery_time_sec;
						int b = mysql_thread___monitor_ping_interval;
						b = b/1000;
						if (b > a) {
							t = t + (b - a);
						}
					}
					mysrvc->time_last_detected_error = t;

I'm not sure whether the logic is intentionally designed this way. However, the retry time should be controlled by 'mysql-connect_retries_on_failure' rather than its current implementation. Additionally, I believe it would be more effective to return the exact failure error to the client.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions