Description
We observed a set of issues when using rueidis v1.0.69 with a Redis Cluster (v7.4).
During a failover in one shard of the cluster, the application experienced an increase in Redis request latency, followed by multiple client-side errors. No application code changes were deployed during this time.
The following issues were observed:
- Intermittent transaction errors:
EXEC was aborted by redis or connection closed
- Redis protocol parse errors:
rueidis: parse error: redis message type simple string is not a array
rueidis: parse error: redis message type array is not a string
- Connection-level errors:
EOF
dial tcp <ip>:<port>: operation was canceled
Once the parse errors started, the client did not recover and required restarting application pods.
Panic observed during cluster refresh
In the same time window, we observed a panic originating from the cluster topology refresh path.
panic: runtime error: index out of range [2] with length 0
goroutine 258842186 [running]:
github.com/redis/rueidis.parseSlots(...)
/go/pkg/mod/github.com/redis/rueidis@v1.0.69/cluster.go:378
github.com/redis/rueidis.clusterslots.parse(...)
/go/pkg/mod/github.com/redis/rueidis@v1.0.69/cluster.go:378
github.com/redis/rueidis.(*clusterClient)._refresh(...)
/go/pkg/mod/github.com/redis/rueidis@v1.0.69/cluster.go:167
github.com/redis/rueidis.(*call).do(...)
/go/pkg/mod/github.com/redis/rueidis@v1.0.69/cluster.go:217
github.com/redis/rueidis.(*call).LazyDo.func1(...)
/go/pkg/mod/github.com/redis/rueidis@v1.0.69/singleflight.go:58
github.com/redis/rueidis.(*call).LazyDo(...)
/go/pkg/mod/github.com/redis/rueidis@v1.0.69/singleflight.go:53
Client Code
- Connection dial timeout: default
- Connection write timeout: default
Client Initialization
client, err := rueidisotel.NewClient(
	rueidis.ClientOption{
		InitAddress:       conf.NodeAddresses,
		Username:          conf.UserName,
		Password:          conf.Password,
		DisableCache:      conf.DisableCache,
		CacheSizeEachConn: cacheSizeEachConnection,
	},
)
if err != nil {
	return err
}
if err := client.Do(
	context.Background(),
	client.B().Ping().Build(),
).Error(); err != nil {
	return err
}
Client Usage
func (c *Cache) MGetWithClientSideCache(
	ctx context.Context,
	keys []string,
) (map[string]string, error) {
	results, err := rueidis.MGetCache(
		c.client,
		ctx,
		30*time.Second,
		keys,
	)
	if err != nil {
		return nil, err
	}
	finalResult := make(map[string]string, len(results))
	for key, result := range results {
		value, err := result.ToString()
		if err != nil && !rueidis.IsRedisNil(err) {
			return nil, err
		}
		finalResult[key] = value
	}
	return finalResult, nil
}
Environment
- Rueidis v1.0.69
- Go 1.23.0
- Redis Cluster v7.4
- Auto-Pipelining enabled
Questions
- When does rueidis refresh the cluster topology? If no refresh interval is configured, what situations cause a topology refresh to happen automatically?
- Is it normal to see EXEC abort errors during a shard failover? Should applications expect these errors during cluster changes?
- What situations can lead to Redis protocol parse errors in rueidis? For example, can they happen due to connection interruptions or cluster changes?
- After a protocol parse error, should the client recover on its own or be restarted?