
Memory leak in v9.17.2 with high concurrency - massive queuedNewConn and context object accumulation #3678

@jseparator

Description

We discovered a severe memory leak in go-redis v9.17.2 when running in high-concurrency production environments with Redis Cluster. The issue manifests as massive accumulation of queuedNewConn and context-related objects, leading to significant memory growth over time.

Downgrading to v9.14.1 completely resolves the issue.

Environment

  • go-redis version: v9.17.2 (problem), v9.14.1 (works fine)
  • Go version: 1.25.5
  • Redis setup: Redis Cluster (~600 nodes)
  • Workload: 200 concurrent goroutines, each performing HMGet operations
  • Connection pool config:
    PoolSize: 100
    MinIdleConns: 10
    ConnMaxIdleTime: 3 * time.Minute
    ConnMaxLifetime: 1 * time.Hour
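package main

import (
	"time"
	"github.com/redis/go-redis/v9"
)

func main() {
	cli := redis.NewClusterClient(&redis.ClusterOptions{
		Addrs:           []string{"xxx"}, // seed node addresses (redacted)
		PoolSize:        100,
		MinIdleConns:    10,
		ConnMaxIdleTime: time.Minute * 3,
		ConnMaxLifetime: time.Hour,
		DialTimeout:     time.Second * 5,
		WriteBufferSize: 1024,
		ReadBufferSize:  4096,
	})
	defer cli.Close()
}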
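As a rough sizing check, the pool settings imply cluster-wide bounds. The sketch below assumes that PoolSize and MinIdleConns apply to each node's pool separately (our reading of the ClusterOptions docs) and uses the ~600-node figure from the environment list. poolBounds is a hypothetical helper, not part of go-redis.

```go
package main

import "fmt"

// poolBounds is a hypothetical helper that multiplies per-node pool settings
// by the node count. Assumption: every cluster node has its own pool.
func poolBounds(nodes, poolSize, minIdle int) (maxConns, minIdleConns int) {
	return nodes * poolSize, nodes * minIdle
}

func main() {
	maxConns, minIdle := poolBounds(600, 100, 10)
	fmt.Printf("max pooled connections: %d\n", maxConns) // 60000
	fmt.Printf("min idle connections:   %d\n", minIdle)  // 6000
}
```

Under that assumption, the 11,938 established connections reported below fall between the two bounds, so the raw connection count is not what stands out; the queued creation requests are.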

Symptoms

With v9.17.2 (Problem Version)

Heap profile inspected with go tool pprof -inuse_objects after the service had run under load for a while:

Top memory-consuming objects:

  • queuedNewConn: 788,037 objects (44.35%)
  • context.WithDeadlineCause: 538,367 objects (25.82%)
  • context.(*cancelCtx).Done: 234,083 objects (7.82%)

Total objects: 2,991,746

Network connections:

  • ESTABLISHED connections: 11,938
  • Per-node connections: 11-55 (average ~20)

Key observation: the profile holds 788,037 live queuedNewConn objects (pending connection-creation requests) against only 11,938 established connections, a ratio of roughly 66:1. This suggests severe connection pool thrashing.
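The ratio and the per-node average can be rechecked from the reported figures. The sketch below uses the 788,037 queuedNewConn objects, the 11,938 established connections, and the approximate 600-node cluster size; ratio is just a small helper for the division.

```go
package main

import "fmt"

// ratio returns a/b as a float64.
func ratio(a, b int) float64 {
	return float64(a) / float64(b)
}

func main() {
	const (
		queuedNewConn = 788037 // live queuedNewConn objects in the v9.17.2 profile
		established   = 11938  // ESTABLISHED connections
		nodes         = 600    // approximate number of cluster nodes
	)
	fmt.Printf("queued per established connection: %.1f\n", ratio(queuedNewConn, established)) // 66.0
	fmt.Printf("established connections per node: %.1f\n", ratio(established, nodes))          // 19.9
}
```

The ~19.9 per-node average matches the reported average of ~20, so the connection counts are internally consistent; the excess is entirely in the queued requests.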

With v9.14.1 (Working Version)

After downgrading to v9.14.1:

Top memory-consuming objects:

  • queuedNewConn: 0 (not in top 30)
  • context objects: 0 (not in top 30)

Total objects: 620,191 (a 79% reduction)

Memory usage is stable and healthy.

Request

Could the maintainers:

  1. Review connection pool changes between v9.14.1 and v9.17.2
  2. Check context lifecycle management in high-concurrency scenarios
  3. Consider adding integration tests for cluster scenarios with 100+ concurrent operations

We're happy to provide more diagnostics or test specific patches if needed.
