
Degraded performance of concurrent connections with TSI #511

@karuboniru

Description


I noticed that when a server runs in a krun container and exposes a service via TSI, there is a significant performance drop whenever many concurrent short-lived connections are involved. So I wrote a small benchmark to measure the number of connections TSI can handle:

https://github.com/karuboniru/socket_bench

The program runs a ping-pong exchange between client and server; each of 16 client threads disconnects after 10-100 iterations and immediately opens a new connection. A rough sketch of the client loop follows.
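
This is only an illustrative sketch of the pattern described above, not the actual socket_bench code; the port (9000), message size, and the assumption of an echo-style server are all made up for the example:

```c
/* Minimal sketch of the client side: 16 threads, each doing short-lived
 * ping-pong connections. Illustrative only -- not the socket_bench code.
 * Assumes an echo-style server is already listening on 127.0.0.1:9000. */
#include <arpa/inet.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <unistd.h>

#define THREADS 16
#define PORT    9000

static void *client_loop(void *arg)
{
    unsigned seed = (unsigned)(uintptr_t)arg; /* per-thread RNG seed */
    char buf[512] = { 0 };

    for (;;) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr = { .sin_family = AF_INET,
                                    .sin_port = htons(PORT) };
        inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);
        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            close(fd);
            continue;
        }

        /* Ping-pong for 10-100 iterations, then drop the connection and
         * immediately open a new one (short-lived connections). */
        int iters = 10 + (int)(rand_r(&seed) % 91);
        for (int i = 0; i < iters; i++) {
            if (write(fd, buf, sizeof(buf)) != sizeof(buf))
                break;
            if (read(fd, buf, sizeof(buf)) <= 0)
                break;
        }
        close(fd);
    }
    return NULL;
}

int main(void)
{
    pthread_t t[THREADS];
    for (long i = 0; i < THREADS; i++)
        pthread_create(&t[i], NULL, client_loop, (void *)(uintptr_t)(i + 1));
    for (int i = 0; i < THREADS; i++)
        pthread_join(t[i], NULL);
    return 0;
}
```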

The results are:

Normal crun container with pasta networking

=== BENCHMARK RESULTS ===
Threads      : 16
Time (Total) : 5.74 s
Connections  : 15988
Conn. Rate   : 2783 conn/s
--------------------------------
Sent Pkts    : 873111
Recv Pkts    : 873111
Loss Pkts    : 0
Loss Rate    : 0.000000 %
Matched      : 873111
Success Rate : 100 %
--------------------------------
Tx QPS       : 152028
Throughput   : 74.23 MB/s
================================

Fedora 43 QEMU guest with passt network (vhost-user)

=== BENCHMARK RESULTS ===
Threads      : 16
Time (Total) : 10.18 s
Connections  : 58533
Conn. Rate   : 5747 conn/s
--------------------------------
Sent Pkts    : 3211271
Recv Pkts    : 3211271
Loss Pkts    : 0
Loss Rate    : 0.000000 %
Matched      : 3211271
Success Rate : 100 %
--------------------------------
Tx QPS       : 315317
Throughput   : 153.96 MB/s
================================

Krun container with TSI network

=== BENCHMARK RESULTS ===
Threads      : 16
Time (Total) : 20.03 s
Connections  : 64
Conn. Rate   : 3 conn/s
--------------------------------
Sent Pkts    : 3276
Recv Pkts    : 321
Loss Pkts    : 2955
Loss Rate    : 90.201465 %
Matched      : 321
Success Rate : 100 %
--------------------------------
Tx QPS       : 163
Throughput   : 0.08 MB/s
================================

Note that the Conn. Rate of 3 conn/s is extremely low for krun with TSI, and I don't see any high CPU usage during the test, which suggests there might be severe lock contention in krun's network stack when connections are created and destroyed.

Also, the total connection count (64) is an exact integer multiple of the thread count (16), so I suspect something is forcing connection creation/teardown to be synchronous, leaving only one small window roughly every 5 s in which new connections can be established.
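
If it helps to narrow this down, a hypothetical single-threaded diagnostic like the one below (not from the benchmark repo; the port and loop count are made up) could show whether connects stall and then complete in bursts rather than at a uniform per-connection cost:

```c
/* Hypothetical diagnostic (not part of socket_bench): time each
 * connect()/close() pair sequentially and print when it completed.
 * If TSI serializes connection setup/teardown, connects should stall
 * and then succeed in clusters instead of completing uniformly. */
#include <arpa/inet.h>
#include <stdio.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

#define PORT 9000 /* assumed server port */

static double now_s(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void)
{
    struct sockaddr_in addr = { .sin_family = AF_INET,
                                .sin_port = htons(PORT) };
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);

    double start = now_s();
    for (int i = 0; i < 64; i++) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        double t0 = now_s();
        int rc = connect(fd, (struct sockaddr *)&addr, sizeof(addr));
        printf("conn %2d: rc=%d latency=%6.3f s at t=%7.3f s\n",
               i, rc, now_s() - t0, now_s() - start);
        close(fd);
    }
    return 0;
}
```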
