High CPU usage when underlying GCS starts failing #64

@dbarashev

Description

We had SSL issues with GCS and our requests failed like this:

```
11:03:18.955 [grpc-default-executor-627] [ERROR] [CosmasGoogleCloudService] Error while applying patches [fileId=0d2974935531a8c1bc3965480efa5a46]
com.google.cloud.storage.StorageException: Remote host closed connection during handshake
        at com.google.cloud.storage.spi.v1.HttpStorageRpc.translate(HttpStorageRpc.java:220)
        at com.google.cloud.storage.spi.v1.HttpStorageRpc.create(HttpStorageRpc.java:291)
        at com.google.cloud.storage.StorageImpl$3.call(StorageImpl.java:159)
        at com.google.cloud.storage.StorageImpl$3.call(StorageImpl.java:156)
        at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:89)
        at com.google.cloud.RetryHelper.run(RetryHelper.java:74)
        at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:51)
        at com.google.cloud.storage.StorageImpl.internalCreate(StorageImpl.java:156)
        at com.google.cloud.storage.StorageImpl.create(StorageImpl.java:137)
        at com.bardsoftware.papeeria.backend.cosmas.CosmasGoogleCloudService.commitFromMemoryToGCS(CosmasGoogleCloudService.kt:231)
        at com.bardsoftware.papeeria.backend.cosmas.CosmasGoogleCloudService.access$commitFromMemoryToGCS(CosmasGoogleCloudService.kt:65)
        at com.bardsoftware.papeeria.backend.cosmas.CosmasGoogleCloudService$commitVersion$1.invoke(CosmasGoogleCloudService.kt:180)
        at com.bardsoftware.papeeria.backend.cosmas.CosmasGoogleCloudService$commitVersion$1.invoke(CosmasGoogleCloudService.kt:65)
        at com.bardsoftware.papeeria.backend.cosmas.CosmasGoogleCloudServiceKt.logging(CosmasGoogleCloudService.kt:51)
        at com.bardsoftware.papeeria.backend.cosmas.CosmasGoogleCloudServiceKt.logging$default(CosmasGoogleCloudService.kt:41)
        at com.bardsoftware.papeeria.backend.cosmas.CosmasGoogleCloudService.commitVersion(CosmasGoogleCloudService.kt:151)
        at com.bardsoftware.papeeria.backend.cosmas.CosmasGrpc$MethodHandlers.invoke(CosmasGrpc.java:921)
```

At the same time, the CPU and disk usage graphs looked like this:

[screenshot: CPU usage graph]

[screenshot: disk usage graph]

Is it possible that the retry policy was misconfigured and we eventually accumulated a large backlog of requests that were all retrying (and hence failing, logging, etc.)?
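
If I read the trace correctly, each failed `create` goes through `com.google.cloud.RetryHelper.runWithRetries`, i.e. the client library retries with backoff on top of whatever we do in `commitVersion`. Below is a minimal Kotlin sketch of how those client-side retries could be bounded when constructing the `Storage` client; the helper name and the concrete limits are illustrative assumptions, not our current configuration:

```kotlin
import com.google.api.gax.retrying.RetrySettings
import com.google.cloud.storage.Storage
import com.google.cloud.storage.StorageOptions
import org.threeten.bp.Duration // gax of that era uses threeten durations

// Hypothetical helper: build a Storage client whose built-in retries are tightly
// bounded, so a sustained GCS outage makes requests fail fast instead of backing
// off (and occupying executor threads) for a long time.
fun storageWithBoundedRetries(): Storage {
    val retrySettings = RetrySettings.newBuilder()
        .setMaxAttempts(3)                            // illustrative: give up after 3 attempts
        .setInitialRetryDelay(Duration.ofMillis(500)) // first backoff interval
        .setRetryDelayMultiplier(2.0)                 // exponential backoff
        .setMaxRetryDelay(Duration.ofSeconds(5))      // cap a single backoff interval
        .setTotalTimeout(Duration.ofSeconds(30))      // hard ceiling on one logical request
        .build()

    return StorageOptions.newBuilder()
        .setRetrySettings(retrySettings)
        .build()
        .service
}
```

If the client-side retries are effectively open-ended, or if we keep re-submitting failed commits ourselves, that would match the picture of a growing queue of requests that keep retrying, failing and logging.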
